Training: 2022-04-27 01:46:27,189-rank_id: 0
Training: 2022-04-27 01:46:53,870-: margin_list              [1.0, 0.0, 0.4]
Training: 2022-04-27 01:46:53,871-: network                  r100
Training: 2022-04-27 01:46:53,871-: resume                   False
Training: 2022-04-27 01:46:53,871-: output                   work_dirs/wf12m_pfc02_r100
Training: 2022-04-27 01:46:53,871-: embedding_size           512
Training: 2022-04-27 01:46:53,871-: sample_rate              0.2
Training: 2022-04-27 01:46:53,871-: interclass_filtering_threshold 0
Training: 2022-04-27 01:46:53,871-: fp16                     True
Training: 2022-04-27 01:46:53,871-: batch_size               128
Training: 2022-04-27 01:46:53,871-: optimizer                sgd
Training: 2022-04-27 01:46:53,872-: lr                       0.1
Training: 2022-04-27 01:46:53,872-: momentum                 0.9
Training: 2022-04-27 01:46:53,872-: weight_decay             0.0005
Training: 2022-04-27 01:46:53,872-: verbose                  2000
Training: 2022-04-27 01:46:53,872-: frequent                 10
Training: 2022-04-27 01:46:53,872-: dali                     False
Training: 2022-04-27 01:46:53,872-: rec                      /train_tmp/WebFace12M
Training: 2022-04-27 01:46:53,872-: num_classes              617970
Training: 2022-04-27 01:46:53,872-: num_image                12720066
Training: 2022-04-27 01:46:53,872-: num_epoch                20
Training: 2022-04-27 01:46:53,872-: warmup_epoch             0
Training: 2022-04-27 01:46:53,872-: val_targets              []
Training: 2022-04-27 01:46:53,872-: total_batch_size         1024
Training: 2022-04-27 01:46:53,872-: warmup_step              0
Training: 2022-04-27 01:46:53,872-: total_step               248420
Training: 2022-04-27 01:47:18,732-Reducer buckets have been rebuilt in this iteration.
Training: 2022-04-27 01:47:24,297-Speed 3331.84 samples/sec   Loss 41.3790   LearningRate 0.1000   Epoch: 0   Global Step: 20   Fp16 Grad Scale: 8192   Required: 100 hours
Training: 2022-04-27 01:47:27,387-Speed 3314.89 samples/sec   Loss 42.6057   LearningRate 0.1000   Epoch: 0   Global Step: 30   Fp16 Grad Scale: 8192   Required: 75 hours
Training: 2022-04-27 01:47:30,454-Speed 3340.67 samples/sec   Loss 43.2166   LearningRate 0.1000   Epoch: 0   Global Step: 40   Fp16 Grad Scale: 8192   Required: 62 hours
Training: 2022-04-27 01:47:33,520-Speed 3340.52 samples/sec   Loss 43.3005   LearningRate 0.1000   Epoch: 0   Global Step: 50   Fp16 Grad Scale: 8192   Required: 54 hours
Training: 2022-04-27 01:47:36,601-Speed 3325.10 samples/sec   Loss 43.7820   LearningRate 0.1000   Epoch: 0   Global Step: 60   Fp16 Grad Scale: 8192   Required: 48 hours
Training: 2022-04-27 01:47:39,605-Speed 3410.16 samples/sec   Loss 42.6988   LearningRate 0.0999   Epoch: 0   Global Step: 70   Fp16 Grad Scale: 8192   Required: 44 hours
Training: 2022-04-27 01:47:42,633-Speed 3381.98 samples/sec   Loss 42.7705   LearningRate 0.0999   Epoch: 0   Global Step: 80   Fp16 Grad Scale: 8192   Required: 42 hours
Training: 2022-04-27 01:47:45,631-Speed 3417.21 samples/sec   Loss 42.5426   LearningRate 0.0999   Epoch: 0   Global Step: 90   Fp16 Grad Scale: 8192   Required: 39 hours
Training: 2022-04-27 01:47:48,654-Speed 3388.00 samples/sec   Loss 42.8206   LearningRate 0.0999   Epoch: 0   Global Step: 100   Fp16 Grad Scale: 8192   Required: 37 hours
Training: 2022-04-27 01:47:51,678-Speed 3388.02 samples/sec   Loss 42.6233   LearningRate 0.0999   Epoch: 0   Global Step: 110   Fp16 Grad Scale: 16384   Required: 36 hours
Training: 2022-04-27 01:47:54,731-Speed 3354.10 samples/sec   Loss 42.6284   LearningRate 0.0999   Epoch: 0   Global Step: 120   Fp16 Grad Scale: 16384   Required: 35 hours
Training: 2022-04-27 01:47:57,877-Speed 3256.49 samples/sec   Loss 42.5117   LearningRate 0.0999   Epoch: 0   Global Step: 130   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-04-27 01:48:00,935-Speed 3349.53 samples/sec   Loss 42.2593   LearningRate 0.0999   Epoch: 0   Global Step: 140   Fp16 Grad Scale: 16384   Required: 33 hours
Training: 2022-04-27 01:48:04,033-Speed 3306.98 samples/sec   Loss 42.1054   LearningRate 0.0999   Epoch: 0   Global Step: 150   Fp16 Grad Scale: 16384   Required: 32 hours
Training: 2022-04-27 01:48:07,480-Speed 2971.04 samples/sec   Loss 42.0537   LearningRate 0.0999   Epoch: 0   Global Step: 160   Fp16 Grad Scale: 16384   Required: 32 hours
Training: 2022-04-27 01:48:10,476-Speed 3419.24 samples/sec   Loss 41.9583   LearningRate 0.0999   Epoch: 0   Global Step: 170   Fp16 Grad Scale: 16384   Required: 31 hours
Training: 2022-04-27 01:48:13,571-Speed 3309.32 samples/sec   Loss 41.9055   LearningRate 0.0999   Epoch: 0   Global Step: 180   Fp16 Grad Scale: 16384   Required: 30 hours
Training: 2022-04-27 01:48:16,627-Speed 3351.85 samples/sec   Loss 41.9170   LearningRate 0.0998   Epoch: 0   Global Step: 190   Fp16 Grad Scale: 16384   Required: 30 hours
Training: 2022-04-27 01:48:19,665-Speed 3372.34 samples/sec   Loss 41.7816   LearningRate 0.0998   Epoch: 0   Global Step: 200   Fp16 Grad Scale: 16384   Required: 29 hours
Training: 2022-04-27 01:48:22,682-Speed 3394.59 samples/sec   Loss 41.7735   LearningRate 0.0998   Epoch: 0   Global Step: 210   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-27 01:48:25,757-Speed 3330.61 samples/sec   Loss 41.6515   LearningRate 0.0998   Epoch: 0   Global Step: 220   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-04-27 01:48:28,790-Speed 3378.14 samples/sec   Loss 41.6399   LearningRate 0.0998   Epoch: 0   Global Step: 230   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-27 01:48:31,779-Speed 3426.67 samples/sec   Loss 41.4791   LearningRate 0.0998   Epoch: 0   Global Step: 240   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-27 01:48:34,793-Speed 3398.66 samples/sec   Loss 41.3228   LearningRate 0.0998   Epoch: 0   Global Step: 250   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-04-27 01:48:37,891-Speed 3306.22 samples/sec   Loss 41.2849   LearningRate 0.0998   Epoch: 0   Global Step: 260   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-27 01:48:40,944-Speed 3355.17 samples/sec   Loss 41.1821   LearningRate 0.0998   Epoch: 0   Global Step: 270   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-27 01:48:43,958-Speed 3398.95 samples/sec   Loss 41.1447   LearningRate 0.0998   Epoch: 0   Global Step: 280   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-27 01:48:46,954-Speed 3419.19 samples/sec   Loss 41.0849   LearningRate 0.0998   Epoch: 0   Global Step: 290   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-27 01:48:50,015-Speed 3345.62 samples/sec   Loss 41.1176   LearningRate 0.0998   Epoch: 0   Global Step: 300   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-04-27 01:48:53,016-Speed 3414.02 samples/sec   Loss 41.1187   LearningRate 0.0998   Epoch: 0   Global Step: 310   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-27 01:48:56,043-Speed 3383.86 samples/sec   Loss 41.0176   LearningRate 0.0997   Epoch: 0   Global Step: 320   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-27 01:48:59,103-Speed 3347.96 samples/sec   Loss 40.9962   LearningRate 0.0997   Epoch: 0   Global Step: 330   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-27 01:49:02,149-Speed 3362.75 samples/sec   Loss 40.9790   LearningRate 0.0997   Epoch: 0   Global Step: 340   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-27 01:49:05,158-Speed 3403.64 samples/sec   Loss 40.8196   LearningRate 0.0997   Epoch: 0   Global Step: 350   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-27 01:49:08,178-Speed 3392.30 samples/sec   Loss 40.8698   LearningRate 0.0997   Epoch: 0   Global Step: 360   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-27 01:49:11,201-Speed 3388.15 samples/sec   Loss 40.7770   LearningRate 0.0997   Epoch: 0   Global Step: 370   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-04-27 01:49:14,233-Speed 3378.78 samples/sec   Loss 40.7694   LearningRate 0.0997   Epoch: 0   Global Step: 380   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-27 01:49:17,248-Speed 3396.95 samples/sec   Loss 40.6589   LearningRate 0.0997   Epoch: 0   Global Step: 390   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-27 01:49:20,266-Speed 3393.94 samples/sec   Loss 40.5825   LearningRate 0.0997   Epoch: 0   Global Step: 400   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-27 01:49:23,349-Speed 3323.52 samples/sec   Loss 40.6289   LearningRate 0.0997   Epoch: 0   Global Step: 410   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-27 01:49:26,504-Speed 3246.80 samples/sec   Loss 40.6395   LearningRate 0.0997   Epoch: 0   Global Step: 420   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-27 01:49:29,607-Speed 3300.66 samples/sec   Loss 40.5211   LearningRate 0.0997   Epoch: 0   Global Step: 430   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-27 01:49:32,674-Speed 3339.67 samples/sec   Loss 40.4601   LearningRate 0.0996   Epoch: 0   Global Step: 440   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-27 01:49:35,785-Speed 3293.61 samples/sec   Loss 40.3800   LearningRate 0.0996   Epoch: 0   Global Step: 450   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-27 01:49:38,829-Speed 3364.14 samples/sec   Loss 40.3778   LearningRate 0.0996   Epoch: 0   Global Step: 460   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-27 01:49:41,876-Speed 3361.92 samples/sec   Loss 40.3221   LearningRate 0.0996   Epoch: 0   Global Step: 470   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-04-27 01:49:44,894-Speed 3395.10 samples/sec   Loss 40.3445   LearningRate 0.0996   Epoch: 0   Global Step: 480   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-27 01:49:47,947-Speed 3354.93 samples/sec   Loss 40.2169   LearningRate 0.0996   Epoch: 0   Global Step: 490   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-27 01:49:50,965-Speed 3393.11 samples/sec   Loss 40.2448   LearningRate 0.0996   Epoch: 0   Global Step: 500   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-27 01:49:54,026-Speed 3346.71 samples/sec   Loss 40.1101   LearningRate 0.0996   Epoch: 0   Global Step: 510   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:49:57,027-Speed 3413.91 samples/sec   Loss 40.0999   LearningRate 0.0996   Epoch: 0   Global Step: 520   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:50:00,080-Speed 3354.44 samples/sec   Loss 40.1136   LearningRate 0.0996   Epoch: 0   Global Step: 530   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:50:03,132-Speed 3356.51 samples/sec   Loss 40.0496   LearningRate 0.0996   Epoch: 0   Global Step: 540   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:50:06,220-Speed 3317.39 samples/sec   Loss 40.0210   LearningRate 0.0996   Epoch: 0   Global Step: 550   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:50:09,261-Speed 3368.73 samples/sec   Loss 39.9529   LearningRate 0.0995   Epoch: 0   Global Step: 560   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:50:12,299-Speed 3370.99 samples/sec   Loss 39.9604   LearningRate 0.0995   Epoch: 0   Global Step: 570   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-04-27 01:50:15,364-Speed 3342.34 samples/sec   Loss 39.8885   LearningRate 0.0995   Epoch: 0   Global Step: 580   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-27 01:50:18,379-Speed 3397.49 samples/sec   Loss 39.8565   LearningRate 0.0995   Epoch: 0   Global Step: 590   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-27 01:50:21,414-Speed 3375.32 samples/sec   Loss 39.9028   LearningRate 0.0995   Epoch: 0   Global Step: 600   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-27 01:50:24,443-Speed 3381.12 samples/sec   Loss 39.8180   LearningRate 0.0995   Epoch: 0   Global Step: 610   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-27 01:50:27,484-Speed 3368.89 samples/sec   Loss 39.7620   LearningRate 0.0995   Epoch: 0   Global Step: 620   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-04-27 01:50:30,504-Speed 3391.83 samples/sec   Loss 39.6574   LearningRate 0.0995   Epoch: 0   Global Step: 630   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-27 01:50:33,518-Speed 3398.10 samples/sec   Loss 39.7127   LearningRate 0.0995   Epoch: 0   Global Step: 640   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-27 01:50:36,637-Speed 3284.84 samples/sec   Loss 39.6027   LearningRate 0.0995   Epoch: 0   Global Step: 650   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-27 01:50:39,666-Speed 3381.34 samples/sec   Loss 39.6790   LearningRate 0.0995   Epoch: 0   Global Step: 660   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-04-27 01:50:42,697-Speed 3379.43 samples/sec   Loss 39.6265   LearningRate 0.0995   Epoch: 0   Global Step: 670   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-27 01:50:45,695-Speed 3416.99 samples/sec   Loss 39.5278   LearningRate 0.0995   Epoch: 0   Global Step: 680   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-27 01:50:48,722-Speed 3383.60 samples/sec   Loss 39.5224   LearningRate 0.0994   Epoch: 0   Global Step: 690   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-27 01:50:51,750-Speed 3383.18 samples/sec   Loss 39.4829   LearningRate 0.0994   Epoch: 0   Global Step: 700   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-27 01:50:54,796-Speed 3363.04 samples/sec   Loss 39.4744   LearningRate 0.0994   Epoch: 0   Global Step: 710   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-27 01:50:57,817-Speed 3390.30 samples/sec   Loss 39.3605   LearningRate 0.0994   Epoch: 0   Global Step: 720   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-27 01:51:00,853-Speed 3374.14 samples/sec   Loss 39.3383   LearningRate 0.0994   Epoch: 0   Global Step: 730   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 01:51:03,929-Speed 3330.10 samples/sec   Loss 39.3132   LearningRate 0.0994   Epoch: 0   Global Step: 740   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 01:51:06,971-Speed 3366.98 samples/sec   Loss 39.3251   LearningRate 0.0994   Epoch: 0   Global Step: 750   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 01:51:09,959-Speed 3428.76 samples/sec   Loss 39.3430   LearningRate 0.0994   Epoch: 0   Global Step: 760   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-27 01:51:13,023-Speed 3342.74 samples/sec   Loss 39.1818   LearningRate 0.0994   Epoch: 0   Global Step: 770   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-27 01:51:16,115-Speed 3313.41 samples/sec   Loss 39.1515   LearningRate 0.0994   Epoch: 0   Global Step: 780   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-27 01:51:19,203-Speed 3316.93 samples/sec   Loss 39.1796   LearningRate 0.0994   Epoch: 0   Global Step: 790   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-27 01:51:22,240-Speed 3372.09 samples/sec   Loss 39.1013   LearningRate 0.0994   Epoch: 0   Global Step: 800   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-27 01:51:25,307-Speed 3340.31 samples/sec   Loss 39.0563   LearningRate 0.0993   Epoch: 0   Global Step: 810   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-27 01:51:28,432-Speed 3277.86 samples/sec   Loss 38.9651   LearningRate 0.0993   Epoch: 0   Global Step: 820   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-27 01:51:31,510-Speed 3327.87 samples/sec   Loss 38.9890   LearningRate 0.0993   Epoch: 0   Global Step: 830   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-27 01:51:34,585-Speed 3330.86 samples/sec   Loss 38.9750   LearningRate 0.0993   Epoch: 0   Global Step: 840   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-27 01:51:37,648-Speed 3343.88 samples/sec   Loss 38.9340   LearningRate 0.0993   Epoch: 0   Global Step: 850   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-27 01:51:40,777-Speed 3273.99 samples/sec   Loss 38.8314   LearningRate 0.0993   Epoch: 0   Global Step: 860   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 01:51:43,820-Speed 3366.09 samples/sec   Loss 38.8764   LearningRate 0.0993   Epoch: 0   Global Step: 870   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-04-27 01:51:46,853-Speed 3377.65 samples/sec   Loss 38.8523   LearningRate 0.0993   Epoch: 0   Global Step: 880   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-27 01:51:49,993-Speed 3261.78 samples/sec   Loss 38.8357   LearningRate 0.0993   Epoch: 0   Global Step: 890   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-27 01:51:53,103-Speed 3293.51 samples/sec   Loss 38.7631   LearningRate 0.0993   Epoch: 0   Global Step: 900   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-27 01:51:56,182-Speed 3326.88 samples/sec   Loss 38.6058   LearningRate 0.0993   Epoch: 0   Global Step: 910   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-27 01:51:59,225-Speed 3366.49 samples/sec   Loss 38.6693   LearningRate 0.0993   Epoch: 0   Global Step: 920   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-27 01:52:02,355-Speed 3271.68 samples/sec   Loss 38.6394   LearningRate 0.0993   Epoch: 0   Global Step: 930   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-27 01:52:05,429-Speed 3332.18 samples/sec   Loss 38.6148   LearningRate 0.0992   Epoch: 0   Global Step: 940   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-27 01:52:08,463-Speed 3376.67 samples/sec   Loss 38.6299   LearningRate 0.0992   Epoch: 0   Global Step: 950   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-27 01:52:11,495-Speed 3378.12 samples/sec   Loss 38.6108   LearningRate 0.0992   Epoch: 0   Global Step: 960   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-27 01:52:14,596-Speed 3303.34 samples/sec   Loss 38.4830   LearningRate 0.0992   Epoch: 0   Global Step: 970   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-27 01:52:17,694-Speed 3306.08 samples/sec   Loss 38.4203   LearningRate 0.0992   Epoch: 0   Global Step: 980   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-27 01:52:20,730-Speed 3375.11 samples/sec   Loss 38.5578   LearningRate 0.0992   Epoch: 0   Global Step: 990   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-27 01:52:23,762-Speed 3377.71 samples/sec   Loss 38.4096   LearningRate 0.0992   Epoch: 0   Global Step: 1000   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-27 01:52:26,802-Speed 3370.12 samples/sec   Loss 38.3274   LearningRate 0.0992   Epoch: 0   Global Step: 1010   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-27 01:52:29,873-Speed 3334.96 samples/sec   Loss 38.3592   LearningRate 0.0992   Epoch: 0   Global Step: 1020   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-27 01:52:32,896-Speed 3389.67 samples/sec   Loss 38.1975   LearningRate 0.0992   Epoch: 0   Global Step: 1030   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-27 01:52:35,965-Speed 3337.64 samples/sec   Loss 38.2958   LearningRate 0.0992   Epoch: 0   Global Step: 1040   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-04-27 01:52:39,036-Speed 3335.76 samples/sec   Loss 38.2801   LearningRate 0.0992   Epoch: 0   Global Step: 1050   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-27 01:52:42,162-Speed 3276.35 samples/sec   Loss 38.1453   LearningRate 0.0991   Epoch: 0   Global Step: 1060   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-27 01:52:45,189-Speed 3383.86 samples/sec   Loss 38.1575   LearningRate 0.0991   Epoch: 0   Global Step: 1070   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-27 01:52:48,247-Speed 3349.54 samples/sec   Loss 38.0401   LearningRate 0.0991   Epoch: 0   Global Step: 1080   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-27 01:52:51,293-Speed 3363.33 samples/sec   Loss 38.1277   LearningRate 0.0991   Epoch: 0   Global Step: 1090   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-04-27 01:52:54,356-Speed 3344.66 samples/sec   Loss 38.1433   LearningRate 0.0991   Epoch: 0   Global Step: 1100   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-27 01:52:57,358-Speed 3411.48 samples/sec   Loss 37.9603   LearningRate 0.0991   Epoch: 0   Global Step: 1110   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-27 01:53:00,443-Speed 3319.91 samples/sec   Loss 37.9967   LearningRate 0.0991   Epoch: 0   Global Step: 1120   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-27 01:53:03,608-Speed 3236.98 samples/sec   Loss 38.0022   LearningRate 0.0991   Epoch: 0   Global Step: 1130   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-27 01:53:06,750-Speed 3259.96 samples/sec   Loss 37.8266   LearningRate 0.0991   Epoch: 0   Global Step: 1140   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-04-27 01:53:09,754-Speed 3409.34 samples/sec   Loss 37.9039   LearningRate 0.0991   Epoch: 0   Global Step: 1150   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:53:12,764-Speed 3403.25 samples/sec   Loss 37.8633   LearningRate 0.0991   Epoch: 0   Global Step: 1160   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:53:15,816-Speed 3356.33 samples/sec   Loss 37.8850   LearningRate 0.0991   Epoch: 0   Global Step: 1170   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:53:18,876-Speed 3348.08 samples/sec   Loss 37.7700   LearningRate 0.0991   Epoch: 0   Global Step: 1180   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:53:21,891-Speed 3397.95 samples/sec   Loss 37.6883   LearningRate 0.0990   Epoch: 0   Global Step: 1190   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:53:24,932-Speed 3368.60 samples/sec   Loss 37.5968   LearningRate 0.0990   Epoch: 0   Global Step: 1200   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:53:27,980-Speed 3360.91 samples/sec   Loss 37.6857   LearningRate 0.0990   Epoch: 0   Global Step: 1210   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:53:31,095-Speed 3287.46 samples/sec   Loss 37.6003   LearningRate 0.0990   Epoch: 0   Global Step: 1220   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:53:34,123-Speed 3383.61 samples/sec   Loss 37.5551   LearningRate 0.0990   Epoch: 0   Global Step: 1230   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:53:37,157-Speed 3376.64 samples/sec   Loss 37.5468   LearningRate 0.0990   Epoch: 0   Global Step: 1240   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:53:40,196-Speed 3370.93 samples/sec   Loss 37.5010   LearningRate 0.0990   Epoch: 0   Global Step: 1250   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:53:43,234-Speed 3371.57 samples/sec   Loss 37.3570   LearningRate 0.0990   Epoch: 0   Global Step: 1260   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:53:46,271-Speed 3372.80 samples/sec   Loss 37.3103   LearningRate 0.0990   Epoch: 0   Global Step: 1270   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:53:49,297-Speed 3384.68 samples/sec   Loss 37.2856   LearningRate 0.0990   Epoch: 0   Global Step: 1280   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:53:52,391-Speed 3310.67 samples/sec   Loss 37.2521   LearningRate 0.0990   Epoch: 0   Global Step: 1290   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:53:55,430-Speed 3370.76 samples/sec   Loss 37.2840   LearningRate 0.0990   Epoch: 0   Global Step: 1300   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:53:58,436-Speed 3407.77 samples/sec   Loss 37.2665   LearningRate 0.0989   Epoch: 0   Global Step: 1310   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:54:01,454-Speed 3392.80 samples/sec   Loss 37.1833   LearningRate 0.0989   Epoch: 0   Global Step: 1320   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:54:04,471-Speed 3395.87 samples/sec   Loss 37.1366   LearningRate 0.0989   Epoch: 0   Global Step: 1330   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:54:07,515-Speed 3365.28 samples/sec   Loss 37.0269   LearningRate 0.0989   Epoch: 0   Global Step: 1340   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:54:10,549-Speed 3375.31 samples/sec   Loss 37.0381   LearningRate 0.0989   Epoch: 0   Global Step: 1350   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:54:13,609-Speed 3347.97 samples/sec   Loss 36.9762   LearningRate 0.0989   Epoch: 0   Global Step: 1360   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:54:16,629-Speed 3391.88 samples/sec   Loss 36.9910   LearningRate 0.0989   Epoch: 0   Global Step: 1370   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:54:19,650-Speed 3390.66 samples/sec   Loss 37.0017   LearningRate 0.0989   Epoch: 0   Global Step: 1380   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:54:22,649-Speed 3416.04 samples/sec   Loss 36.8510   LearningRate 0.0989   Epoch: 0   Global Step: 1390   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:54:25,660-Speed 3401.08 samples/sec   Loss 36.8457   LearningRate 0.0989   Epoch: 0   Global Step: 1400   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:54:28,730-Speed 3336.58 samples/sec   Loss 36.8839   LearningRate 0.0989   Epoch: 0   Global Step: 1410   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:54:31,789-Speed 3348.27 samples/sec   Loss 36.7760   LearningRate 0.0989   Epoch: 0   Global Step: 1420   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:54:34,799-Speed 3403.26 samples/sec   Loss 36.7593   LearningRate 0.0989   Epoch: 0   Global Step: 1430   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:54:37,877-Speed 3328.03 samples/sec   Loss 36.7918   LearningRate 0.0988   Epoch: 0   Global Step: 1440   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:54:40,913-Speed 3374.78 samples/sec   Loss 36.6543   LearningRate 0.0988   Epoch: 0   Global Step: 1450   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:54:44,017-Speed 3299.03 samples/sec   Loss 36.6291   LearningRate 0.0988   Epoch: 0   Global Step: 1460   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:54:47,052-Speed 3375.30 samples/sec   Loss 36.5027   LearningRate 0.0988   Epoch: 0   Global Step: 1470   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:54:50,191-Speed 3263.66 samples/sec   Loss 36.5686   LearningRate 0.0988   Epoch: 0   Global Step: 1480   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:54:53,292-Speed 3303.08 samples/sec   Loss 36.6091   LearningRate 0.0988   Epoch: 0   Global Step: 1490   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:54:56,350-Speed 3349.57 samples/sec   Loss 36.5174   LearningRate 0.0988   Epoch: 0   Global Step: 1500   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:54:59,374-Speed 3387.93 samples/sec   Loss 36.4393   LearningRate 0.0988   Epoch: 0   Global Step: 1510   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:55:02,425-Speed 3357.38 samples/sec   Loss 36.4670   LearningRate 0.0988   Epoch: 0   Global Step: 1520   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:55:05,495-Speed 3336.89 samples/sec   Loss 36.2839   LearningRate 0.0988   Epoch: 0   Global Step: 1530   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:55:08,509-Speed 3397.90 samples/sec   Loss 36.3179   LearningRate 0.0988   Epoch: 0   Global Step: 1540   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:55:11,545-Speed 3373.70 samples/sec   Loss 36.2258   LearningRate 0.0988   Epoch: 0   Global Step: 1550   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:55:14,581-Speed 3374.36 samples/sec   Loss 36.2884   LearningRate 0.0987   Epoch: 0   Global Step: 1560   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:55:17,606-Speed 3385.78 samples/sec   Loss 36.2437   LearningRate 0.0987   Epoch: 0   Global Step: 1570   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:55:20,636-Speed 3380.33 samples/sec   Loss 36.1549   LearningRate 0.0987   Epoch: 0   Global Step: 1580   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:55:23,693-Speed 3351.12 samples/sec   Loss 36.0873   LearningRate 0.0987   Epoch: 0   Global Step: 1590   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:55:26,759-Speed 3340.79 samples/sec   Loss 36.0640   LearningRate 0.0987   Epoch: 0   Global Step: 1600   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:55:29,857-Speed 3306.16 samples/sec   Loss 36.0210   LearningRate 0.0987   Epoch: 0   Global Step: 1610   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:55:32,906-Speed 3360.25 samples/sec   Loss 35.9901   LearningRate 0.0987   Epoch: 0   Global Step: 1620   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:55:35,934-Speed 3382.99 samples/sec   Loss 35.8666   LearningRate 0.0987   Epoch: 0   Global Step: 1630   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:55:38,974-Speed 3369.27 samples/sec   Loss 35.9511   LearningRate 0.0987   Epoch: 0   Global Step: 1640   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:55:42,023-Speed 3359.23 samples/sec   Loss 35.8333   LearningRate 0.0987   Epoch: 0   Global Step: 1650   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:55:45,068-Speed 3364.01 samples/sec   Loss 35.9154   LearningRate 0.0987   Epoch: 0   Global Step: 1660   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:55:48,091-Speed 3387.69 samples/sec   Loss 35.8219   LearningRate 0.0987   Epoch: 0   Global Step: 1670   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:55:51,126-Speed 3375.13 samples/sec   Loss 35.8987   LearningRate 0.0987   Epoch: 0   Global Step: 1680   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:55:54,167-Speed 3368.38 samples/sec   Loss 35.6614   LearningRate 0.0986   Epoch: 0   Global Step: 1690   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:55:57,196-Speed 3382.22 samples/sec   Loss 35.7201   LearningRate 0.0986   Epoch: 0   Global Step: 1700   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:56:00,276-Speed 3325.85 samples/sec   Loss 35.6189   LearningRate 0.0986   Epoch: 0   Global Step: 1710   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:56:03,341-Speed 3342.14 samples/sec   Loss 35.7531   LearningRate 0.0986   Epoch: 0   Global Step: 1720   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:56:06,391-Speed 3358.16 samples/sec   Loss 35.5763   LearningRate 0.0986   Epoch: 0   Global Step: 1730   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:56:09,436-Speed 3363.86 samples/sec   Loss 35.5198   LearningRate 0.0986   Epoch: 0   Global Step: 1740   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:56:12,497-Speed 3346.18 samples/sec   Loss 35.5378   LearningRate 0.0986   Epoch: 0   Global Step: 1750   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:56:15,518-Speed 3390.63 samples/sec   Loss 35.3618   LearningRate 0.0986   Epoch: 0   Global Step: 1760   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:56:18,588-Speed 3336.85 samples/sec   Loss 35.3533   LearningRate 0.0986   Epoch: 0   Global Step: 1770   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:56:21,631-Speed 3366.55 samples/sec   Loss 35.5181   LearningRate 0.0986   Epoch: 0   Global Step: 1780   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:56:24,705-Speed 3332.11 samples/sec   Loss 35.3031   LearningRate 0.0986   Epoch: 0   Global Step: 1790   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:56:27,768-Speed 3343.73 samples/sec   Loss 35.1678   LearningRate 0.0986   Epoch: 0   Global Step: 1800   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:56:30,830-Speed 3345.71 samples/sec   Loss 35.2148   LearningRate 0.0985   Epoch: 0   Global Step: 1810   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:56:33,847-Speed 3395.30 samples/sec   Loss 35.1550   LearningRate 0.0985   Epoch: 0   Global Step: 1820   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:56:36,910-Speed 3343.66 samples/sec   Loss 35.1341   LearningRate 0.0985   Epoch: 0   Global Step: 1830   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:56:39,982-Speed 3335.31 samples/sec   Loss 35.0288   LearningRate 0.0985   Epoch: 0   Global Step: 1840   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:56:43,065-Speed 3322.75 samples/sec   Loss 35.0999   LearningRate 0.0985   Epoch: 0   Global Step: 1850   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:56:46,100-Speed 3374.27 samples/sec   Loss 35.1152   LearningRate 0.0985   Epoch: 0   Global Step: 1860   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:56:49,146-Speed 3363.28 samples/sec   Loss 34.8271   LearningRate 0.0985   Epoch: 0   Global Step: 1870   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:56:52,197-Speed 3358.37 samples/sec   Loss 34.9084   LearningRate 0.0985   Epoch: 0   Global Step: 1880   Fp16 Grad Scale: 524288   Required: 22 hours
Training: 2022-04-27 01:56:55,213-Speed 3395.95 samples/sec   Loss 34.9979   LearningRate 0.0985   Epoch: 0   Global Step: 1890   Fp16 Grad Scale: 524288   Required: 22 hours
Training: 2022-04-27 01:56:58,256-Speed 3366.24 samples/sec   Loss 34.9201   LearningRate 0.0985   Epoch: 0   Global Step: 1900   Fp16 Grad Scale: 524288   Required: 22 hours
Training: 2022-04-27 01:57:01,317-Speed 3346.27 samples/sec   Loss 34.7434   LearningRate 0.0985   Epoch: 0   Global Step: 1910   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:57:04,427-Speed 3293.71 samples/sec   Loss 34.7154   LearningRate 0.0985   Epoch: 0   Global Step: 1920   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:57:07,464-Speed 3372.23 samples/sec   Loss 34.7496   LearningRate 0.0985   Epoch: 0   Global Step: 1930   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:57:10,509-Speed 3363.95 samples/sec   Loss 34.6284   LearningRate 0.0984   Epoch: 0   Global Step: 1940   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:57:13,563-Speed 3354.25 samples/sec   Loss 34.6582   LearningRate 0.0984   Epoch: 0   Global Step: 1950   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:57:16,670-Speed 3297.22 samples/sec   Loss 34.5810   LearningRate 0.0984   Epoch: 0   Global Step: 1960   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:57:19,702-Speed 3378.31 samples/sec   Loss 34.5712   LearningRate 0.0984   Epoch: 0   Global Step: 1970   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:57:22,729-Speed 3383.72 samples/sec   Loss 34.5485   LearningRate 0.0984   Epoch: 0   Global Step: 1980   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:57:25,822-Speed 3312.10 samples/sec   Loss 34.5315   LearningRate 0.0984   Epoch: 0   Global Step: 1990   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:57:28,835-Speed 3400.62 samples/sec   Loss 34.4350   LearningRate 0.0984   Epoch: 0   Global Step: 2000   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:57:31,853-Speed 3394.28 samples/sec   Loss 34.3605   LearningRate 0.0984   Epoch: 0   Global Step: 2010   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:57:34,883-Speed 3379.77 samples/sec   Loss 34.3506   LearningRate 0.0984   Epoch: 0   Global Step: 2020   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:57:37,918-Speed 3375.31 samples/sec   Loss 34.2997   LearningRate 0.0984   Epoch: 0   Global Step: 2030   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:57:41,006-Speed 3316.71 samples/sec   Loss 34.3040   LearningRate 0.0984   Epoch: 0   Global Step: 2040   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:57:44,021-Speed 3397.22 samples/sec   Loss 34.1602   LearningRate 0.0984   Epoch: 0   Global Step: 2050   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:57:47,075-Speed 3355.00 samples/sec   Loss 34.2168   LearningRate 0.0983   Epoch: 0   Global Step: 2060   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:57:50,217-Speed 3259.80 samples/sec   Loss 34.0678   LearningRate 0.0983   Epoch: 0   Global Step: 2070   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:57:53,272-Speed 3352.45 samples/sec   Loss 34.1722   LearningRate 0.0983   Epoch: 0   Global Step: 2080   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:57:56,309-Speed 3372.71 samples/sec   Loss 34.0779   LearningRate 0.0983   Epoch: 0   Global Step: 2090   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:57:59,332-Speed 3389.32 samples/sec   Loss 34.0705   LearningRate 0.0983   Epoch: 0   Global Step: 2100   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:58:02,394-Speed 3344.57 samples/sec   Loss 34.0464   LearningRate 0.0983   Epoch: 0   Global Step: 2110   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:58:05,442-Speed 3361.25 samples/sec   Loss 33.9623   LearningRate 0.0983   Epoch: 0   Global Step: 2120   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:58:08,520-Speed 3327.59 samples/sec   Loss 33.9167   LearningRate 0.0983   Epoch: 0   Global Step: 2130   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:58:11,598-Speed 3327.07 samples/sec   Loss 33.7424   LearningRate 0.0983   Epoch: 0   Global Step: 2140   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:58:14,644-Speed 3363.49 samples/sec   Loss 33.8296   LearningRate 0.0983   Epoch: 0   Global Step: 2150   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:58:17,693-Speed 3359.91 samples/sec   Loss 33.5927   LearningRate 0.0983   Epoch: 0   Global Step: 2160   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:58:20,730-Speed 3372.13 samples/sec   Loss 33.6627   LearningRate 0.0983   Epoch: 0   Global Step: 2170   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:58:23,824-Speed 3309.99 samples/sec   Loss 33.5071   LearningRate 0.0983   Epoch: 0   Global Step: 2180   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:58:26,962-Speed 3264.77 samples/sec   Loss 33.6190   LearningRate 0.0982   Epoch: 0   Global Step: 2190   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:58:30,020-Speed 3349.95 samples/sec   Loss 33.5211   LearningRate 0.0982   Epoch: 0   Global Step: 2200   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:58:33,084-Speed 3343.94 samples/sec   Loss 33.4554   LearningRate 0.0982   Epoch: 0   Global Step: 2210   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:58:36,248-Speed 3237.09 samples/sec   Loss 33.4375   LearningRate 0.0982   Epoch: 0   Global Step: 2220   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:58:39,281-Speed 3377.00 samples/sec   Loss 33.4836   LearningRate 0.0982   Epoch: 0   Global Step: 2230   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:58:42,316-Speed 3375.01 samples/sec   Loss 33.2629   LearningRate 0.0982   Epoch: 0   Global Step: 2240   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:58:45,340-Speed 3388.40 samples/sec   Loss 33.2155   LearningRate 0.0982   Epoch: 0   Global Step: 2250   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:58:48,366-Speed 3384.40 samples/sec   Loss 33.3122   LearningRate 0.0982   Epoch: 0   Global Step: 2260   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:58:51,372-Speed 3407.14 samples/sec   Loss 33.2388   LearningRate 0.0982   Epoch: 0   Global Step: 2270   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:58:54,463-Speed 3314.14 samples/sec   Loss 33.1970   LearningRate 0.0982   Epoch: 0   Global Step: 2280   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:58:57,517-Speed 3354.11 samples/sec   Loss 33.1603   LearningRate 0.0982   Epoch: 0   Global Step: 2290   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:59:00,589-Speed 3335.32 samples/sec   Loss 33.1628   LearningRate 0.0982   Epoch: 0   Global Step: 2300   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:59:03,651-Speed 3344.98 samples/sec   Loss 33.2257   LearningRate 0.0981   Epoch: 0   Global Step: 2310   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:59:06,707-Speed 3352.38 samples/sec   Loss 33.0458   LearningRate 0.0981   Epoch: 0   Global Step: 2320   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:59:09,709-Speed 3411.50 samples/sec   Loss 32.9564   LearningRate 0.0981   Epoch: 0   Global Step: 2330   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:59:12,768-Speed 3349.46 samples/sec   Loss 33.1025   LearningRate 0.0981   Epoch: 0   Global Step: 2340   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:59:15,803-Speed 3374.94 samples/sec   Loss 32.9806   LearningRate 0.0981   Epoch: 0   Global Step: 2350   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:59:18,894-Speed 3313.35 samples/sec   Loss 32.8135   LearningRate 0.0981   Epoch: 0   Global Step: 2360   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:59:21,930-Speed 3374.60 samples/sec   Loss 32.9709   LearningRate 0.0981   Epoch: 0   Global Step: 2370   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:59:24,996-Speed 3340.50 samples/sec   Loss 32.7121   LearningRate 0.0981   Epoch: 0   Global Step: 2380   Fp16 Grad Scale: 262144   Required: 22 hours
Training: 2022-04-27 01:59:28,102-Speed 3298.37 samples/sec   Loss 32.7189   LearningRate 0.0981   Epoch: 0   Global Step: 2390   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 01:59:31,129-Speed 3384.54 samples/sec   Loss 32.7555   LearningRate 0.0981   Epoch: 0   Global Step: 2400   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:59:34,150-Speed 3390.41 samples/sec   Loss 32.6102   LearningRate 0.0981   Epoch: 0   Global Step: 2410   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:59:37,154-Speed 3409.65 samples/sec   Loss 32.7092   LearningRate 0.0981   Epoch: 0   Global Step: 2420   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:59:40,251-Speed 3306.65 samples/sec   Loss 32.6277   LearningRate 0.0981   Epoch: 0   Global Step: 2430   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:59:43,320-Speed 3338.06 samples/sec   Loss 32.4461   LearningRate 0.0980   Epoch: 0   Global Step: 2440   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:59:46,347-Speed 3384.49 samples/sec   Loss 32.6116   LearningRate 0.0980   Epoch: 0   Global Step: 2450   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:59:49,363-Speed 3396.01 samples/sec   Loss 32.5542   LearningRate 0.0980   Epoch: 0   Global Step: 2460   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:59:52,412-Speed 3359.58 samples/sec   Loss 32.3634   LearningRate 0.0980   Epoch: 0   Global Step: 2470   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:59:55,471-Speed 3348.27 samples/sec   Loss 32.2255   LearningRate 0.0980   Epoch: 0   Global Step: 2480   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 01:59:58,522-Speed 3357.79 samples/sec   Loss 32.4872   LearningRate 0.0980   Epoch: 0   Global Step: 2490   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-04-27 02:00:01,660-Speed 3264.77 samples/sec   Loss 32.1327   LearningRate 0.0980   Epoch: 0   Global Step: 2500   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:00:04,729-Speed 3337.38 samples/sec   Loss 32.3564   LearningRate 0.0980   Epoch: 0   Global Step: 2510   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:00:07,749-Speed 3391.17 samples/sec   Loss 32.1058   LearningRate 0.0980   Epoch: 0   Global Step: 2520   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:00:10,787-Speed 3372.32 samples/sec   Loss 32.0564   LearningRate 0.0980   Epoch: 0   Global Step: 2530   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:00:13,839-Speed 3356.18 samples/sec   Loss 32.2217   LearningRate 0.0980   Epoch: 0   Global Step: 2540   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-04-27 02:00:16,906-Speed 3339.01 samples/sec   Loss 32.1351   LearningRate 0.0980   Epoch: 0   Global Step: 2550   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:00:19,947-Speed 3368.33 samples/sec   Loss 32.1047   LearningRate 0.0979   Epoch: 0   Global Step: 2560   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:00:22,979-Speed 3378.32 samples/sec   Loss 31.8771   LearningRate 0.0979   Epoch: 0   Global Step: 2570   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:00:26,132-Speed 3249.66 samples/sec   Loss 31.8771   LearningRate 0.0979   Epoch: 0   Global Step: 2580   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:00:29,195-Speed 3343.15 samples/sec   Loss 31.9735   LearningRate 0.0979   Epoch: 0   Global Step: 2590   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:00:32,265-Speed 3337.46 samples/sec   Loss 31.6709   LearningRate 0.0979   Epoch: 0   Global Step: 2600   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-27 02:00:35,306-Speed 3368.15 samples/sec   Loss 31.7809   LearningRate 0.0979   Epoch: 0   Global Step: 2610   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-27 02:00:38,400-Speed 3311.14 samples/sec   Loss 31.7211   LearningRate 0.0979   Epoch: 0   Global Step: 2620   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:00:41,413-Speed 3398.78 samples/sec   Loss 31.6951   LearningRate 0.0979   Epoch: 0   Global Step: 2630   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:00:44,431-Speed 3395.11 samples/sec   Loss 31.5371   LearningRate 0.0979   Epoch: 0   Global Step: 2640   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:00:47,481-Speed 3357.51 samples/sec   Loss 31.5490   LearningRate 0.0979   Epoch: 0   Global Step: 2650   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:00:50,551-Speed 3337.28 samples/sec   Loss 31.5991   LearningRate 0.0979   Epoch: 0   Global Step: 2660   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:00:53,639-Speed 3316.89 samples/sec   Loss 31.5426   LearningRate 0.0979   Epoch: 0   Global Step: 2670   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:00:56,680-Speed 3368.63 samples/sec   Loss 31.3511   LearningRate 0.0979   Epoch: 0   Global Step: 2680   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:00:59,697-Speed 3394.54 samples/sec   Loss 31.4630   LearningRate 0.0978   Epoch: 0   Global Step: 2690   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:01:02,737-Speed 3370.02 samples/sec   Loss 31.4372   LearningRate 0.0978   Epoch: 0   Global Step: 2700   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:01:05,768-Speed 3379.90 samples/sec   Loss 31.4874   LearningRate 0.0978   Epoch: 0   Global Step: 2710   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:01:08,800-Speed 3378.57 samples/sec   Loss 31.2393   LearningRate 0.0978   Epoch: 0   Global Step: 2720   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:01:11,872-Speed 3333.13 samples/sec   Loss 31.2061   LearningRate 0.0978   Epoch: 0   Global Step: 2730   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:01:14,895-Speed 3389.24 samples/sec   Loss 31.1605   LearningRate 0.0978   Epoch: 0   Global Step: 2740   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:01:17,946-Speed 3357.35 samples/sec   Loss 31.2148   LearningRate 0.0978   Epoch: 0   Global Step: 2750   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:01:20,970-Speed 3387.12 samples/sec   Loss 30.9471   LearningRate 0.0978   Epoch: 0   Global Step: 2760   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:01:24,024-Speed 3353.80 samples/sec   Loss 30.9397   LearningRate 0.0978   Epoch: 0   Global Step: 2770   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:01:27,110-Speed 3319.64 samples/sec   Loss 31.0279   LearningRate 0.0978   Epoch: 0   Global Step: 2780   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:01:30,145-Speed 3374.50 samples/sec   Loss 30.9029   LearningRate 0.0978   Epoch: 0   Global Step: 2790   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:01:33,190-Speed 3364.91 samples/sec   Loss 30.9557   LearningRate 0.0978   Epoch: 0   Global Step: 2800   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:01:36,232-Speed 3366.36 samples/sec   Loss 30.9447   LearningRate 0.0978   Epoch: 0   Global Step: 2810   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:01:39,289-Speed 3350.63 samples/sec   Loss 30.8983   LearningRate 0.0977   Epoch: 0   Global Step: 2820   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:01:42,366-Speed 3329.71 samples/sec   Loss 30.7303   LearningRate 0.0977   Epoch: 0   Global Step: 2830   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:01:45,389-Speed 3387.90 samples/sec   Loss 30.7503   LearningRate 0.0977   Epoch: 0   Global Step: 2840   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:01:48,438-Speed 3360.51 samples/sec   Loss 30.7275   LearningRate 0.0977   Epoch: 0   Global Step: 2850   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:01:51,513-Speed 3330.95 samples/sec   Loss 30.6591   LearningRate 0.0977   Epoch: 0   Global Step: 2860   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:01:54,550-Speed 3372.45 samples/sec   Loss 30.5228   LearningRate 0.0977   Epoch: 0   Global Step: 2870   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:01:57,568-Speed 3393.72 samples/sec   Loss 30.5720   LearningRate 0.0977   Epoch: 0   Global Step: 2880   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:02:00,691-Speed 3280.72 samples/sec   Loss 30.5071   LearningRate 0.0977   Epoch: 0   Global Step: 2890   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:02:03,745-Speed 3353.79 samples/sec   Loss 30.6058   LearningRate 0.0977   Epoch: 0   Global Step: 2900   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:02:06,777-Speed 3377.85 samples/sec   Loss 30.4940   LearningRate 0.0977   Epoch: 0   Global Step: 2910   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:02:09,834-Speed 3350.92 samples/sec   Loss 30.3444   LearningRate 0.0977   Epoch: 0   Global Step: 2920   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:02:12,944-Speed 3293.93 samples/sec   Loss 30.2513   LearningRate 0.0977   Epoch: 0   Global Step: 2930   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:02:16,013-Speed 3337.87 samples/sec   Loss 30.3496   LearningRate 0.0976   Epoch: 0   Global Step: 2940   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:02:19,081-Speed 3338.75 samples/sec   Loss 30.0628   LearningRate 0.0976   Epoch: 0   Global Step: 2950   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:02:22,105-Speed 3386.40 samples/sec   Loss 30.1535   LearningRate 0.0976   Epoch: 0   Global Step: 2960   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:02:25,175-Speed 3337.10 samples/sec   Loss 30.1175   LearningRate 0.0976   Epoch: 0   Global Step: 2970   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:02:28,274-Speed 3305.57 samples/sec   Loss 30.0294   LearningRate 0.0976   Epoch: 0   Global Step: 2980   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:02:31,304-Speed 3379.75 samples/sec   Loss 30.0082   LearningRate 0.0976   Epoch: 0   Global Step: 2990   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:02:34,341-Speed 3373.05 samples/sec   Loss 30.0372   LearningRate 0.0976   Epoch: 0   Global Step: 3000   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:02:37,383-Speed 3368.03 samples/sec   Loss 29.8353   LearningRate 0.0976   Epoch: 0   Global Step: 3010   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:02:40,457-Speed 3331.86 samples/sec   Loss 30.0196   LearningRate 0.0976   Epoch: 0   Global Step: 3020   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:02:43,545-Speed 3317.05 samples/sec   Loss 29.8695   LearningRate 0.0976   Epoch: 0   Global Step: 3030   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:02:46,576-Speed 3379.15 samples/sec   Loss 29.7520   LearningRate 0.0976   Epoch: 0   Global Step: 3040   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:02:49,589-Speed 3399.22 samples/sec   Loss 29.8082   LearningRate 0.0976   Epoch: 0   Global Step: 3050   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:02:52,653-Speed 3343.20 samples/sec   Loss 29.6802   LearningRate 0.0976   Epoch: 0   Global Step: 3060   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:02:55,681-Speed 3383.61 samples/sec   Loss 29.6490   LearningRate 0.0975   Epoch: 0   Global Step: 3070   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:02:58,696-Speed 3397.20 samples/sec   Loss 29.6759   LearningRate 0.0975   Epoch: 0   Global Step: 3080   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:03:01,737-Speed 3368.53 samples/sec   Loss 29.6513   LearningRate 0.0975   Epoch: 0   Global Step: 3090   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:03:04,820-Speed 3321.81 samples/sec   Loss 29.4203   LearningRate 0.0975   Epoch: 0   Global Step: 3100   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:03:07,865-Speed 3365.04 samples/sec   Loss 29.5904   LearningRate 0.0975   Epoch: 0   Global Step: 3110   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:03:10,903-Speed 3371.23 samples/sec   Loss 29.4270   LearningRate 0.0975   Epoch: 0   Global Step: 3120   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:03:13,985-Speed 3323.81 samples/sec   Loss 29.4073   LearningRate 0.0975   Epoch: 0   Global Step: 3130   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:03:17,040-Speed 3352.86 samples/sec   Loss 29.3962   LearningRate 0.0975   Epoch: 0   Global Step: 3140   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:03:20,076-Speed 3373.47 samples/sec   Loss 29.2226   LearningRate 0.0975   Epoch: 0   Global Step: 3150   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:03:23,096-Speed 3392.28 samples/sec   Loss 29.1727   LearningRate 0.0975   Epoch: 0   Global Step: 3160   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:03:26,131-Speed 3375.17 samples/sec   Loss 29.2226   LearningRate 0.0975   Epoch: 0   Global Step: 3170   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:03:29,186-Speed 3351.98 samples/sec   Loss 29.2951   LearningRate 0.0975   Epoch: 0   Global Step: 3180   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:03:32,276-Speed 3315.52 samples/sec   Loss 29.1263   LearningRate 0.0974   Epoch: 0   Global Step: 3190   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:03:35,318-Speed 3367.91 samples/sec   Loss 29.0351   LearningRate 0.0974   Epoch: 0   Global Step: 3200   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:03:38,360-Speed 3366.50 samples/sec   Loss 29.1407   LearningRate 0.0974   Epoch: 0   Global Step: 3210   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:03:41,442-Speed 3323.49 samples/sec   Loss 28.9678   LearningRate 0.0974   Epoch: 0   Global Step: 3220   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:03:44,459-Speed 3396.06 samples/sec   Loss 28.9629   LearningRate 0.0974   Epoch: 0   Global Step: 3230   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:03:47,543-Speed 3321.46 samples/sec   Loss 28.9199   LearningRate 0.0974   Epoch: 0   Global Step: 3240   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:03:50,588-Speed 3364.01 samples/sec   Loss 28.7484   LearningRate 0.0974   Epoch: 0   Global Step: 3250   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:03:53,683-Speed 3308.88 samples/sec   Loss 28.6564   LearningRate 0.0974   Epoch: 0   Global Step: 3260   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:03:56,698-Speed 3397.71 samples/sec   Loss 28.7795   LearningRate 0.0974   Epoch: 0   Global Step: 3270   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:03:59,767-Speed 3337.53 samples/sec   Loss 28.7957   LearningRate 0.0974   Epoch: 0   Global Step: 3280   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:04:02,801-Speed 3376.68 samples/sec   Loss 28.7782   LearningRate 0.0974   Epoch: 0   Global Step: 3290   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:04:05,856-Speed 3352.30 samples/sec   Loss 28.4583   LearningRate 0.0974   Epoch: 0   Global Step: 3300   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:04:08,865-Speed 3405.38 samples/sec   Loss 28.6801   LearningRate 0.0974   Epoch: 0   Global Step: 3310   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:04:12,001-Speed 3266.57 samples/sec   Loss 28.5461   LearningRate 0.0973   Epoch: 0   Global Step: 3320   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:04:15,093-Speed 3312.25 samples/sec   Loss 28.3665   LearningRate 0.0973   Epoch: 0   Global Step: 3330   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:04:18,150-Speed 3350.60 samples/sec   Loss 28.3219   LearningRate 0.0973   Epoch: 0   Global Step: 3340   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:04:21,183-Speed 3377.54 samples/sec   Loss 28.3290   LearningRate 0.0973   Epoch: 0   Global Step: 3350   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:04:24,226-Speed 3366.15 samples/sec   Loss 28.4336   LearningRate 0.0973   Epoch: 0   Global Step: 3360   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:04:27,322-Speed 3307.67 samples/sec   Loss 28.2266   LearningRate 0.0973   Epoch: 0   Global Step: 3370   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:04:30,468-Speed 3256.12 samples/sec   Loss 28.0418   LearningRate 0.0973   Epoch: 0   Global Step: 3380   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:04:33,498-Speed 3380.08 samples/sec   Loss 28.0531   LearningRate 0.0973   Epoch: 0   Global Step: 3390   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:04:36,586-Speed 3318.05 samples/sec   Loss 28.1476   LearningRate 0.0973   Epoch: 0   Global Step: 3400   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:04:39,618-Speed 3377.62 samples/sec   Loss 27.9580   LearningRate 0.0973   Epoch: 0   Global Step: 3410   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:04:42,643-Speed 3386.72 samples/sec   Loss 28.0544   LearningRate 0.0973   Epoch: 0   Global Step: 3420   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:04:45,653-Speed 3402.21 samples/sec   Loss 28.1431   LearningRate 0.0973   Epoch: 0   Global Step: 3430   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:04:48,706-Speed 3355.93 samples/sec   Loss 27.8662   LearningRate 0.0972   Epoch: 0   Global Step: 3440   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:04:51,747-Speed 3367.96 samples/sec   Loss 27.8903   LearningRate 0.0972   Epoch: 0   Global Step: 3450   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:04:54,778-Speed 3379.33 samples/sec   Loss 27.8997   LearningRate 0.0972   Epoch: 0   Global Step: 3460   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:04:57,802-Speed 3387.47 samples/sec   Loss 27.8557   LearningRate 0.0972   Epoch: 0   Global Step: 3470   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:05:00,851-Speed 3359.91 samples/sec   Loss 27.6676   LearningRate 0.0972   Epoch: 0   Global Step: 3480   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:05:03,894-Speed 3365.67 samples/sec   Loss 27.5943   LearningRate 0.0972   Epoch: 0   Global Step: 3490   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:05:06,954-Speed 3346.84 samples/sec   Loss 27.7842   LearningRate 0.0972   Epoch: 0   Global Step: 3500   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-27 02:05:09,971-Speed 3395.43 samples/sec   Loss 27.5962   LearningRate 0.0972   Epoch: 0   Global Step: 3510   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-27 02:05:13,086-Speed 3288.68 samples/sec   Loss 27.4764   LearningRate 0.0972   Epoch: 0   Global Step: 3520   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-27 02:05:16,122-Speed 3374.14 samples/sec   Loss 27.5138   LearningRate 0.0972   Epoch: 0   Global Step: 3530   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-27 02:05:19,134-Speed 3400.66 samples/sec   Loss 27.4917   LearningRate 0.0972   Epoch: 0   Global Step: 3540   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:05:22,141-Speed 3405.45 samples/sec   Loss 27.4574   LearningRate 0.0972   Epoch: 0   Global Step: 3550   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:05:25,210-Speed 3338.10 samples/sec   Loss 27.5100   LearningRate 0.0972   Epoch: 0   Global Step: 3560   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:05:28,272-Speed 3344.87 samples/sec   Loss 27.3330   LearningRate 0.0971   Epoch: 0   Global Step: 3570   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:05:31,354-Speed 3323.49 samples/sec   Loss 27.4141   LearningRate 0.0971   Epoch: 0   Global Step: 3580   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:05:34,383-Speed 3381.84 samples/sec   Loss 27.2396   LearningRate 0.0971   Epoch: 0   Global Step: 3590   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:05:37,411-Speed 3382.92 samples/sec   Loss 27.1334   LearningRate 0.0971   Epoch: 0   Global Step: 3600   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:05:40,440-Speed 3381.61 samples/sec   Loss 27.1677   LearningRate 0.0971   Epoch: 0   Global Step: 3610   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:05:43,499-Speed 3349.25 samples/sec   Loss 27.1206   LearningRate 0.0971   Epoch: 0   Global Step: 3620   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:05:46,544-Speed 3363.85 samples/sec   Loss 27.1836   LearningRate 0.0971   Epoch: 0   Global Step: 3630   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:05:49,574-Speed 3380.20 samples/sec   Loss 26.9725   LearningRate 0.0971   Epoch: 0   Global Step: 3640   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-27 02:05:52,622-Speed 3360.17 samples/sec   Loss 27.0419   LearningRate 0.0971   Epoch: 0   Global Step: 3650   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:05:55,681-Speed 3349.64 samples/sec   Loss 26.6906   LearningRate 0.0971   Epoch: 0   Global Step: 3660   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:05:58,749-Speed 3338.16 samples/sec   Loss 27.0330   LearningRate 0.0971   Epoch: 0   Global Step: 3670   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:06:01,770-Speed 3390.78 samples/sec   Loss 26.9396   LearningRate 0.0971   Epoch: 0   Global Step: 3680   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:06:04,798-Speed 3383.15 samples/sec   Loss 26.8900   LearningRate 0.0971   Epoch: 0   Global Step: 3690   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:06:07,837-Speed 3370.80 samples/sec   Loss 26.6533   LearningRate 0.0970   Epoch: 0   Global Step: 3700   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:06:10,870-Speed 3376.95 samples/sec   Loss 26.6379   LearningRate 0.0970   Epoch: 0   Global Step: 3710   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:06:13,899-Speed 3381.79 samples/sec   Loss 26.8796   LearningRate 0.0970   Epoch: 0   Global Step: 3720   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:06:16,949-Speed 3358.19 samples/sec   Loss 26.5542   LearningRate 0.0970   Epoch: 0   Global Step: 3730   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:06:19,975-Speed 3385.58 samples/sec   Loss 26.5676   LearningRate 0.0970   Epoch: 0   Global Step: 3740   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:06:23,017-Speed 3367.09 samples/sec   Loss 26.6223   LearningRate 0.0970   Epoch: 0   Global Step: 3750   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:06:26,082-Speed 3341.67 samples/sec   Loss 26.3039   LearningRate 0.0970   Epoch: 0   Global Step: 3760   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:06:29,225-Speed 3258.73 samples/sec   Loss 26.4077   LearningRate 0.0970   Epoch: 0   Global Step: 3770   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:06:32,274-Speed 3359.64 samples/sec   Loss 26.4164   LearningRate 0.0970   Epoch: 0   Global Step: 3780   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:06:35,329-Speed 3352.84 samples/sec   Loss 26.5432   LearningRate 0.0970   Epoch: 0   Global Step: 3790   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:06:38,369-Speed 3370.18 samples/sec   Loss 26.3629   LearningRate 0.0970   Epoch: 0   Global Step: 3800   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:06:41,451-Speed 3323.65 samples/sec   Loss 26.3054   LearningRate 0.0970   Epoch: 0   Global Step: 3810   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:06:44,522-Speed 3334.78 samples/sec   Loss 26.2133   LearningRate 0.0969   Epoch: 0   Global Step: 3820   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:06:47,543-Speed 3391.40 samples/sec   Loss 26.1765   LearningRate 0.0969   Epoch: 0   Global Step: 3830   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:06:50,634-Speed 3313.15 samples/sec   Loss 26.1562   LearningRate 0.0969   Epoch: 0   Global Step: 3840   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:06:53,710-Speed 3330.36 samples/sec   Loss 26.0606   LearningRate 0.0969   Epoch: 0   Global Step: 3850   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:06:56,744-Speed 3376.11 samples/sec   Loss 26.0855   LearningRate 0.0969   Epoch: 0   Global Step: 3860   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-27 02:06:59,772-Speed 3382.49 samples/sec   Loss 26.1551   LearningRate 0.0969   Epoch: 0   Global Step: 3870   Fp16 Grad Scale: 262144   Required: 21 hours
Training: 2022-04-27 02:07:02,835-Speed 3344.20 samples/sec   Loss 25.9013   LearningRate 0.0969   Epoch: 0   Global Step: 3880   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:07:05,908-Speed 3333.08 samples/sec   Loss 25.9718   LearningRate 0.0969   Epoch: 0   Global Step: 3890   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:07:08,928-Speed 3392.00 samples/sec   Loss 25.8099   LearningRate 0.0969   Epoch: 0   Global Step: 3900   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:07:11,980-Speed 3356.58 samples/sec   Loss 25.9250   LearningRate 0.0969   Epoch: 0   Global Step: 3910   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:07:15,023-Speed 3365.91 samples/sec   Loss 26.0700   LearningRate 0.0969   Epoch: 0   Global Step: 3920   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:07:18,111-Speed 3316.79 samples/sec   Loss 25.8585   LearningRate 0.0969   Epoch: 0   Global Step: 3930   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:07:21,153-Speed 3367.26 samples/sec   Loss 25.8584   LearningRate 0.0969   Epoch: 0   Global Step: 3940   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:07:24,202-Speed 3360.10 samples/sec   Loss 25.7017   LearningRate 0.0968   Epoch: 0   Global Step: 3950   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:07:27,273-Speed 3335.33 samples/sec   Loss 25.7954   LearningRate 0.0968   Epoch: 0   Global Step: 3960   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:07:30,305-Speed 3377.91 samples/sec   Loss 25.5993   LearningRate 0.0968   Epoch: 0   Global Step: 3970   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:07:33,381-Speed 3329.82 samples/sec   Loss 25.5810   LearningRate 0.0968   Epoch: 0   Global Step: 3980   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:07:36,473-Speed 3313.04 samples/sec   Loss 25.5845   LearningRate 0.0968   Epoch: 0   Global Step: 3990   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:07:39,531-Speed 3349.89 samples/sec   Loss 25.6394   LearningRate 0.0968   Epoch: 0   Global Step: 4000   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:07:42,569-Speed 3371.56 samples/sec   Loss 25.3375   LearningRate 0.0968   Epoch: 0   Global Step: 4010   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:07:45,601-Speed 3379.65 samples/sec   Loss 25.4432   LearningRate 0.0968   Epoch: 0   Global Step: 4020   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:07:48,617-Speed 3396.12 samples/sec   Loss 25.4649   LearningRate 0.0968   Epoch: 0   Global Step: 4030   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:07:51,650-Speed 3377.80 samples/sec   Loss 25.3042   LearningRate 0.0968   Epoch: 0   Global Step: 4040   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:07:54,743-Speed 3311.19 samples/sec   Loss 25.2423   LearningRate 0.0968   Epoch: 0   Global Step: 4050   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:07:57,768-Speed 3387.17 samples/sec   Loss 25.1762   LearningRate 0.0968   Epoch: 0   Global Step: 4060   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:08:00,816-Speed 3359.98 samples/sec   Loss 25.2850   LearningRate 0.0968   Epoch: 0   Global Step: 4070   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:08:03,904-Speed 3317.31 samples/sec   Loss 25.1650   LearningRate 0.0967   Epoch: 0   Global Step: 4080   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:08:06,998-Speed 3310.84 samples/sec   Loss 25.0991   LearningRate 0.0967   Epoch: 0   Global Step: 4090   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:08:10,038-Speed 3369.67 samples/sec   Loss 25.1216   LearningRate 0.0967   Epoch: 0   Global Step: 4100   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:08:13,073-Speed 3375.19 samples/sec   Loss 25.0646   LearningRate 0.0967   Epoch: 0   Global Step: 4110   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:08:16,176-Speed 3301.06 samples/sec   Loss 24.9888   LearningRate 0.0967   Epoch: 0   Global Step: 4120   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:08:19,233-Speed 3350.81 samples/sec   Loss 24.7987   LearningRate 0.0967   Epoch: 0   Global Step: 4130   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:08:22,277-Speed 3364.56 samples/sec   Loss 24.8689   LearningRate 0.0967   Epoch: 0   Global Step: 4140   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:08:25,327-Speed 3358.56 samples/sec   Loss 24.9560   LearningRate 0.0967   Epoch: 0   Global Step: 4150   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:08:28,351-Speed 3386.80 samples/sec   Loss 24.9713   LearningRate 0.0967   Epoch: 0   Global Step: 4160   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:08:31,382-Speed 3380.49 samples/sec   Loss 24.8111   LearningRate 0.0967   Epoch: 0   Global Step: 4170   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:08:34,394-Speed 3400.65 samples/sec   Loss 24.7948   LearningRate 0.0967   Epoch: 0   Global Step: 4180   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:08:37,420-Speed 3385.11 samples/sec   Loss 24.6660   LearningRate 0.0967   Epoch: 0   Global Step: 4190   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:08:40,512-Speed 3312.79 samples/sec   Loss 24.5865   LearningRate 0.0966   Epoch: 0   Global Step: 4200   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:08:43,609-Speed 3308.19 samples/sec   Loss 24.5190   LearningRate 0.0966   Epoch: 0   Global Step: 4210   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:08:46,620-Speed 3401.68 samples/sec   Loss 24.4876   LearningRate 0.0966   Epoch: 0   Global Step: 4220   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:08:49,653-Speed 3377.02 samples/sec   Loss 24.6122   LearningRate 0.0966   Epoch: 0   Global Step: 4230   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:08:52,717-Speed 3343.24 samples/sec   Loss 24.5432   LearningRate 0.0966   Epoch: 0   Global Step: 4240   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:08:55,830-Speed 3290.15 samples/sec   Loss 24.4519   LearningRate 0.0966   Epoch: 0   Global Step: 4250   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:08:58,877-Speed 3361.71 samples/sec   Loss 24.5185   LearningRate 0.0966   Epoch: 0   Global Step: 4260   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:09:01,929-Speed 3356.38 samples/sec   Loss 24.3330   LearningRate 0.0966   Epoch: 0   Global Step: 4270   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:09:04,929-Speed 3414.17 samples/sec   Loss 24.2076   LearningRate 0.0966   Epoch: 0   Global Step: 4280   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:09:07,952-Speed 3389.16 samples/sec   Loss 24.2050   LearningRate 0.0966   Epoch: 0   Global Step: 4290   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:09:10,987-Speed 3374.93 samples/sec   Loss 24.3723   LearningRate 0.0966   Epoch: 0   Global Step: 4300   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:09:14,013-Speed 3384.67 samples/sec   Loss 24.2457   LearningRate 0.0966   Epoch: 0   Global Step: 4310   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:09:17,131-Speed 3286.01 samples/sec   Loss 24.1929   LearningRate 0.0966   Epoch: 0   Global Step: 4320   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:09:20,134-Speed 3410.37 samples/sec   Loss 24.1728   LearningRate 0.0965   Epoch: 0   Global Step: 4330   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:09:23,178-Speed 3365.05 samples/sec   Loss 24.2739   LearningRate 0.0965   Epoch: 0   Global Step: 4340   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:09:26,232-Speed 3354.19 samples/sec   Loss 24.1332   LearningRate 0.0965   Epoch: 0   Global Step: 4350   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:09:29,246-Speed 3398.82 samples/sec   Loss 23.9141   LearningRate 0.0965   Epoch: 0   Global Step: 4360   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:09:32,336-Speed 3314.56 samples/sec   Loss 24.0444   LearningRate 0.0965   Epoch: 0   Global Step: 4370   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:09:35,343-Speed 3406.65 samples/sec   Loss 24.0029   LearningRate 0.0965   Epoch: 0   Global Step: 4380   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:09:38,376-Speed 3377.12 samples/sec   Loss 23.8986   LearningRate 0.0965   Epoch: 0   Global Step: 4390   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:09:41,432-Speed 3351.72 samples/sec   Loss 23.9669   LearningRate 0.0965   Epoch: 0   Global Step: 4400   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:09:44,471-Speed 3370.64 samples/sec   Loss 24.0491   LearningRate 0.0965   Epoch: 0   Global Step: 4410   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:09:47,528-Speed 3351.28 samples/sec   Loss 23.7567   LearningRate 0.0965   Epoch: 0   Global Step: 4420   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:09:50,624-Speed 3308.44 samples/sec   Loss 23.7553   LearningRate 0.0965   Epoch: 0   Global Step: 4430   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:09:53,760-Speed 3266.52 samples/sec   Loss 23.7109   LearningRate 0.0965   Epoch: 0   Global Step: 4440   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:09:56,802-Speed 3366.41 samples/sec   Loss 23.6292   LearningRate 0.0964   Epoch: 0   Global Step: 4450   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:09:59,885-Speed 3323.34 samples/sec   Loss 23.6186   LearningRate 0.0964   Epoch: 0   Global Step: 4460   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:10:02,925-Speed 3368.98 samples/sec   Loss 23.6677   LearningRate 0.0964   Epoch: 0   Global Step: 4470   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:10:05,975-Speed 3359.17 samples/sec   Loss 23.6265   LearningRate 0.0964   Epoch: 0   Global Step: 4480   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:10:08,990-Speed 3396.13 samples/sec   Loss 23.7359   LearningRate 0.0964   Epoch: 0   Global Step: 4490   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:10:12,043-Speed 3356.54 samples/sec   Loss 23.4945   LearningRate 0.0964   Epoch: 0   Global Step: 4500   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:10:15,046-Speed 3410.57 samples/sec   Loss 23.4686   LearningRate 0.0964   Epoch: 0   Global Step: 4510   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:10:18,123-Speed 3329.06 samples/sec   Loss 23.5193   LearningRate 0.0964   Epoch: 0   Global Step: 4520   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:10:21,165-Speed 3367.14 samples/sec   Loss 23.4704   LearningRate 0.0964   Epoch: 0   Global Step: 4530   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:10:24,199-Speed 3376.28 samples/sec   Loss 23.3606   LearningRate 0.0964   Epoch: 0   Global Step: 4540   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:10:27,260-Speed 3346.17 samples/sec   Loss 23.3067   LearningRate 0.0964   Epoch: 0   Global Step: 4550   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:10:30,358-Speed 3306.29 samples/sec   Loss 23.1413   LearningRate 0.0964   Epoch: 0   Global Step: 4560   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:10:33,381-Speed 3388.48 samples/sec   Loss 23.4764   LearningRate 0.0964   Epoch: 0   Global Step: 4570   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:10:36,507-Speed 3277.13 samples/sec   Loss 23.1061   LearningRate 0.0963   Epoch: 0   Global Step: 4580   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:10:39,618-Speed 3292.84 samples/sec   Loss 23.3104   LearningRate 0.0963   Epoch: 0   Global Step: 4590   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:10:42,638-Speed 3391.00 samples/sec   Loss 23.0377   LearningRate 0.0963   Epoch: 0   Global Step: 4600   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:10:45,682-Speed 3365.51 samples/sec   Loss 23.1743   LearningRate 0.0963   Epoch: 0   Global Step: 4610   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:10:48,753-Speed 3335.59 samples/sec   Loss 23.0194   LearningRate 0.0963   Epoch: 0   Global Step: 4620   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:10:51,803-Speed 3358.63 samples/sec   Loss 23.1083   LearningRate 0.0963   Epoch: 0   Global Step: 4630   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:10:54,811-Speed 3404.38 samples/sec   Loss 22.9668   LearningRate 0.0963   Epoch: 0   Global Step: 4640   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:10:57,831-Speed 3392.04 samples/sec   Loss 22.9742   LearningRate 0.0963   Epoch: 0   Global Step: 4650   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:11:00,876-Speed 3364.48 samples/sec   Loss 22.9980   LearningRate 0.0963   Epoch: 0   Global Step: 4660   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:11:03,907-Speed 3379.37 samples/sec   Loss 22.6973   LearningRate 0.0963   Epoch: 0   Global Step: 4670   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:11:06,930-Speed 3387.56 samples/sec   Loss 22.8743   LearningRate 0.0963   Epoch: 0   Global Step: 4680   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:11:09,952-Speed 3390.39 samples/sec   Loss 22.7297   LearningRate 0.0963   Epoch: 0   Global Step: 4690   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:11:12,976-Speed 3387.22 samples/sec   Loss 22.7508   LearningRate 0.0963   Epoch: 0   Global Step: 4700   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:11:15,997-Speed 3390.42 samples/sec   Loss 22.6745   LearningRate 0.0962   Epoch: 0   Global Step: 4710   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:11:19,008-Speed 3402.00 samples/sec   Loss 22.5501   LearningRate 0.0962   Epoch: 0   Global Step: 4720   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:11:22,015-Speed 3406.03 samples/sec   Loss 22.6632   LearningRate 0.0962   Epoch: 0   Global Step: 4730   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:11:25,077-Speed 3346.03 samples/sec   Loss 22.7938   LearningRate 0.0962   Epoch: 0   Global Step: 4740   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:11:28,121-Speed 3364.91 samples/sec   Loss 22.7427   LearningRate 0.0962   Epoch: 0   Global Step: 4750   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:11:31,152-Speed 3379.24 samples/sec   Loss 22.4697   LearningRate 0.0962   Epoch: 0   Global Step: 4760   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:11:34,156-Speed 3409.21 samples/sec   Loss 22.5401   LearningRate 0.0962   Epoch: 0   Global Step: 4770   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:11:37,184-Speed 3383.61 samples/sec   Loss 22.4035   LearningRate 0.0962   Epoch: 0   Global Step: 4780   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:11:40,197-Speed 3399.29 samples/sec   Loss 22.4862   LearningRate 0.0962   Epoch: 0   Global Step: 4790   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:11:43,208-Speed 3401.57 samples/sec   Loss 22.4427   LearningRate 0.0962   Epoch: 0   Global Step: 4800   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:11:46,214-Speed 3407.86 samples/sec   Loss 22.4753   LearningRate 0.0962   Epoch: 0   Global Step: 4810   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:11:49,219-Speed 3409.28 samples/sec   Loss 22.2820   LearningRate 0.0962   Epoch: 0   Global Step: 4820   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:11:52,250-Speed 3379.22 samples/sec   Loss 22.3884   LearningRate 0.0961   Epoch: 0   Global Step: 4830   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:11:55,249-Speed 3415.74 samples/sec   Loss 22.2711   LearningRate 0.0961   Epoch: 0   Global Step: 4840   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:11:58,306-Speed 3350.68 samples/sec   Loss 22.3121   LearningRate 0.0961   Epoch: 0   Global Step: 4850   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:12:01,375-Speed 3337.95 samples/sec   Loss 22.1146   LearningRate 0.0961   Epoch: 0   Global Step: 4860   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:12:04,442-Speed 3340.09 samples/sec   Loss 22.1636   LearningRate 0.0961   Epoch: 0   Global Step: 4870   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:12:07,515-Speed 3332.55 samples/sec   Loss 22.1368   LearningRate 0.0961   Epoch: 0   Global Step: 4880   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:12:10,513-Speed 3416.91 samples/sec   Loss 21.8689   LearningRate 0.0961   Epoch: 0   Global Step: 4890   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:12:13,533-Speed 3392.98 samples/sec   Loss 22.1743   LearningRate 0.0961   Epoch: 0   Global Step: 4900   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:12:16,549-Speed 3396.36 samples/sec   Loss 21.8873   LearningRate 0.0961   Epoch: 0   Global Step: 4910   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:12:19,566-Speed 3394.45 samples/sec   Loss 21.9462   LearningRate 0.0961   Epoch: 0   Global Step: 4920   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:12:22,585-Speed 3393.27 samples/sec   Loss 21.8770   LearningRate 0.0961   Epoch: 0   Global Step: 4930   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:12:25,598-Speed 3399.90 samples/sec   Loss 21.8689   LearningRate 0.0961   Epoch: 0   Global Step: 4940   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:12:28,629-Speed 3379.60 samples/sec   Loss 21.8521   LearningRate 0.0961   Epoch: 0   Global Step: 4950   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:12:31,689-Speed 3347.72 samples/sec   Loss 21.8918   LearningRate 0.0960   Epoch: 0   Global Step: 4960   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:12:34,692-Speed 3411.25 samples/sec   Loss 21.7984   LearningRate 0.0960   Epoch: 0   Global Step: 4970   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:12:37,721-Speed 3381.28 samples/sec   Loss 21.8414   LearningRate 0.0960   Epoch: 0   Global Step: 4980   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:12:40,768-Speed 3361.78 samples/sec   Loss 21.9558   LearningRate 0.0960   Epoch: 0   Global Step: 4990   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:12:43,835-Speed 3340.07 samples/sec   Loss 21.8070   LearningRate 0.0960   Epoch: 0   Global Step: 5000   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:12:46,881-Speed 3363.37 samples/sec   Loss 21.7039   LearningRate 0.0960   Epoch: 0   Global Step: 5010   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:12:49,889-Speed 3404.57 samples/sec   Loss 21.6914   LearningRate 0.0960   Epoch: 0   Global Step: 5020   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:12:52,942-Speed 3354.92 samples/sec   Loss 21.7416   LearningRate 0.0960   Epoch: 0   Global Step: 5030   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:12:55,975-Speed 3377.77 samples/sec   Loss 21.8275   LearningRate 0.0960   Epoch: 0   Global Step: 5040   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:12:59,001-Speed 3385.06 samples/sec   Loss 21.4931   LearningRate 0.0960   Epoch: 0   Global Step: 5050   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:13:02,051-Speed 3359.30 samples/sec   Loss 21.7104   LearningRate 0.0960   Epoch: 0   Global Step: 5060   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:13:05,152-Speed 3302.44 samples/sec   Loss 21.3495   LearningRate 0.0960   Epoch: 0   Global Step: 5070   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:13:08,234-Speed 3323.71 samples/sec   Loss 21.4110   LearningRate 0.0960   Epoch: 0   Global Step: 5080   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:13:11,301-Speed 3340.59 samples/sec   Loss 21.5847   LearningRate 0.0959   Epoch: 0   Global Step: 5090   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:13:14,327-Speed 3385.08 samples/sec   Loss 21.4539   LearningRate 0.0959   Epoch: 0   Global Step: 5100   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:13:17,343-Speed 3396.58 samples/sec   Loss 21.3870   LearningRate 0.0959   Epoch: 0   Global Step: 5110   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:13:20,385-Speed 3367.15 samples/sec   Loss 21.4449   LearningRate 0.0959   Epoch: 0   Global Step: 5120   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:13:23,417-Speed 3377.78 samples/sec   Loss 21.2547   LearningRate 0.0959   Epoch: 0   Global Step: 5130   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:13:26,439-Speed 3389.76 samples/sec   Loss 21.2531   LearningRate 0.0959   Epoch: 0   Global Step: 5140   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:13:29,463-Speed 3386.93 samples/sec   Loss 21.2555   LearningRate 0.0959   Epoch: 0   Global Step: 5150   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:13:32,489-Speed 3385.42 samples/sec   Loss 21.3051   LearningRate 0.0959   Epoch: 0   Global Step: 5160   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:13:35,529-Speed 3369.66 samples/sec   Loss 21.1990   LearningRate 0.0959   Epoch: 0   Global Step: 5170   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:13:38,591-Speed 3345.48 samples/sec   Loss 21.0107   LearningRate 0.0959   Epoch: 0   Global Step: 5180   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:13:41,690-Speed 3305.42 samples/sec   Loss 21.0918   LearningRate 0.0959   Epoch: 0   Global Step: 5190   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:13:44,697-Speed 3405.50 samples/sec   Loss 20.8657   LearningRate 0.0959   Epoch: 0   Global Step: 5200   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:13:47,770-Speed 3334.13 samples/sec   Loss 20.9328   LearningRate 0.0958   Epoch: 0   Global Step: 5210   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:13:50,798-Speed 3382.53 samples/sec   Loss 20.9787   LearningRate 0.0958   Epoch: 0   Global Step: 5220   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:13:53,954-Speed 3245.95 samples/sec   Loss 21.0067   LearningRate 0.0958   Epoch: 0   Global Step: 5230   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:13:57,006-Speed 3355.87 samples/sec   Loss 21.0497   LearningRate 0.0958   Epoch: 0   Global Step: 5240   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:14:00,106-Speed 3304.84 samples/sec   Loss 20.9729   LearningRate 0.0958   Epoch: 0   Global Step: 5250   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:14:03,128-Speed 3389.40 samples/sec   Loss 21.0634   LearningRate 0.0958   Epoch: 0   Global Step: 5260   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:14:06,249-Speed 3281.63 samples/sec   Loss 20.9382   LearningRate 0.0958   Epoch: 0   Global Step: 5270   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:14:09,279-Speed 3381.37 samples/sec   Loss 20.8344   LearningRate 0.0958   Epoch: 0   Global Step: 5280   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:14:12,315-Speed 3374.14 samples/sec   Loss 20.7467   LearningRate 0.0958   Epoch: 0   Global Step: 5290   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:14:15,334-Speed 3392.41 samples/sec   Loss 20.7336   LearningRate 0.0958   Epoch: 0   Global Step: 5300   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:14:18,359-Speed 3385.91 samples/sec   Loss 20.9109   LearningRate 0.0958   Epoch: 0   Global Step: 5310   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:14:21,370-Speed 3402.08 samples/sec   Loss 20.6746   LearningRate 0.0958   Epoch: 0   Global Step: 5320   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:14:24,376-Speed 3409.39 samples/sec   Loss 20.4640   LearningRate 0.0958   Epoch: 0   Global Step: 5330   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:14:27,450-Speed 3331.69 samples/sec   Loss 20.7739   LearningRate 0.0957   Epoch: 0   Global Step: 5340   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:14:30,489-Speed 3371.23 samples/sec   Loss 20.7419   LearningRate 0.0957   Epoch: 0   Global Step: 5350   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:14:33,491-Speed 3412.39 samples/sec   Loss 20.7129   LearningRate 0.0957   Epoch: 0   Global Step: 5360   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:14:36,551-Speed 3347.68 samples/sec   Loss 20.6918   LearningRate 0.0957   Epoch: 0   Global Step: 5370   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:14:39,644-Speed 3311.22 samples/sec   Loss 20.6485   LearningRate 0.0957   Epoch: 0   Global Step: 5380   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:14:42,662-Speed 3394.67 samples/sec   Loss 20.4216   LearningRate 0.0957   Epoch: 0   Global Step: 5390   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:14:45,663-Speed 3412.31 samples/sec   Loss 20.5417   LearningRate 0.0957   Epoch: 0   Global Step: 5400   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:14:48,680-Speed 3396.14 samples/sec   Loss 20.4340   LearningRate 0.0957   Epoch: 0   Global Step: 5410   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:14:51,720-Speed 3369.91 samples/sec   Loss 20.2416   LearningRate 0.0957   Epoch: 0   Global Step: 5420   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:14:54,736-Speed 3396.22 samples/sec   Loss 20.3339   LearningRate 0.0957   Epoch: 0   Global Step: 5430   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:14:57,764-Speed 3382.44 samples/sec   Loss 20.3454   LearningRate 0.0957   Epoch: 0   Global Step: 5440   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:15:00,808-Speed 3365.46 samples/sec   Loss 20.3512   LearningRate 0.0957   Epoch: 0   Global Step: 5450   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:15:03,812-Speed 3409.50 samples/sec   Loss 20.2448   LearningRate 0.0957   Epoch: 0   Global Step: 5460   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:15:06,837-Speed 3385.97 samples/sec   Loss 20.2229   LearningRate 0.0956   Epoch: 0   Global Step: 5470   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:15:09,867-Speed 3381.15 samples/sec   Loss 20.2804   LearningRate 0.0956   Epoch: 0   Global Step: 5480   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:15:12,925-Speed 3349.13 samples/sec   Loss 20.2078   LearningRate 0.0956   Epoch: 0   Global Step: 5490   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:15:15,973-Speed 3360.95 samples/sec   Loss 20.3010   LearningRate 0.0956   Epoch: 0   Global Step: 5500   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:15:19,004-Speed 3379.56 samples/sec   Loss 20.2488   LearningRate 0.0956   Epoch: 0   Global Step: 5510   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:15:22,023-Speed 3392.34 samples/sec   Loss 20.2289   LearningRate 0.0956   Epoch: 0   Global Step: 5520   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:15:25,047-Speed 3387.44 samples/sec   Loss 20.1103   LearningRate 0.0956   Epoch: 0   Global Step: 5530   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:15:28,118-Speed 3335.95 samples/sec   Loss 20.2316   LearningRate 0.0956   Epoch: 0   Global Step: 5540   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:15:31,155-Speed 3372.23 samples/sec   Loss 20.2224   LearningRate 0.0956   Epoch: 0   Global Step: 5550   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:15:34,240-Speed 3321.56 samples/sec   Loss 20.2049   LearningRate 0.0956   Epoch: 0   Global Step: 5560   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:15:37,313-Speed 3332.40 samples/sec   Loss 20.0200   LearningRate 0.0956   Epoch: 0   Global Step: 5570   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:15:40,379-Speed 3341.36 samples/sec   Loss 19.9634   LearningRate 0.0956   Epoch: 0   Global Step: 5580   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:15:43,457-Speed 3327.35 samples/sec   Loss 20.0507   LearningRate 0.0956   Epoch: 0   Global Step: 5590   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:15:46,508-Speed 3357.85 samples/sec   Loss 20.0461   LearningRate 0.0955   Epoch: 0   Global Step: 5600   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:15:49,567-Speed 3348.68 samples/sec   Loss 19.9137   LearningRate 0.0955   Epoch: 0   Global Step: 5610   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:15:52,652-Speed 3320.42 samples/sec   Loss 19.8032   LearningRate 0.0955   Epoch: 0   Global Step: 5620   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:15:55,730-Speed 3327.14 samples/sec   Loss 19.8615   LearningRate 0.0955   Epoch: 0   Global Step: 5630   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:15:58,766-Speed 3373.92 samples/sec   Loss 19.8116   LearningRate 0.0955   Epoch: 0   Global Step: 5640   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:16:01,825-Speed 3348.55 samples/sec   Loss 19.8653   LearningRate 0.0955   Epoch: 0   Global Step: 5650   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:16:04,939-Speed 3290.28 samples/sec   Loss 19.6968   LearningRate 0.0955   Epoch: 0   Global Step: 5660   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:16:07,971-Speed 3378.46 samples/sec   Loss 19.6801   LearningRate 0.0955   Epoch: 0   Global Step: 5670   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:16:10,989-Speed 3393.93 samples/sec   Loss 19.6700   LearningRate 0.0955   Epoch: 0   Global Step: 5680   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:16:14,001-Speed 3401.03 samples/sec   Loss 19.8872   LearningRate 0.0955   Epoch: 0   Global Step: 5690   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:16:17,148-Speed 3254.26 samples/sec   Loss 19.7362   LearningRate 0.0955   Epoch: 0   Global Step: 5700   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:16:20,176-Speed 3382.79 samples/sec   Loss 19.5734   LearningRate 0.0955   Epoch: 0   Global Step: 5710   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:16:23,179-Speed 3411.44 samples/sec   Loss 19.6703   LearningRate 0.0954   Epoch: 0   Global Step: 5720   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:16:26,322-Speed 3259.61 samples/sec   Loss 19.4415   LearningRate 0.0954   Epoch: 0   Global Step: 5730   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:16:29,405-Speed 3321.73 samples/sec   Loss 19.6119   LearningRate 0.0954   Epoch: 0   Global Step: 5740   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:16:32,474-Speed 3337.92 samples/sec   Loss 19.4744   LearningRate 0.0954   Epoch: 0   Global Step: 5750   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:16:35,539-Speed 3341.91 samples/sec   Loss 19.4289   LearningRate 0.0954   Epoch: 0   Global Step: 5760   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:16:38,626-Speed 3318.49 samples/sec   Loss 19.4231   LearningRate 0.0954   Epoch: 0   Global Step: 5770   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:16:41,710-Speed 3320.79 samples/sec   Loss 19.5562   LearningRate 0.0954   Epoch: 0   Global Step: 5780   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:16:44,745-Speed 3375.38 samples/sec   Loss 19.3348   LearningRate 0.0954   Epoch: 0   Global Step: 5790   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:16:48,325-Speed 2861.23 samples/sec   Loss 19.4014   LearningRate 0.0954   Epoch: 0   Global Step: 5800   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:16:51,393-Speed 3338.46 samples/sec   Loss 19.4031   LearningRate 0.0954   Epoch: 0   Global Step: 5810   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:16:54,472-Speed 3327.30 samples/sec   Loss 19.2788   LearningRate 0.0954   Epoch: 0   Global Step: 5820   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:16:57,485-Speed 3398.90 samples/sec   Loss 19.3726   LearningRate 0.0954   Epoch: 0   Global Step: 5830   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:17:00,511-Speed 3385.13 samples/sec   Loss 19.3568   LearningRate 0.0954   Epoch: 0   Global Step: 5840   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:17:03,524-Speed 3399.51 samples/sec   Loss 19.3167   LearningRate 0.0953   Epoch: 0   Global Step: 5850   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:17:06,523-Speed 3416.63 samples/sec   Loss 19.1248   LearningRate 0.0953   Epoch: 0   Global Step: 5860   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:17:09,522-Speed 3415.17 samples/sec   Loss 19.2465   LearningRate 0.0953   Epoch: 0   Global Step: 5870   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:17:12,556-Speed 3376.26 samples/sec   Loss 19.3398   LearningRate 0.0953   Epoch: 0   Global Step: 5880   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:17:15,659-Speed 3300.24 samples/sec   Loss 19.2733   LearningRate 0.0953   Epoch: 0   Global Step: 5890   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:17:18,668-Speed 3404.52 samples/sec   Loss 19.1388   LearningRate 0.0953   Epoch: 0   Global Step: 5900   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:17:21,673-Speed 3409.61 samples/sec   Loss 18.9335   LearningRate 0.0953   Epoch: 0   Global Step: 5910   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:17:24,702-Speed 3380.87 samples/sec   Loss 18.9983   LearningRate 0.0953   Epoch: 0   Global Step: 5920   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:17:27,720-Speed 3394.17 samples/sec   Loss 19.3016   LearningRate 0.0953   Epoch: 0   Global Step: 5930   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:17:30,777-Speed 3351.62 samples/sec   Loss 19.0702   LearningRate 0.0953   Epoch: 0   Global Step: 5940   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:17:33,792-Speed 3397.30 samples/sec   Loss 19.2086   LearningRate 0.0953   Epoch: 0   Global Step: 5950   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:17:36,902-Speed 3294.00 samples/sec   Loss 19.0789   LearningRate 0.0953   Epoch: 0   Global Step: 5960   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:17:40,004-Speed 3301.76 samples/sec   Loss 18.9874   LearningRate 0.0953   Epoch: 0   Global Step: 5970   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:17:43,021-Speed 3395.26 samples/sec   Loss 18.9292   LearningRate 0.0952   Epoch: 0   Global Step: 5980   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:17:46,018-Speed 3418.40 samples/sec   Loss 18.7594   LearningRate 0.0952   Epoch: 0   Global Step: 5990   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:17:49,073-Speed 3352.20 samples/sec   Loss 18.8444   LearningRate 0.0952   Epoch: 0   Global Step: 6000   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:17:52,103-Speed 3380.33 samples/sec   Loss 18.8487   LearningRate 0.0952   Epoch: 0   Global Step: 6010   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:17:55,190-Speed 3319.23 samples/sec   Loss 18.7093   LearningRate 0.0952   Epoch: 0   Global Step: 6020   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:17:58,173-Speed 3433.44 samples/sec   Loss 18.8933   LearningRate 0.0952   Epoch: 0   Global Step: 6030   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:18:01,259-Speed 3319.40 samples/sec   Loss 18.8089   LearningRate 0.0952   Epoch: 0   Global Step: 6040   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:18:04,305-Speed 3362.27 samples/sec   Loss 18.7497   LearningRate 0.0952   Epoch: 0   Global Step: 6050   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:18:07,348-Speed 3366.95 samples/sec   Loss 18.8858   LearningRate 0.0952   Epoch: 0   Global Step: 6060   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:18:10,353-Speed 3408.14 samples/sec   Loss 18.7387   LearningRate 0.0952   Epoch: 0   Global Step: 6070   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:18:14,181-Speed 2675.54 samples/sec   Loss 18.7094   LearningRate 0.0952   Epoch: 0   Global Step: 6080   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:18:17,194-Speed 3399.44 samples/sec   Loss 18.6326   LearningRate 0.0952   Epoch: 0   Global Step: 6090   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:18:20,225-Speed 3379.66 samples/sec   Loss 18.8026   LearningRate 0.0951   Epoch: 0   Global Step: 6100   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:18:24,779-Speed 2249.11 samples/sec   Loss 18.5482   LearningRate 0.0951   Epoch: 0   Global Step: 6110   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:18:27,805-Speed 3385.24 samples/sec   Loss 18.5365   LearningRate 0.0951   Epoch: 0   Global Step: 6120   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:18:30,836-Speed 3379.35 samples/sec   Loss 18.6434   LearningRate 0.0951   Epoch: 0   Global Step: 6130   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:18:33,866-Speed 3381.48 samples/sec   Loss 18.5056   LearningRate 0.0951   Epoch: 0   Global Step: 6140   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:18:36,891-Speed 3386.26 samples/sec   Loss 18.4670   LearningRate 0.0951   Epoch: 0   Global Step: 6150   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:18:39,971-Speed 3325.19 samples/sec   Loss 18.4393   LearningRate 0.0951   Epoch: 0   Global Step: 6160   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:18:42,993-Speed 3389.43 samples/sec   Loss 18.4555   LearningRate 0.0951   Epoch: 0   Global Step: 6170   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:18:46,072-Speed 3326.93 samples/sec   Loss 18.2998   LearningRate 0.0951   Epoch: 0   Global Step: 6180   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:18:49,116-Speed 3365.01 samples/sec   Loss 18.3600   LearningRate 0.0951   Epoch: 0   Global Step: 6190   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:18:52,166-Speed 3358.68 samples/sec   Loss 18.3082   LearningRate 0.0951   Epoch: 0   Global Step: 6200   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:18:55,211-Speed 3367.90 samples/sec   Loss 18.3712   LearningRate 0.0951   Epoch: 0   Global Step: 6210   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:18:58,285-Speed 3332.85 samples/sec   Loss 18.3573   LearningRate 0.0951   Epoch: 0   Global Step: 6220   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:19:01,405-Speed 3283.36 samples/sec   Loss 18.3320   LearningRate 0.0950   Epoch: 0   Global Step: 6230   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:19:04,466-Speed 3346.31 samples/sec   Loss 18.3330   LearningRate 0.0950   Epoch: 0   Global Step: 6240   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:19:07,511-Speed 3363.06 samples/sec   Loss 18.1172   LearningRate 0.0950   Epoch: 0   Global Step: 6250   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:19:10,538-Speed 3383.92 samples/sec   Loss 18.3042   LearningRate 0.0950   Epoch: 0   Global Step: 6260   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:19:13,649-Speed 3293.37 samples/sec   Loss 18.2177   LearningRate 0.0950   Epoch: 0   Global Step: 6270   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:19:16,724-Speed 3331.15 samples/sec   Loss 18.3747   LearningRate 0.0950   Epoch: 0   Global Step: 6280   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:19:19,784-Speed 3346.65 samples/sec   Loss 18.1673   LearningRate 0.0950   Epoch: 0   Global Step: 6290   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:19:22,833-Speed 3359.50 samples/sec   Loss 18.1631   LearningRate 0.0950   Epoch: 0   Global Step: 6300   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:19:25,874-Speed 3369.04 samples/sec   Loss 18.0954   LearningRate 0.0950   Epoch: 0   Global Step: 6310   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:19:28,976-Speed 3302.27 samples/sec   Loss 17.9901   LearningRate 0.0950   Epoch: 0   Global Step: 6320   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:19:32,009-Speed 3377.04 samples/sec   Loss 18.0002   LearningRate 0.0950   Epoch: 0   Global Step: 6330   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:19:35,060-Speed 3357.05 samples/sec   Loss 17.9670   LearningRate 0.0950   Epoch: 0   Global Step: 6340   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:19:38,114-Speed 3354.17 samples/sec   Loss 18.0064   LearningRate 0.0950   Epoch: 0   Global Step: 6350   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:19:41,133-Speed 3392.85 samples/sec   Loss 17.9806   LearningRate 0.0949   Epoch: 0   Global Step: 6360   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:19:44,148-Speed 3398.12 samples/sec   Loss 17.9624   LearningRate 0.0949   Epoch: 0   Global Step: 6370   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:19:47,180-Speed 3378.45 samples/sec   Loss 17.8904   LearningRate 0.0949   Epoch: 0   Global Step: 6380   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:19:50,181-Speed 3412.79 samples/sec   Loss 17.7599   LearningRate 0.0949   Epoch: 0   Global Step: 6390   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:19:53,182-Speed 3412.92 samples/sec   Loss 17.8070   LearningRate 0.0949   Epoch: 0   Global Step: 6400   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:19:56,220-Speed 3371.81 samples/sec   Loss 17.9383   LearningRate 0.0949   Epoch: 0   Global Step: 6410   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:19:59,286-Speed 3341.35 samples/sec   Loss 17.9659   LearningRate 0.0949   Epoch: 0   Global Step: 6420   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:20:02,352-Speed 3340.95 samples/sec   Loss 17.8814   LearningRate 0.0949   Epoch: 0   Global Step: 6430   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:20:05,371-Speed 3393.48 samples/sec   Loss 17.8385   LearningRate 0.0949   Epoch: 0   Global Step: 6440   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:20:08,438-Speed 3339.06 samples/sec   Loss 17.7980   LearningRate 0.0949   Epoch: 0   Global Step: 6450   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:20:11,472-Speed 3377.12 samples/sec   Loss 17.9913   LearningRate 0.0949   Epoch: 0   Global Step: 6460   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:20:14,571-Speed 3304.92 samples/sec   Loss 17.8704   LearningRate 0.0949   Epoch: 0   Global Step: 6470   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:20:17,607-Speed 3374.17 samples/sec   Loss 17.8592   LearningRate 0.0949   Epoch: 0   Global Step: 6480   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:20:20,636-Speed 3381.23 samples/sec   Loss 17.5519   LearningRate 0.0948   Epoch: 0   Global Step: 6490   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:20:23,650-Speed 3398.62 samples/sec   Loss 17.7118   LearningRate 0.0948   Epoch: 0   Global Step: 6500   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:20:26,683-Speed 3377.95 samples/sec   Loss 17.7970   LearningRate 0.0948   Epoch: 0   Global Step: 6510   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:20:29,720-Speed 3371.99 samples/sec   Loss 17.6537   LearningRate 0.0948   Epoch: 0   Global Step: 6520   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:20:32,779-Speed 3349.74 samples/sec   Loss 17.6085   LearningRate 0.0948   Epoch: 0   Global Step: 6530   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:20:35,822-Speed 3365.45 samples/sec   Loss 17.5641   LearningRate 0.0948   Epoch: 0   Global Step: 6540   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:20:38,871-Speed 3359.57 samples/sec   Loss 17.7718   LearningRate 0.0948   Epoch: 0   Global Step: 6550   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:20:41,948-Speed 3329.34 samples/sec   Loss 17.6572   LearningRate 0.0948   Epoch: 0   Global Step: 6560   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:20:44,965-Speed 3394.73 samples/sec   Loss 17.5290   LearningRate 0.0948   Epoch: 0   Global Step: 6570   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:20:48,003-Speed 3371.62 samples/sec   Loss 17.6317   LearningRate 0.0948   Epoch: 0   Global Step: 6580   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:20:51,048-Speed 3363.99 samples/sec   Loss 17.5668   LearningRate 0.0948   Epoch: 0   Global Step: 6590   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:20:54,155-Speed 3297.20 samples/sec   Loss 17.5147   LearningRate 0.0948   Epoch: 0   Global Step: 6600   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:20:57,240-Speed 3319.36 samples/sec   Loss 17.5341   LearningRate 0.0947   Epoch: 0   Global Step: 6610   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:21:00,292-Speed 3356.02 samples/sec   Loss 17.5568   LearningRate 0.0947   Epoch: 0   Global Step: 6620   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:21:03,377-Speed 3321.35 samples/sec   Loss 17.6925   LearningRate 0.0947   Epoch: 0   Global Step: 6630   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:21:06,452-Speed 3330.93 samples/sec   Loss 17.3901   LearningRate 0.0947   Epoch: 0   Global Step: 6640   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:21:09,447-Speed 3420.07 samples/sec   Loss 17.4746   LearningRate 0.0947   Epoch: 0   Global Step: 6650   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:21:12,484-Speed 3373.06 samples/sec   Loss 17.5558   LearningRate 0.0947   Epoch: 0   Global Step: 6660   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:21:15,545-Speed 3346.39 samples/sec   Loss 17.5045   LearningRate 0.0947   Epoch: 0   Global Step: 6670   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:21:18,594-Speed 3359.15 samples/sec   Loss 17.5088   LearningRate 0.0947   Epoch: 0   Global Step: 6680   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:21:21,625-Speed 3379.32 samples/sec   Loss 17.5500   LearningRate 0.0947   Epoch: 0   Global Step: 6690   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:21:24,675-Speed 3359.02 samples/sec   Loss 17.3938   LearningRate 0.0947   Epoch: 0   Global Step: 6700   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:21:27,710-Speed 3375.24 samples/sec   Loss 17.2365   LearningRate 0.0947   Epoch: 0   Global Step: 6710   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:21:30,746-Speed 3373.55 samples/sec   Loss 17.3121   LearningRate 0.0947   Epoch: 0   Global Step: 6720   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:21:33,776-Speed 3380.97 samples/sec   Loss 17.3580   LearningRate 0.0947   Epoch: 0   Global Step: 6730   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:21:36,814-Speed 3371.91 samples/sec   Loss 17.2010   LearningRate 0.0946   Epoch: 0   Global Step: 6740   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:21:39,822-Speed 3405.00 samples/sec   Loss 17.3129   LearningRate 0.0946   Epoch: 0   Global Step: 6750   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:21:42,882-Speed 3347.61 samples/sec   Loss 17.2898   LearningRate 0.0946   Epoch: 0   Global Step: 6760   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:21:45,890-Speed 3405.55 samples/sec   Loss 17.2959   LearningRate 0.0946   Epoch: 0   Global Step: 6770   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:21:48,910-Speed 3391.93 samples/sec   Loss 17.0935   LearningRate 0.0946   Epoch: 0   Global Step: 6780   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:21:51,944-Speed 3376.32 samples/sec   Loss 17.3204   LearningRate 0.0946   Epoch: 0   Global Step: 6790   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:21:54,974-Speed 3379.93 samples/sec   Loss 17.1124   LearningRate 0.0946   Epoch: 0   Global Step: 6800   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:21:58,000-Speed 3386.01 samples/sec   Loss 17.2942   LearningRate 0.0946   Epoch: 0   Global Step: 6810   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:22:01,030-Speed 3380.20 samples/sec   Loss 17.1193   LearningRate 0.0946   Epoch: 0   Global Step: 6820   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:22:04,092-Speed 3345.29 samples/sec   Loss 17.0106   LearningRate 0.0946   Epoch: 0   Global Step: 6830   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:22:07,098-Speed 3407.78 samples/sec   Loss 17.1215   LearningRate 0.0946   Epoch: 0   Global Step: 6840   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:22:10,104-Speed 3408.13 samples/sec   Loss 17.0658   LearningRate 0.0946   Epoch: 0   Global Step: 6850   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:22:13,182-Speed 3327.56 samples/sec   Loss 17.2164   LearningRate 0.0946   Epoch: 0   Global Step: 6860   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:22:16,257-Speed 3331.89 samples/sec   Loss 17.1545   LearningRate 0.0945   Epoch: 0   Global Step: 6870   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:22:19,258-Speed 3412.74 samples/sec   Loss 17.0109   LearningRate 0.0945   Epoch: 0   Global Step: 6880   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-27 02:22:22,276-Speed 3394.02 samples/sec   Loss 16.9847   LearningRate 0.0945   Epoch: 0   Global Step: 6890   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-27 02:22:25,304-Speed 3383.53 samples/sec   Loss 17.1891   LearningRate 0.0945   Epoch: 0   Global Step: 6900   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-27 02:22:28,327-Speed 3388.68 samples/sec   Loss 16.9376   LearningRate 0.0945   Epoch: 0   Global Step: 6910   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-27 02:22:31,353-Speed 3384.32 samples/sec   Loss 16.9385   LearningRate 0.0945   Epoch: 0   Global Step: 6920   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-27 02:22:34,405-Speed 3357.44 samples/sec   Loss 16.9332   LearningRate 0.0945   Epoch: 0   Global Step: 6930   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-27 02:22:37,448-Speed 3365.51 samples/sec   Loss 17.0098   LearningRate 0.0945   Epoch: 0   Global Step: 6940   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-27 02:22:40,530-Speed 3324.34 samples/sec   Loss 17.0161   LearningRate 0.0945   Epoch: 0   Global Step: 6950   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-27 02:22:43,585-Speed 3352.43 samples/sec   Loss 16.7965   LearningRate 0.0945   Epoch: 0   Global Step: 6960   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-27 02:22:46,621-Speed 3373.35 samples/sec   Loss 16.8241   LearningRate 0.0945   Epoch: 0   Global Step: 6970   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-04-27 02:22:49,641-Speed 3391.71 samples/sec   Loss 16.6479   LearningRate 0.0945   Epoch: 0   Global Step: 6980   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:22:52,709-Speed 3339.25 samples/sec   Loss 16.9037   LearningRate 0.0945   Epoch: 0   Global Step: 6990   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:22:55,739-Speed 3380.13 samples/sec   Loss 16.6872   LearningRate 0.0944   Epoch: 0   Global Step: 7000   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:22:58,766-Speed 3384.27 samples/sec   Loss 17.0340   LearningRate 0.0944   Epoch: 0   Global Step: 7010   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:23:01,789-Speed 3389.34 samples/sec   Loss 16.7981   LearningRate 0.0944   Epoch: 0   Global Step: 7020   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:23:04,832-Speed 3365.63 samples/sec   Loss 16.6041   LearningRate 0.0944   Epoch: 0   Global Step: 7030   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:23:07,839-Speed 3406.02 samples/sec   Loss 16.6277   LearningRate 0.0944   Epoch: 0   Global Step: 7040   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:23:10,885-Speed 3363.70 samples/sec   Loss 16.6833   LearningRate 0.0944   Epoch: 0   Global Step: 7050   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:23:13,897-Speed 3400.15 samples/sec   Loss 16.6205   LearningRate 0.0944   Epoch: 0   Global Step: 7060   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:23:16,923-Speed 3384.93 samples/sec   Loss 16.5948   LearningRate 0.0944   Epoch: 0   Global Step: 7070   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:23:19,939-Speed 3396.95 samples/sec   Loss 16.7893   LearningRate 0.0944   Epoch: 0   Global Step: 7080   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:23:22,946-Speed 3406.54 samples/sec   Loss 16.7299   LearningRate 0.0944   Epoch: 0   Global Step: 7090   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:23:25,974-Speed 3382.90 samples/sec   Loss 16.6761   LearningRate 0.0944   Epoch: 0   Global Step: 7100   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:23:29,011-Speed 3372.23 samples/sec   Loss 16.4482   LearningRate 0.0944   Epoch: 0   Global Step: 7110   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:23:32,009-Speed 3417.07 samples/sec   Loss 16.6936   LearningRate 0.0943   Epoch: 0   Global Step: 7120   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:23:35,015-Speed 3407.35 samples/sec   Loss 16.4689   LearningRate 0.0943   Epoch: 0   Global Step: 7130   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:23:38,015-Speed 3413.86 samples/sec   Loss 16.4481   LearningRate 0.0943   Epoch: 0   Global Step: 7140   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:23:41,101-Speed 3319.83 samples/sec   Loss 16.5386   LearningRate 0.0943   Epoch: 0   Global Step: 7150   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:23:44,098-Speed 3417.30 samples/sec   Loss 16.5728   LearningRate 0.0943   Epoch: 0   Global Step: 7160   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:23:47,132-Speed 3376.84 samples/sec   Loss 16.5424   LearningRate 0.0943   Epoch: 0   Global Step: 7170   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:23:50,159-Speed 3384.08 samples/sec   Loss 16.5434   LearningRate 0.0943   Epoch: 0   Global Step: 7180   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:23:53,207-Speed 3359.97 samples/sec   Loss 16.4116   LearningRate 0.0943   Epoch: 0   Global Step: 7190   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:23:56,223-Speed 3396.57 samples/sec   Loss 16.4286   LearningRate 0.0943   Epoch: 0   Global Step: 7200   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:23:59,235-Speed 3400.99 samples/sec   Loss 16.4632   LearningRate 0.0943   Epoch: 0   Global Step: 7210   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:24:02,260-Speed 3385.66 samples/sec   Loss 16.3344   LearningRate 0.0943   Epoch: 0   Global Step: 7220   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:24:05,277-Speed 3394.94 samples/sec   Loss 16.3463   LearningRate 0.0943   Epoch: 0   Global Step: 7230   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:24:08,272-Speed 3420.89 samples/sec   Loss 16.4364   LearningRate 0.0943   Epoch: 0   Global Step: 7240   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:24:11,314-Speed 3366.89 samples/sec   Loss 16.2549   LearningRate 0.0942   Epoch: 0   Global Step: 7250   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:24:14,332-Speed 3394.95 samples/sec   Loss 16.2656   LearningRate 0.0942   Epoch: 0   Global Step: 7260   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:24:17,341-Speed 3403.97 samples/sec   Loss 16.3522   LearningRate 0.0942   Epoch: 0   Global Step: 7270   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:24:20,348-Speed 3406.47 samples/sec   Loss 16.1795   LearningRate 0.0942   Epoch: 0   Global Step: 7280   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:24:23,469-Speed 3282.38 samples/sec   Loss 16.2808   LearningRate 0.0942   Epoch: 0   Global Step: 7290   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:24:26,511-Speed 3366.89 samples/sec   Loss 16.1559   LearningRate 0.0942   Epoch: 0   Global Step: 7300   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:24:29,512-Speed 3413.64 samples/sec   Loss 16.2610   LearningRate 0.0942   Epoch: 0   Global Step: 7310   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:24:32,525-Speed 3399.17 samples/sec   Loss 16.1598   LearningRate 0.0942   Epoch: 0   Global Step: 7320   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:24:35,536-Speed 3401.96 samples/sec   Loss 16.1316   LearningRate 0.0942   Epoch: 0   Global Step: 7330   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:24:38,558-Speed 3389.54 samples/sec   Loss 16.0974   LearningRate 0.0942   Epoch: 0   Global Step: 7340   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:24:41,630-Speed 3334.90 samples/sec   Loss 16.0439   LearningRate 0.0942   Epoch: 0   Global Step: 7350   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:24:44,669-Speed 3371.20 samples/sec   Loss 16.2218   LearningRate 0.0942   Epoch: 0   Global Step: 7360   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:24:47,678-Speed 3403.44 samples/sec   Loss 16.0729   LearningRate 0.0942   Epoch: 0   Global Step: 7370   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:24:50,695-Speed 3395.93 samples/sec   Loss 16.1185   LearningRate 0.0941   Epoch: 0   Global Step: 7380   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:24:53,750-Speed 3351.79 samples/sec   Loss 15.9999   LearningRate 0.0941   Epoch: 0   Global Step: 7390   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:24:56,784-Speed 3376.45 samples/sec   Loss 16.0914   LearningRate 0.0941   Epoch: 0   Global Step: 7400   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:24:59,810-Speed 3385.48 samples/sec   Loss 16.1229   LearningRate 0.0941   Epoch: 0   Global Step: 7410   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:25:02,832-Speed 3389.86 samples/sec   Loss 15.9073   LearningRate 0.0941   Epoch: 0   Global Step: 7420   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:25:05,886-Speed 3353.74 samples/sec   Loss 16.0265   LearningRate 0.0941   Epoch: 0   Global Step: 7430   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:25:08,897-Speed 3402.31 samples/sec   Loss 15.8871   LearningRate 0.0941   Epoch: 0   Global Step: 7440   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:25:11,938-Speed 3368.37 samples/sec   Loss 15.9589   LearningRate 0.0941   Epoch: 0   Global Step: 7450   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:25:14,980-Speed 3367.49 samples/sec   Loss 15.8745   LearningRate 0.0941   Epoch: 0   Global Step: 7460   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:25:18,054-Speed 3332.29 samples/sec   Loss 15.9191   LearningRate 0.0941   Epoch: 0   Global Step: 7470   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:25:21,065-Speed 3401.20 samples/sec   Loss 16.0280   LearningRate 0.0941   Epoch: 0   Global Step: 7480   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:25:24,138-Speed 3333.18 samples/sec   Loss 15.9953   LearningRate 0.0941   Epoch: 0   Global Step: 7490   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:25:27,199-Speed 3346.65 samples/sec   Loss 15.8297   LearningRate 0.0941   Epoch: 0   Global Step: 7500   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:25:30,218-Speed 3393.08 samples/sec   Loss 15.7890   LearningRate 0.0940   Epoch: 0   Global Step: 7510   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:25:33,251-Speed 3377.38 samples/sec   Loss 15.9123   LearningRate 0.0940   Epoch: 0   Global Step: 7520   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:25:36,353-Speed 3301.70 samples/sec   Loss 15.8976   LearningRate 0.0940   Epoch: 0   Global Step: 7530   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:25:39,430-Speed 3329.32 samples/sec   Loss 15.8619   LearningRate 0.0940   Epoch: 0   Global Step: 7540   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:25:42,471-Speed 3367.66 samples/sec   Loss 15.8882   LearningRate 0.0940   Epoch: 0   Global Step: 7550   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:25:45,491-Speed 3392.04 samples/sec   Loss 15.8700   LearningRate 0.0940   Epoch: 0   Global Step: 7560   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:25:48,563-Speed 3334.89 samples/sec   Loss 15.8580   LearningRate 0.0940   Epoch: 0   Global Step: 7570   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:25:51,704-Speed 3260.81 samples/sec   Loss 15.9012   LearningRate 0.0940   Epoch: 0   Global Step: 7580   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:25:54,764-Speed 3348.18 samples/sec   Loss 16.0402   LearningRate 0.0940   Epoch: 0   Global Step: 7590   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:25:57,800-Speed 3374.07 samples/sec   Loss 15.9507   LearningRate 0.0940   Epoch: 0   Global Step: 7600   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:26:00,913-Speed 3291.07 samples/sec   Loss 15.8372   LearningRate 0.0940   Epoch: 0   Global Step: 7610   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:26:04,012-Speed 3305.14 samples/sec   Loss 15.7950   LearningRate 0.0940   Epoch: 0   Global Step: 7620   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:26:07,053-Speed 3369.31 samples/sec   Loss 15.6979   LearningRate 0.0940   Epoch: 0   Global Step: 7630   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:26:10,092-Speed 3369.87 samples/sec   Loss 15.6591   LearningRate 0.0939   Epoch: 0   Global Step: 7640   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:26:13,145-Speed 3355.22 samples/sec   Loss 15.7075   LearningRate 0.0939   Epoch: 0   Global Step: 7650   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:26:16,186-Speed 3368.27 samples/sec   Loss 15.6055   LearningRate 0.0939   Epoch: 0   Global Step: 7660   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:26:19,240-Speed 3354.19 samples/sec   Loss 15.7391   LearningRate 0.0939   Epoch: 0   Global Step: 7670   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:26:22,250-Speed 3403.22 samples/sec   Loss 15.8573   LearningRate 0.0939   Epoch: 0   Global Step: 7680   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:26:25,407-Speed 3244.38 samples/sec   Loss 15.7252   LearningRate 0.0939   Epoch: 0   Global Step: 7690   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:26:28,464-Speed 3350.16 samples/sec   Loss 15.6564   LearningRate 0.0939   Epoch: 0   Global Step: 7700   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:26:31,526-Speed 3345.29 samples/sec   Loss 15.7173   LearningRate 0.0939   Epoch: 0   Global Step: 7710   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:26:34,564-Speed 3372.63 samples/sec   Loss 15.6437   LearningRate 0.0939   Epoch: 0   Global Step: 7720   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:26:37,655-Speed 3313.84 samples/sec   Loss 15.6949   LearningRate 0.0939   Epoch: 0   Global Step: 7730   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:26:40,758-Speed 3300.97 samples/sec   Loss 15.6978   LearningRate 0.0939   Epoch: 0   Global Step: 7740   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:26:43,790-Speed 3377.89 samples/sec   Loss 15.8214   LearningRate 0.0939   Epoch: 0   Global Step: 7750   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:26:46,867-Speed 3329.10 samples/sec   Loss 15.5763   LearningRate 0.0939   Epoch: 0   Global Step: 7760   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:26:49,897-Speed 3380.55 samples/sec   Loss 15.5351   LearningRate 0.0938   Epoch: 0   Global Step: 7770   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:26:52,972-Speed 3331.39 samples/sec   Loss 15.4865   LearningRate 0.0938   Epoch: 0   Global Step: 7780   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:26:56,020-Speed 3360.70 samples/sec   Loss 15.4382   LearningRate 0.0938   Epoch: 0   Global Step: 7790   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:26:59,065-Speed 3363.58 samples/sec   Loss 15.4773   LearningRate 0.0938   Epoch: 0   Global Step: 7800   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:27:02,157-Speed 3313.01 samples/sec   Loss 15.4587   LearningRate 0.0938   Epoch: 0   Global Step: 7810   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:27:05,202-Speed 3363.35 samples/sec   Loss 15.5661   LearningRate 0.0938   Epoch: 0   Global Step: 7820   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:27:08,221-Speed 3393.56 samples/sec   Loss 15.4630   LearningRate 0.0938   Epoch: 0   Global Step: 7830   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:27:11,257-Speed 3374.46 samples/sec   Loss 15.3276   LearningRate 0.0938   Epoch: 0   Global Step: 7840   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:27:14,272-Speed 3396.41 samples/sec   Loss 15.4344   LearningRate 0.0938   Epoch: 0   Global Step: 7850   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:27:17,297-Speed 3386.68 samples/sec   Loss 15.3681   LearningRate 0.0938   Epoch: 0   Global Step: 7860   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:27:20,335-Speed 3372.17 samples/sec   Loss 15.3410   LearningRate 0.0938   Epoch: 0   Global Step: 7870   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:27:23,397-Speed 3344.67 samples/sec   Loss 15.4013   LearningRate 0.0938   Epoch: 0   Global Step: 7880   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:27:26,425-Speed 3382.89 samples/sec   Loss 15.5220   LearningRate 0.0937   Epoch: 0   Global Step: 7890   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:27:29,473-Speed 3361.16 samples/sec   Loss 15.3113   LearningRate 0.0937   Epoch: 0   Global Step: 7900   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:27:32,496-Speed 3388.01 samples/sec   Loss 15.4449   LearningRate 0.0937   Epoch: 0   Global Step: 7910   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:27:35,569-Speed 3333.29 samples/sec   Loss 15.4131   LearningRate 0.0937   Epoch: 0   Global Step: 7920   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:27:38,604-Speed 3375.85 samples/sec   Loss 15.2875   LearningRate 0.0937   Epoch: 0   Global Step: 7930   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:27:41,639-Speed 3374.06 samples/sec   Loss 15.3911   LearningRate 0.0937   Epoch: 0   Global Step: 7940   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:27:44,659-Speed 3392.27 samples/sec   Loss 15.1920   LearningRate 0.0937   Epoch: 0   Global Step: 7950   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:27:47,680-Speed 3390.82 samples/sec   Loss 15.2394   LearningRate 0.0937   Epoch: 0   Global Step: 7960   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:27:50,717-Speed 3372.86 samples/sec   Loss 15.2062   LearningRate 0.0937   Epoch: 0   Global Step: 7970   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:27:53,746-Speed 3382.20 samples/sec   Loss 15.1675   LearningRate 0.0937   Epoch: 0   Global Step: 7980   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:27:56,813-Speed 3339.40 samples/sec   Loss 15.4406   LearningRate 0.0937   Epoch: 0   Global Step: 7990   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:27:59,900-Speed 3317.98 samples/sec   Loss 15.1063   LearningRate 0.0937   Epoch: 0   Global Step: 8000   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:28:03,034-Speed 3269.12 samples/sec   Loss 15.2816   LearningRate 0.0937   Epoch: 0   Global Step: 8010   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:28:06,103-Speed 3337.18 samples/sec   Loss 15.0493   LearningRate 0.0936   Epoch: 0   Global Step: 8020   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-04-27 02:28:09,145-Speed 3366.83 samples/sec   Loss 15.1079   LearningRate 0.0936   Epoch: 0   Global Step: 8030   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:28:12,193-Speed 3361.77 samples/sec   Loss 15.0099   LearningRate 0.0936   Epoch: 0   Global Step: 8040   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:28:15,213-Speed 3390.92 samples/sec   Loss 15.0728   LearningRate 0.0936   Epoch: 0   Global Step: 8050   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:28:18,275-Speed 3345.55 samples/sec   Loss 15.2202   LearningRate 0.0936   Epoch: 0   Global Step: 8060   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:28:21,328-Speed 3355.84 samples/sec   Loss 15.0637   LearningRate 0.0936   Epoch: 0   Global Step: 8070   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:28:24,399-Speed 3335.25 samples/sec   Loss 15.2076   LearningRate 0.0936   Epoch: 0   Global Step: 8080   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:28:27,444-Speed 3363.86 samples/sec   Loss 15.1240   LearningRate 0.0936   Epoch: 0   Global Step: 8090   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:28:30,572-Speed 3274.32 samples/sec   Loss 15.0730   LearningRate 0.0936   Epoch: 0   Global Step: 8100   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:28:33,587-Speed 3397.72 samples/sec   Loss 14.8852   LearningRate 0.0936   Epoch: 0   Global Step: 8110   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:28:36,667-Speed 3326.39 samples/sec   Loss 14.9480   LearningRate 0.0936   Epoch: 0   Global Step: 8120   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:28:39,729-Speed 3345.14 samples/sec   Loss 15.1346   LearningRate 0.0936   Epoch: 0   Global Step: 8130   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:28:42,776-Speed 3362.00 samples/sec   Loss 15.1343   LearningRate 0.0936   Epoch: 0   Global Step: 8140   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:28:45,806-Speed 3380.48 samples/sec   Loss 15.0379   LearningRate 0.0935   Epoch: 0   Global Step: 8150   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:28:48,871-Speed 3342.17 samples/sec   Loss 14.8474   LearningRate 0.0935   Epoch: 0   Global Step: 8160   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:28:51,938-Speed 3340.41 samples/sec   Loss 14.8912   LearningRate 0.0935   Epoch: 0   Global Step: 8170   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:28:55,010-Speed 3334.21 samples/sec   Loss 14.7703   LearningRate 0.0935   Epoch: 0   Global Step: 8180   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:28:58,040-Speed 3380.76 samples/sec   Loss 14.9670   LearningRate 0.0935   Epoch: 0   Global Step: 8190   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:29:01,064-Speed 3386.67 samples/sec   Loss 15.0207   LearningRate 0.0935   Epoch: 0   Global Step: 8200   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:29:04,129-Speed 3342.57 samples/sec   Loss 14.9389   LearningRate 0.0935   Epoch: 0   Global Step: 8210   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:29:07,137-Speed 3404.92 samples/sec   Loss 14.9171   LearningRate 0.0935   Epoch: 0   Global Step: 8220   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:29:10,135-Speed 3416.83 samples/sec   Loss 14.9299   LearningRate 0.0935   Epoch: 0   Global Step: 8230   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:29:13,223-Speed 3316.44 samples/sec   Loss 14.7754   LearningRate 0.0935   Epoch: 0   Global Step: 8240   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:29:16,251-Speed 3383.20 samples/sec   Loss 14.8773   LearningRate 0.0935   Epoch: 0   Global Step: 8250   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:29:19,295-Speed 3365.04 samples/sec   Loss 14.8752   LearningRate 0.0935   Epoch: 0   Global Step: 8260   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:29:22,307-Speed 3401.41 samples/sec   Loss 14.9895   LearningRate 0.0935   Epoch: 0   Global Step: 8270   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:29:25,331-Speed 3386.66 samples/sec   Loss 14.7227   LearningRate 0.0934   Epoch: 0   Global Step: 8280   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:29:28,412-Speed 3324.88 samples/sec   Loss 14.7693   LearningRate 0.0934   Epoch: 0   Global Step: 8290   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:29:31,482-Speed 3335.76 samples/sec   Loss 14.9313   LearningRate 0.0934   Epoch: 0   Global Step: 8300   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:29:34,493-Speed 3402.80 samples/sec   Loss 14.8712   LearningRate 0.0934   Epoch: 0   Global Step: 8310   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:29:37,566-Speed 3333.12 samples/sec   Loss 14.7249   LearningRate 0.0934   Epoch: 0   Global Step: 8320   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:29:40,629-Speed 3343.90 samples/sec   Loss 14.5809   LearningRate 0.0934   Epoch: 0   Global Step: 8330   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:29:43,678-Speed 3359.54 samples/sec   Loss 14.7709   LearningRate 0.0934   Epoch: 0   Global Step: 8340   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:29:46,746-Speed 3338.73 samples/sec   Loss 14.8713   LearningRate 0.0934   Epoch: 0   Global Step: 8350   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:29:49,765-Speed 3393.12 samples/sec   Loss 14.7478   LearningRate 0.0934   Epoch: 0   Global Step: 8360   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:29:52,812-Speed 3361.46 samples/sec   Loss 14.7736   LearningRate 0.0934   Epoch: 0   Global Step: 8370   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:29:55,874-Speed 3345.14 samples/sec   Loss 14.8030   LearningRate 0.0934   Epoch: 0   Global Step: 8380   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:29:58,951-Speed 3328.95 samples/sec   Loss 14.8977   LearningRate 0.0934   Epoch: 0   Global Step: 8390   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:30:02,007-Speed 3351.74 samples/sec   Loss 14.8011   LearningRate 0.0934   Epoch: 0   Global Step: 8400   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:30:05,042-Speed 3374.55 samples/sec   Loss 14.8664   LearningRate 0.0933   Epoch: 0   Global Step: 8410   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:30:08,088-Speed 3363.30 samples/sec   Loss 14.8660   LearningRate 0.0933   Epoch: 0   Global Step: 8420   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:30:11,123-Speed 3374.65 samples/sec   Loss 14.6399   LearningRate 0.0933   Epoch: 0   Global Step: 8430   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:30:14,210-Speed 3318.70 samples/sec   Loss 14.6589   LearningRate 0.0933   Epoch: 0   Global Step: 8440   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:30:17,236-Speed 3384.69 samples/sec   Loss 14.6704   LearningRate 0.0933   Epoch: 0   Global Step: 8450   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:30:20,302-Speed 3341.28 samples/sec   Loss 14.5814   LearningRate 0.0933   Epoch: 0   Global Step: 8460   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:30:23,358-Speed 3352.43 samples/sec   Loss 14.5642   LearningRate 0.0933   Epoch: 0   Global Step: 8470   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:30:26,418-Speed 3347.57 samples/sec   Loss 14.6469   LearningRate 0.0933   Epoch: 0   Global Step: 8480   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:30:29,443-Speed 3385.77 samples/sec   Loss 14.4890   LearningRate 0.0933   Epoch: 0   Global Step: 8490   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:30:32,440-Speed 3417.96 samples/sec   Loss 14.6400   LearningRate 0.0933   Epoch: 0   Global Step: 8500   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:30:35,489-Speed 3360.21 samples/sec   Loss 14.7055   LearningRate 0.0933   Epoch: 0   Global Step: 8510   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:30:38,571-Speed 3322.96 samples/sec   Loss 14.4513   LearningRate 0.0933   Epoch: 0   Global Step: 8520   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:30:41,622-Speed 3357.23 samples/sec   Loss 14.7147   LearningRate 0.0933   Epoch: 0   Global Step: 8530   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:30:44,627-Speed 3409.10 samples/sec   Loss 14.7262   LearningRate 0.0932   Epoch: 0   Global Step: 8540   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:30:47,680-Speed 3355.20 samples/sec   Loss 14.4913   LearningRate 0.0932   Epoch: 0   Global Step: 8550   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:30:50,699-Speed 3392.95 samples/sec   Loss 14.7242   LearningRate 0.0932   Epoch: 0   Global Step: 8560   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:30:53,761-Speed 3344.52 samples/sec   Loss 14.4925   LearningRate 0.0932   Epoch: 0   Global Step: 8570   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:30:56,797-Speed 3374.29 samples/sec   Loss 14.4723   LearningRate 0.0932   Epoch: 0   Global Step: 8580   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:30:59,850-Speed 3355.00 samples/sec   Loss 14.6302   LearningRate 0.0932   Epoch: 0   Global Step: 8590   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:31:02,880-Speed 3381.45 samples/sec   Loss 14.4860   LearningRate 0.0932   Epoch: 0   Global Step: 8600   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:31:05,898-Speed 3393.84 samples/sec   Loss 14.4917   LearningRate 0.0932   Epoch: 0   Global Step: 8610   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:31:08,905-Speed 3406.43 samples/sec   Loss 14.3838   LearningRate 0.0932   Epoch: 0   Global Step: 8620   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:31:11,960-Speed 3353.06 samples/sec   Loss 14.2668   LearningRate 0.0932   Epoch: 0   Global Step: 8630   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:31:14,992-Speed 3378.10 samples/sec   Loss 14.4913   LearningRate 0.0932   Epoch: 0   Global Step: 8640   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:31:18,032-Speed 3368.95 samples/sec   Loss 14.3601   LearningRate 0.0932   Epoch: 0   Global Step: 8650   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-04-27 02:31:21,047-Speed 3398.17 samples/sec   Loss 14.4875   LearningRate 0.0931   Epoch: 0   Global Step: 8660   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:31:24,086-Speed 3370.65 samples/sec   Loss 14.3063   LearningRate 0.0931   Epoch: 0   Global Step: 8670   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:31:27,120-Speed 3376.27 samples/sec   Loss 14.3900   LearningRate 0.0931   Epoch: 0   Global Step: 8680   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:31:30,199-Speed 3326.07 samples/sec   Loss 14.4752   LearningRate 0.0931   Epoch: 0   Global Step: 8690   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:31:33,252-Speed 3355.51 samples/sec   Loss 14.4012   LearningRate 0.0931   Epoch: 0   Global Step: 8700   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:31:36,356-Speed 3300.39 samples/sec   Loss 14.2709   LearningRate 0.0931   Epoch: 0   Global Step: 8710   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:31:39,449-Speed 3311.01 samples/sec   Loss 14.2855   LearningRate 0.0931   Epoch: 0   Global Step: 8720   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:31:42,487-Speed 3372.10 samples/sec   Loss 14.3039   LearningRate 0.0931   Epoch: 0   Global Step: 8730   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:31:45,499-Speed 3401.16 samples/sec   Loss 14.4217   LearningRate 0.0931   Epoch: 0   Global Step: 8740   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-04-27 02:31:48,505-Speed 3407.60 samples/sec   Loss 14.3831   LearningRate 0.0931   Epoch: 0   Global Step: 8750   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:31:51,606-Speed 3302.83 samples/sec   Loss 14.3387   LearningRate 0.0931   Epoch: 0   Global Step: 8760   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:31:54,660-Speed 3354.35 samples/sec   Loss 14.2794   LearningRate 0.0931   Epoch: 0   Global Step: 8770   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:31:57,696-Speed 3373.24 samples/sec   Loss 14.1944   LearningRate 0.0931   Epoch: 0   Global Step: 8780   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:32:00,713-Speed 3395.98 samples/sec   Loss 14.2583   LearningRate 0.0930   Epoch: 0   Global Step: 8790   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:32:03,786-Speed 3332.79 samples/sec   Loss 14.2498   LearningRate 0.0930   Epoch: 0   Global Step: 8800   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:32:06,847-Speed 3347.07 samples/sec   Loss 14.2480   LearningRate 0.0930   Epoch: 0   Global Step: 8810   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:32:09,856-Speed 3403.88 samples/sec   Loss 14.3114   LearningRate 0.0930   Epoch: 0   Global Step: 8820   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:32:12,924-Speed 3338.34 samples/sec   Loss 14.3654   LearningRate 0.0930   Epoch: 0   Global Step: 8830   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:32:15,957-Speed 3377.52 samples/sec   Loss 14.2178   LearningRate 0.0930   Epoch: 0   Global Step: 8840   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:32:18,981-Speed 3386.80 samples/sec   Loss 14.2984   LearningRate 0.0930   Epoch: 0   Global Step: 8850   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:32:22,072-Speed 3314.52 samples/sec   Loss 14.2728   LearningRate 0.0930   Epoch: 0   Global Step: 8860   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:32:25,122-Speed 3357.52 samples/sec   Loss 14.2254   LearningRate 0.0930   Epoch: 0   Global Step: 8870   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:32:28,224-Speed 3303.06 samples/sec   Loss 14.0192   LearningRate 0.0930   Epoch: 0   Global Step: 8880   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:32:31,253-Speed 3381.96 samples/sec   Loss 14.0697   LearningRate 0.0930   Epoch: 0   Global Step: 8890   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:32:34,271-Speed 3393.13 samples/sec   Loss 14.2659   LearningRate 0.0930   Epoch: 0   Global Step: 8900   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:32:37,300-Speed 3381.87 samples/sec   Loss 14.0966   LearningRate 0.0930   Epoch: 0   Global Step: 8910   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:32:40,299-Speed 3416.43 samples/sec   Loss 14.2033   LearningRate 0.0929   Epoch: 0   Global Step: 8920   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:32:43,321-Speed 3389.12 samples/sec   Loss 14.0477   LearningRate 0.0929   Epoch: 0   Global Step: 8930   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:32:46,357-Speed 3374.77 samples/sec   Loss 14.1236   LearningRate 0.0929   Epoch: 0   Global Step: 8940   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:32:49,379-Speed 3389.02 samples/sec   Loss 13.9982   LearningRate 0.0929   Epoch: 0   Global Step: 8950   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:32:52,480-Speed 3303.64 samples/sec   Loss 13.9340   LearningRate 0.0929   Epoch: 0   Global Step: 8960   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:32:55,489-Speed 3403.36 samples/sec   Loss 13.9264   LearningRate 0.0929   Epoch: 0   Global Step: 8970   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:32:58,488-Speed 3416.41 samples/sec   Loss 14.0852   LearningRate 0.0929   Epoch: 0   Global Step: 8980   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:33:01,567-Speed 3326.11 samples/sec   Loss 14.2248   LearningRate 0.0929   Epoch: 0   Global Step: 8990   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:33:04,606-Speed 3370.73 samples/sec   Loss 14.1221   LearningRate 0.0929   Epoch: 0   Global Step: 9000   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:33:07,611-Speed 3409.50 samples/sec   Loss 14.1601   LearningRate 0.0929   Epoch: 0   Global Step: 9010   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:33:10,630-Speed 3392.35 samples/sec   Loss 14.0928   LearningRate 0.0929   Epoch: 0   Global Step: 9020   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:33:13,732-Speed 3301.92 samples/sec   Loss 13.9676   LearningRate 0.0929   Epoch: 0   Global Step: 9030   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:33:16,745-Speed 3399.84 samples/sec   Loss 14.0106   LearningRate 0.0929   Epoch: 0   Global Step: 9040   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:33:19,759-Speed 3398.39 samples/sec   Loss 13.9173   LearningRate 0.0928   Epoch: 0   Global Step: 9050   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:33:22,782-Speed 3388.89 samples/sec   Loss 13.8040   LearningRate 0.0928   Epoch: 0   Global Step: 9060   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:33:25,851-Speed 3338.52 samples/sec   Loss 13.8510   LearningRate 0.0928   Epoch: 0   Global Step: 9070   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 02:33:28,891-Speed 3369.29 samples/sec   Loss 14.0106   LearningRate 0.0928   Epoch: 0   Global Step: 9080   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:33:31,925-Speed 3376.00 samples/sec   Loss 13.9923   LearningRate 0.0928   Epoch: 0   Global Step: 9090   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:33:34,957-Speed 3379.32 samples/sec   Loss 13.9112   LearningRate 0.0928   Epoch: 0   Global Step: 9100   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:33:37,985-Speed 3382.99 samples/sec   Loss 14.0273   LearningRate 0.0928   Epoch: 0   Global Step: 9110   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:33:41,005-Speed 3391.48 samples/sec   Loss 13.9166   LearningRate 0.0928   Epoch: 0   Global Step: 9120   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:33:44,047-Speed 3366.87 samples/sec   Loss 14.0047   LearningRate 0.0928   Epoch: 0   Global Step: 9130   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:33:47,143-Speed 3309.12 samples/sec   Loss 13.8359   LearningRate 0.0928   Epoch: 0   Global Step: 9140   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:33:50,155-Speed 3401.21 samples/sec   Loss 13.8300   LearningRate 0.0928   Epoch: 0   Global Step: 9150   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:33:53,169-Speed 3398.42 samples/sec   Loss 13.8138   LearningRate 0.0928   Epoch: 0   Global Step: 9160   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:33:56,184-Speed 3396.95 samples/sec   Loss 13.9034   LearningRate 0.0928   Epoch: 0   Global Step: 9170   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:33:59,249-Speed 3341.87 samples/sec   Loss 13.7498   LearningRate 0.0927   Epoch: 0   Global Step: 9180   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:34:02,400-Speed 3250.96 samples/sec   Loss 13.8470   LearningRate 0.0927   Epoch: 0   Global Step: 9190   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:34:05,471-Speed 3336.23 samples/sec   Loss 13.8841   LearningRate 0.0927   Epoch: 0   Global Step: 9200   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:34:08,510-Speed 3370.67 samples/sec   Loss 13.7688   LearningRate 0.0927   Epoch: 0   Global Step: 9210   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:34:11,585-Speed 3331.08 samples/sec   Loss 13.7472   LearningRate 0.0927   Epoch: 0   Global Step: 9220   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:34:14,705-Speed 3282.61 samples/sec   Loss 13.9161   LearningRate 0.0927   Epoch: 0   Global Step: 9230   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:34:17,746-Speed 3369.06 samples/sec   Loss 13.7921   LearningRate 0.0927   Epoch: 0   Global Step: 9240   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:34:20,768-Speed 3388.70 samples/sec   Loss 13.8402   LearningRate 0.0927   Epoch: 0   Global Step: 9250   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:34:23,808-Speed 3369.83 samples/sec   Loss 13.6683   LearningRate 0.0927   Epoch: 0   Global Step: 9260   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:34:26,842-Speed 3376.73 samples/sec   Loss 13.7430   LearningRate 0.0927   Epoch: 0   Global Step: 9270   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:34:29,886-Speed 3364.87 samples/sec   Loss 13.6543   LearningRate 0.0927   Epoch: 0   Global Step: 9280   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:34:32,878-Speed 3423.91 samples/sec   Loss 13.8918   LearningRate 0.0927   Epoch: 0   Global Step: 9290   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:34:35,931-Speed 3355.60 samples/sec   Loss 13.6885   LearningRate 0.0927   Epoch: 0   Global Step: 9300   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:34:38,963-Speed 3377.73 samples/sec   Loss 13.7264   LearningRate 0.0926   Epoch: 0   Global Step: 9310   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:34:42,011-Speed 3360.03 samples/sec   Loss 13.6256   LearningRate 0.0926   Epoch: 0   Global Step: 9320   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:34:45,059-Speed 3361.42 samples/sec   Loss 13.6380   LearningRate 0.0926   Epoch: 0   Global Step: 9330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:34:48,080-Speed 3390.89 samples/sec   Loss 13.6878   LearningRate 0.0926   Epoch: 0   Global Step: 9340   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:34:51,104-Speed 3386.92 samples/sec   Loss 13.7226   LearningRate 0.0926   Epoch: 0   Global Step: 9350   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:34:54,107-Speed 3411.46 samples/sec   Loss 13.5638   LearningRate 0.0926   Epoch: 0   Global Step: 9360   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:34:57,103-Speed 3418.39 samples/sec   Loss 13.7973   LearningRate 0.0926   Epoch: 0   Global Step: 9370   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:35:00,141-Speed 3372.09 samples/sec   Loss 13.7290   LearningRate 0.0926   Epoch: 0   Global Step: 9380   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:35:03,139-Speed 3416.75 samples/sec   Loss 13.6284   LearningRate 0.0926   Epoch: 0   Global Step: 9390   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:35:06,168-Speed 3382.08 samples/sec   Loss 13.6535   LearningRate 0.0926   Epoch: 0   Global Step: 9400   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:35:09,177-Speed 3404.37 samples/sec   Loss 13.7013   LearningRate 0.0926   Epoch: 0   Global Step: 9410   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:35:12,226-Speed 3358.99 samples/sec   Loss 13.6549   LearningRate 0.0926   Epoch: 0   Global Step: 9420   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:35:15,257-Speed 3379.75 samples/sec   Loss 13.6905   LearningRate 0.0926   Epoch: 0   Global Step: 9430   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:35:18,254-Speed 3417.96 samples/sec   Loss 13.5795   LearningRate 0.0925   Epoch: 0   Global Step: 9440   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:35:21,241-Speed 3428.55 samples/sec   Loss 13.6554   LearningRate 0.0925   Epoch: 0   Global Step: 9450   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:35:24,322-Speed 3324.76 samples/sec   Loss 13.6308   LearningRate 0.0925   Epoch: 0   Global Step: 9460   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:35:27,358-Speed 3374.54 samples/sec   Loss 13.7458   LearningRate 0.0925   Epoch: 0   Global Step: 9470   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:35:30,401-Speed 3366.74 samples/sec   Loss 13.6780   LearningRate 0.0925   Epoch: 0   Global Step: 9480   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:35:33,422-Speed 3389.85 samples/sec   Loss 13.4175   LearningRate 0.0925   Epoch: 0   Global Step: 9490   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:35:36,466-Speed 3365.62 samples/sec   Loss 13.4381   LearningRate 0.0925   Epoch: 0   Global Step: 9500   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:35:39,494-Speed 3383.26 samples/sec   Loss 13.5684   LearningRate 0.0925   Epoch: 0   Global Step: 9510   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:35:42,494-Speed 3414.88 samples/sec   Loss 13.4829   LearningRate 0.0925   Epoch: 0   Global Step: 9520   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:35:45,502-Speed 3404.52 samples/sec   Loss 13.4927   LearningRate 0.0925   Epoch: 0   Global Step: 9530   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:35:48,536-Speed 3376.48 samples/sec   Loss 13.5491   LearningRate 0.0925   Epoch: 0   Global Step: 9540   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:35:51,610-Speed 3332.01 samples/sec   Loss 13.4524   LearningRate 0.0925   Epoch: 0   Global Step: 9550   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:35:54,627-Speed 3396.17 samples/sec   Loss 13.6139   LearningRate 0.0925   Epoch: 0   Global Step: 9560   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:35:57,627-Speed 3414.20 samples/sec   Loss 13.2932   LearningRate 0.0924   Epoch: 0   Global Step: 9570   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:36:00,644-Speed 3396.13 samples/sec   Loss 13.6552   LearningRate 0.0924   Epoch: 0   Global Step: 9580   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:36:03,695-Speed 3356.35 samples/sec   Loss 13.5357   LearningRate 0.0924   Epoch: 0   Global Step: 9590   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:36:06,751-Speed 3352.27 samples/sec   Loss 13.6083   LearningRate 0.0924   Epoch: 0   Global Step: 9600   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:36:09,763-Speed 3400.76 samples/sec   Loss 13.5619   LearningRate 0.0924   Epoch: 0   Global Step: 9610   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:36:12,780-Speed 3395.71 samples/sec   Loss 13.5152   LearningRate 0.0924   Epoch: 0   Global Step: 9620   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:36:15,791-Speed 3401.90 samples/sec   Loss 13.4892   LearningRate 0.0924   Epoch: 0   Global Step: 9630   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:36:18,846-Speed 3352.95 samples/sec   Loss 13.4938   LearningRate 0.0924   Epoch: 0   Global Step: 9640   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:36:21,856-Speed 3401.84 samples/sec   Loss 13.4456   LearningRate 0.0924   Epoch: 0   Global Step: 9650   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:36:24,946-Speed 3315.52 samples/sec   Loss 13.3534   LearningRate 0.0924   Epoch: 0   Global Step: 9660   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:36:28,042-Speed 3308.09 samples/sec   Loss 13.2739   LearningRate 0.0924   Epoch: 0   Global Step: 9670   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:36:31,086-Speed 3365.35 samples/sec   Loss 13.3646   LearningRate 0.0924   Epoch: 0   Global Step: 9680   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:36:34,107-Speed 3390.92 samples/sec   Loss 13.3759   LearningRate 0.0924   Epoch: 0   Global Step: 9690   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:36:37,141-Speed 3376.31 samples/sec   Loss 13.3094   LearningRate 0.0923   Epoch: 0   Global Step: 9700   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:36:40,139-Speed 3417.05 samples/sec   Loss 13.4222   LearningRate 0.0923   Epoch: 0   Global Step: 9710   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:36:43,164-Speed 3386.10 samples/sec   Loss 13.3745   LearningRate 0.0923   Epoch: 0   Global Step: 9720   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:36:46,183-Speed 3392.58 samples/sec   Loss 13.3955   LearningRate 0.0923   Epoch: 0   Global Step: 9730   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:36:49,185-Speed 3412.19 samples/sec   Loss 13.4392   LearningRate 0.0923   Epoch: 0   Global Step: 9740   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:36:52,246-Speed 3347.47 samples/sec   Loss 13.3140   LearningRate 0.0923   Epoch: 0   Global Step: 9750   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:36:55,281-Speed 3374.43 samples/sec   Loss 13.4611   LearningRate 0.0923   Epoch: 0   Global Step: 9760   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:36:58,342-Speed 3346.51 samples/sec   Loss 13.3232   LearningRate 0.0923   Epoch: 0   Global Step: 9770   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:37:01,401-Speed 3348.88 samples/sec   Loss 13.3496   LearningRate 0.0923   Epoch: 0   Global Step: 9780   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:37:04,460-Speed 3348.68 samples/sec   Loss 13.2745   LearningRate 0.0923   Epoch: 0   Global Step: 9790   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:37:07,524-Speed 3342.35 samples/sec   Loss 13.3589   LearningRate 0.0923   Epoch: 0   Global Step: 9800   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:37:10,590-Speed 3340.97 samples/sec   Loss 13.3502   LearningRate 0.0923   Epoch: 0   Global Step: 9810   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:37:13,646-Speed 3352.53 samples/sec   Loss 13.3608   LearningRate 0.0923   Epoch: 0   Global Step: 9820   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:37:16,676-Speed 3380.42 samples/sec   Loss 13.4085   LearningRate 0.0922   Epoch: 0   Global Step: 9830   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:37:19,694-Speed 3393.22 samples/sec   Loss 13.4087   LearningRate 0.0922   Epoch: 0   Global Step: 9840   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:37:22,730-Speed 3374.63 samples/sec   Loss 13.2685   LearningRate 0.0922   Epoch: 0   Global Step: 9850   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:37:25,810-Speed 3325.83 samples/sec   Loss 13.3202   LearningRate 0.0922   Epoch: 0   Global Step: 9860   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:37:28,908-Speed 3305.77 samples/sec   Loss 13.2487   LearningRate 0.0922   Epoch: 0   Global Step: 9870   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:37:31,983-Speed 3331.80 samples/sec   Loss 13.1921   LearningRate 0.0922   Epoch: 0   Global Step: 9880   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:37:35,004-Speed 3390.62 samples/sec   Loss 13.1378   LearningRate 0.0922   Epoch: 0   Global Step: 9890   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:37:38,071-Speed 3339.52 samples/sec   Loss 13.2293   LearningRate 0.0922   Epoch: 0   Global Step: 9900   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:37:41,254-Speed 3218.43 samples/sec   Loss 13.2845   LearningRate 0.0922   Epoch: 0   Global Step: 9910   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:37:44,292-Speed 3371.73 samples/sec   Loss 13.0277   LearningRate 0.0922   Epoch: 0   Global Step: 9920   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:37:47,322-Speed 3380.41 samples/sec   Loss 13.3413   LearningRate 0.0922   Epoch: 0   Global Step: 9930   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:37:50,341-Speed 3393.41 samples/sec   Loss 13.1154   LearningRate 0.0922   Epoch: 0   Global Step: 9940   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:37:53,389-Speed 3360.33 samples/sec   Loss 13.3166   LearningRate 0.0921   Epoch: 0   Global Step: 9950   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:37:56,467-Speed 3327.81 samples/sec   Loss 13.2109   LearningRate 0.0921   Epoch: 0   Global Step: 9960   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:37:59,511-Speed 3365.16 samples/sec   Loss 13.3490   LearningRate 0.0921   Epoch: 0   Global Step: 9970   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:38:02,614-Speed 3300.79 samples/sec   Loss 13.1495   LearningRate 0.0921   Epoch: 0   Global Step: 9980   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:38:05,706-Speed 3313.51 samples/sec   Loss 12.9924   LearningRate 0.0921   Epoch: 0   Global Step: 9990   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:38:08,732-Speed 3384.44 samples/sec   Loss 13.2544   LearningRate 0.0921   Epoch: 0   Global Step: 10000   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:38:11,744-Speed 3401.25 samples/sec   Loss 13.0429   LearningRate 0.0921   Epoch: 0   Global Step: 10010   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:38:14,815-Speed 3335.32 samples/sec   Loss 13.1700   LearningRate 0.0921   Epoch: 0   Global Step: 10020   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:38:17,883-Speed 3338.28 samples/sec   Loss 13.1067   LearningRate 0.0921   Epoch: 0   Global Step: 10030   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:38:20,923-Speed 3369.49 samples/sec   Loss 12.9932   LearningRate 0.0921   Epoch: 0   Global Step: 10040   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:38:23,954-Speed 3380.39 samples/sec   Loss 13.1945   LearningRate 0.0921   Epoch: 0   Global Step: 10050   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:38:27,040-Speed 3318.41 samples/sec   Loss 13.1891   LearningRate 0.0921   Epoch: 0   Global Step: 10060   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:38:30,136-Speed 3308.74 samples/sec   Loss 13.0342   LearningRate 0.0921   Epoch: 0   Global Step: 10070   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:38:33,163-Speed 3384.00 samples/sec   Loss 13.1553   LearningRate 0.0920   Epoch: 0   Global Step: 10080   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:38:36,202-Speed 3370.91 samples/sec   Loss 13.0028   LearningRate 0.0920   Epoch: 0   Global Step: 10090   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:38:39,231-Speed 3381.34 samples/sec   Loss 13.2067   LearningRate 0.0920   Epoch: 0   Global Step: 10100   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:38:42,316-Speed 3320.59 samples/sec   Loss 13.3023   LearningRate 0.0920   Epoch: 0   Global Step: 10110   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:38:45,349-Speed 3376.41 samples/sec   Loss 12.9746   LearningRate 0.0920   Epoch: 0   Global Step: 10120   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:38:48,361-Speed 3401.15 samples/sec   Loss 12.9997   LearningRate 0.0920   Epoch: 0   Global Step: 10130   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:38:51,403-Speed 3367.38 samples/sec   Loss 12.8941   LearningRate 0.0920   Epoch: 0   Global Step: 10140   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:38:54,407-Speed 3409.84 samples/sec   Loss 12.9674   LearningRate 0.0920   Epoch: 0   Global Step: 10150   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:38:57,415-Speed 3405.56 samples/sec   Loss 13.0863   LearningRate 0.0920   Epoch: 0   Global Step: 10160   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:39:00,451-Speed 3373.54 samples/sec   Loss 12.9799   LearningRate 0.0920   Epoch: 0   Global Step: 10170   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:39:03,521-Speed 3337.09 samples/sec   Loss 13.0145   LearningRate 0.0920   Epoch: 0   Global Step: 10180   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:39:06,658-Speed 3265.02 samples/sec   Loss 13.0330   LearningRate 0.0920   Epoch: 0   Global Step: 10190   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:39:09,726-Speed 3338.76 samples/sec   Loss 13.0461   LearningRate 0.0920   Epoch: 0   Global Step: 10200   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:39:12,759-Speed 3376.55 samples/sec   Loss 13.1437   LearningRate 0.0919   Epoch: 0   Global Step: 10210   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:39:15,798-Speed 3371.05 samples/sec   Loss 13.0921   LearningRate 0.0919   Epoch: 0   Global Step: 10220   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:39:18,913-Speed 3288.67 samples/sec   Loss 12.9892   LearningRate 0.0919   Epoch: 0   Global Step: 10230   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:39:21,927-Speed 3398.16 samples/sec   Loss 13.0811   LearningRate 0.0919   Epoch: 0   Global Step: 10240   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:39:24,972-Speed 3364.29 samples/sec   Loss 12.9249   LearningRate 0.0919   Epoch: 0   Global Step: 10250   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:39:28,029-Speed 3350.33 samples/sec   Loss 12.9698   LearningRate 0.0919   Epoch: 0   Global Step: 10260   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:39:31,083-Speed 3354.62 samples/sec   Loss 12.8211   LearningRate 0.0919   Epoch: 0   Global Step: 10270   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:39:34,102-Speed 3392.87 samples/sec   Loss 12.8753   LearningRate 0.0919   Epoch: 0   Global Step: 10280   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:39:37,181-Speed 3326.46 samples/sec   Loss 12.8056   LearningRate 0.0919   Epoch: 0   Global Step: 10290   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:39:40,202-Speed 3390.86 samples/sec   Loss 12.9527   LearningRate 0.0919   Epoch: 0   Global Step: 10300   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:39:43,224-Speed 3390.10 samples/sec   Loss 12.7326   LearningRate 0.0919   Epoch: 0   Global Step: 10310   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:39:46,284-Speed 3347.09 samples/sec   Loss 12.8792   LearningRate 0.0919   Epoch: 0   Global Step: 10320   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:39:49,377-Speed 3312.36 samples/sec   Loss 12.9665   LearningRate 0.0919   Epoch: 0   Global Step: 10330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:39:52,378-Speed 3412.65 samples/sec   Loss 12.9085   LearningRate 0.0918   Epoch: 0   Global Step: 10340   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:39:55,439-Speed 3346.20 samples/sec   Loss 12.8804   LearningRate 0.0918   Epoch: 0   Global Step: 10350   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:39:58,485-Speed 3363.37 samples/sec   Loss 12.8597   LearningRate 0.0918   Epoch: 0   Global Step: 10360   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:40:01,507-Speed 3389.06 samples/sec   Loss 12.7877   LearningRate 0.0918   Epoch: 0   Global Step: 10370   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:40:04,520-Speed 3400.16 samples/sec   Loss 12.7994   LearningRate 0.0918   Epoch: 0   Global Step: 10380   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:40:07,515-Speed 3420.37 samples/sec   Loss 12.8250   LearningRate 0.0918   Epoch: 0   Global Step: 10390   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:40:10,530-Speed 3397.87 samples/sec   Loss 12.8148   LearningRate 0.0918   Epoch: 0   Global Step: 10400   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:40:13,592-Speed 3345.31 samples/sec   Loss 12.7639   LearningRate 0.0918   Epoch: 0   Global Step: 10410   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:40:16,610-Speed 3393.98 samples/sec   Loss 12.9037   LearningRate 0.0918   Epoch: 0   Global Step: 10420   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:40:19,626-Speed 3395.63 samples/sec   Loss 12.7277   LearningRate 0.0918   Epoch: 0   Global Step: 10430   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:40:22,654-Speed 3383.34 samples/sec   Loss 12.8476   LearningRate 0.0918   Epoch: 0   Global Step: 10440   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:40:25,673-Speed 3392.95 samples/sec   Loss 12.8395   LearningRate 0.0918   Epoch: 0   Global Step: 10450   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:40:28,741-Speed 3338.89 samples/sec   Loss 12.6736   LearningRate 0.0918   Epoch: 0   Global Step: 10460   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:40:31,789-Speed 3360.55 samples/sec   Loss 12.8135   LearningRate 0.0917   Epoch: 0   Global Step: 10470   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:40:34,829-Speed 3369.42 samples/sec   Loss 12.9076   LearningRate 0.0917   Epoch: 0   Global Step: 10480   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:40:37,867-Speed 3371.77 samples/sec   Loss 12.8452   LearningRate 0.0917   Epoch: 0   Global Step: 10490   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:40:40,872-Speed 3408.70 samples/sec   Loss 12.7964   LearningRate 0.0917   Epoch: 0   Global Step: 10500   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:40:43,916-Speed 3365.38 samples/sec   Loss 12.8333   LearningRate 0.0917   Epoch: 0   Global Step: 10510   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:40:46,970-Speed 3353.76 samples/sec   Loss 12.6897   LearningRate 0.0917   Epoch: 0   Global Step: 10520   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:40:50,033-Speed 3344.59 samples/sec   Loss 12.6303   LearningRate 0.0917   Epoch: 0   Global Step: 10530   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:40:53,095-Speed 3344.82 samples/sec   Loss 12.9099   LearningRate 0.0917   Epoch: 0   Global Step: 10540   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:40:56,126-Speed 3379.73 samples/sec   Loss 12.6825   LearningRate 0.0917   Epoch: 0   Global Step: 10550   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:40:59,120-Speed 3421.55 samples/sec   Loss 12.8472   LearningRate 0.0917   Epoch: 0   Global Step: 10560   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:41:02,136-Speed 3396.58 samples/sec   Loss 12.8634   LearningRate 0.0917   Epoch: 0   Global Step: 10570   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:41:05,261-Speed 3278.21 samples/sec   Loss 12.8268   LearningRate 0.0917   Epoch: 0   Global Step: 10580   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:41:08,322-Speed 3346.42 samples/sec   Loss 12.7065   LearningRate 0.0917   Epoch: 0   Global Step: 10590   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:41:11,355-Speed 3377.06 samples/sec   Loss 12.6862   LearningRate 0.0916   Epoch: 0   Global Step: 10600   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:41:14,416-Speed 3345.78 samples/sec   Loss 12.6418   LearningRate 0.0916   Epoch: 0   Global Step: 10610   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:41:17,460-Speed 3365.26 samples/sec   Loss 12.6152   LearningRate 0.0916   Epoch: 0   Global Step: 10620   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:41:20,472-Speed 3401.35 samples/sec   Loss 12.7147   LearningRate 0.0916   Epoch: 0   Global Step: 10630   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:41:23,501-Speed 3381.18 samples/sec   Loss 12.6051   LearningRate 0.0916   Epoch: 0   Global Step: 10640   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:41:26,562-Speed 3346.30 samples/sec   Loss 12.8186   LearningRate 0.0916   Epoch: 0   Global Step: 10650   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:41:29,603-Speed 3369.06 samples/sec   Loss 12.7358   LearningRate 0.0916   Epoch: 0   Global Step: 10660   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:41:32,614-Speed 3401.65 samples/sec   Loss 12.5729   LearningRate 0.0916   Epoch: 0   Global Step: 10670   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:41:35,713-Speed 3305.91 samples/sec   Loss 12.6684   LearningRate 0.0916   Epoch: 0   Global Step: 10680   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:41:38,811-Speed 3306.11 samples/sec   Loss 12.7678   LearningRate 0.0916   Epoch: 0   Global Step: 10690   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:41:41,880-Speed 3338.75 samples/sec   Loss 12.6509   LearningRate 0.0916   Epoch: 0   Global Step: 10700   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:41:44,924-Speed 3364.44 samples/sec   Loss 12.8302   LearningRate 0.0916   Epoch: 0   Global Step: 10710   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:41:47,987-Speed 3344.25 samples/sec   Loss 12.6608   LearningRate 0.0916   Epoch: 0   Global Step: 10720   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:41:51,096-Speed 3295.20 samples/sec   Loss 12.7148   LearningRate 0.0915   Epoch: 0   Global Step: 10730   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:41:54,169-Speed 3333.30 samples/sec   Loss 12.5626   LearningRate 0.0915   Epoch: 0   Global Step: 10740   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:41:57,206-Speed 3372.93 samples/sec   Loss 12.5629   LearningRate 0.0915   Epoch: 0   Global Step: 10750   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:42:00,218-Speed 3399.81 samples/sec   Loss 12.4831   LearningRate 0.0915   Epoch: 0   Global Step: 10760   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:42:03,308-Speed 3314.82 samples/sec   Loss 12.4265   LearningRate 0.0915   Epoch: 0   Global Step: 10770   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:42:06,406-Speed 3307.22 samples/sec   Loss 12.7082   LearningRate 0.0915   Epoch: 0   Global Step: 10780   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:42:09,418-Speed 3401.08 samples/sec   Loss 12.7474   LearningRate 0.0915   Epoch: 0   Global Step: 10790   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:42:12,416-Speed 3416.26 samples/sec   Loss 12.5879   LearningRate 0.0915   Epoch: 0   Global Step: 10800   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:42:15,466-Speed 3358.86 samples/sec   Loss 12.4613   LearningRate 0.0915   Epoch: 0   Global Step: 10810   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:42:18,521-Speed 3352.83 samples/sec   Loss 12.6699   LearningRate 0.0915   Epoch: 0   Global Step: 10820   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:42:21,531-Speed 3402.55 samples/sec   Loss 12.6413   LearningRate 0.0915   Epoch: 0   Global Step: 10830   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:42:24,563-Speed 3378.86 samples/sec   Loss 12.5891   LearningRate 0.0915   Epoch: 0   Global Step: 10840   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:42:27,651-Speed 3317.52 samples/sec   Loss 12.3438   LearningRate 0.0915   Epoch: 0   Global Step: 10850   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:42:30,676-Speed 3385.82 samples/sec   Loss 12.6449   LearningRate 0.0914   Epoch: 0   Global Step: 10860   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:42:33,674-Speed 3417.16 samples/sec   Loss 12.7317   LearningRate 0.0914   Epoch: 0   Global Step: 10870   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:42:36,701-Speed 3383.58 samples/sec   Loss 12.6292   LearningRate 0.0914   Epoch: 0   Global Step: 10880   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:42:39,836-Speed 3267.67 samples/sec   Loss 12.5438   LearningRate 0.0914   Epoch: 0   Global Step: 10890   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:42:42,930-Speed 3310.90 samples/sec   Loss 12.4986   LearningRate 0.0914   Epoch: 0   Global Step: 10900   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:42:45,951-Speed 3390.62 samples/sec   Loss 12.4127   LearningRate 0.0914   Epoch: 0   Global Step: 10910   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:42:49,001-Speed 3358.22 samples/sec   Loss 12.5488   LearningRate 0.0914   Epoch: 0   Global Step: 10920   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:42:52,021-Speed 3392.07 samples/sec   Loss 12.6137   LearningRate 0.0914   Epoch: 0   Global Step: 10930   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:42:55,061-Speed 3370.30 samples/sec   Loss 12.6558   LearningRate 0.0914   Epoch: 0   Global Step: 10940   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:42:58,075-Speed 3398.05 samples/sec   Loss 12.4251   LearningRate 0.0914   Epoch: 0   Global Step: 10950   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:43:01,073-Speed 3416.84 samples/sec   Loss 12.5444   LearningRate 0.0914   Epoch: 0   Global Step: 10960   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:43:04,069-Speed 3419.90 samples/sec   Loss 12.4616   LearningRate 0.0914   Epoch: 0   Global Step: 10970   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:43:07,121-Speed 3356.35 samples/sec   Loss 12.5369   LearningRate 0.0914   Epoch: 0   Global Step: 10980   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:43:10,168-Speed 3362.23 samples/sec   Loss 12.4454   LearningRate 0.0913   Epoch: 0   Global Step: 10990   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:43:13,202-Speed 3375.45 samples/sec   Loss 12.4451   LearningRate 0.0913   Epoch: 0   Global Step: 11000   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:43:16,241-Speed 3370.96 samples/sec   Loss 12.4587   LearningRate 0.0913   Epoch: 0   Global Step: 11010   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 02:43:19,336-Speed 3309.25 samples/sec   Loss 12.3741   LearningRate 0.0913   Epoch: 0   Global Step: 11020   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 02:43:22,382-Speed 3362.80 samples/sec   Loss 12.5373   LearningRate 0.0913   Epoch: 0   Global Step: 11030   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 02:43:25,482-Speed 3305.28 samples/sec   Loss 12.4917   LearningRate 0.0913   Epoch: 0   Global Step: 11040   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:43:28,520-Speed 3371.41 samples/sec   Loss 12.4217   LearningRate 0.0913   Epoch: 0   Global Step: 11050   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:43:31,569-Speed 3359.73 samples/sec   Loss 12.4356   LearningRate 0.0913   Epoch: 0   Global Step: 11060   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:43:34,622-Speed 3355.58 samples/sec   Loss 12.4785   LearningRate 0.0913   Epoch: 0   Global Step: 11070   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:43:37,693-Speed 3334.51 samples/sec   Loss 12.4360   LearningRate 0.0913   Epoch: 0   Global Step: 11080   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:43:40,777-Speed 3321.75 samples/sec   Loss 12.5667   LearningRate 0.0913   Epoch: 0   Global Step: 11090   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:43:43,797-Speed 3391.60 samples/sec   Loss 12.4319   LearningRate 0.0913   Epoch: 0   Global Step: 11100   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:43:46,845-Speed 3361.26 samples/sec   Loss 12.2174   LearningRate 0.0913   Epoch: 0   Global Step: 11110   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:43:49,895-Speed 3358.40 samples/sec   Loss 12.4540   LearningRate 0.0912   Epoch: 0   Global Step: 11120   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:43:52,978-Speed 3322.54 samples/sec   Loss 12.2866   LearningRate 0.0912   Epoch: 0   Global Step: 11130   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:43:56,045-Speed 3339.96 samples/sec   Loss 12.4140   LearningRate 0.0912   Epoch: 0   Global Step: 11140   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:43:59,055-Speed 3403.06 samples/sec   Loss 12.3055   LearningRate 0.0912   Epoch: 0   Global Step: 11150   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:44:02,117-Speed 3345.13 samples/sec   Loss 12.3647   LearningRate 0.0912   Epoch: 0   Global Step: 11160   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:44:05,193-Speed 3330.00 samples/sec   Loss 12.5069   LearningRate 0.0912   Epoch: 0   Global Step: 11170   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:44:08,276-Speed 3322.58 samples/sec   Loss 12.4159   LearningRate 0.0912   Epoch: 0   Global Step: 11180   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:44:11,321-Speed 3364.75 samples/sec   Loss 12.3496   LearningRate 0.0912   Epoch: 0   Global Step: 11190   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:44:14,346-Speed 3385.45 samples/sec   Loss 12.4869   LearningRate 0.0912   Epoch: 0   Global Step: 11200   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:44:17,422-Speed 3330.63 samples/sec   Loss 12.3013   LearningRate 0.0912   Epoch: 0   Global Step: 11210   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:44:20,500-Speed 3327.86 samples/sec   Loss 12.2409   LearningRate 0.0912   Epoch: 0   Global Step: 11220   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:44:23,519-Speed 3391.91 samples/sec   Loss 12.2163   LearningRate 0.0912   Epoch: 0   Global Step: 11230   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:44:26,605-Speed 3319.29 samples/sec   Loss 12.2648   LearningRate 0.0912   Epoch: 0   Global Step: 11240   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:44:29,748-Speed 3259.01 samples/sec   Loss 12.3235   LearningRate 0.0911   Epoch: 0   Global Step: 11250   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:44:32,787-Speed 3370.70 samples/sec   Loss 12.4295   LearningRate 0.0911   Epoch: 0   Global Step: 11260   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:44:35,875-Speed 3317.46 samples/sec   Loss 12.3528   LearningRate 0.0911   Epoch: 0   Global Step: 11270   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:44:38,940-Speed 3341.81 samples/sec   Loss 12.3846   LearningRate 0.0911   Epoch: 0   Global Step: 11280   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:44:41,987-Speed 3361.15 samples/sec   Loss 12.2269   LearningRate 0.0911   Epoch: 0   Global Step: 11290   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:44:45,004-Speed 3395.88 samples/sec   Loss 12.2652   LearningRate 0.0911   Epoch: 0   Global Step: 11300   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:44:48,005-Speed 3412.96 samples/sec   Loss 12.3527   LearningRate 0.0911   Epoch: 0   Global Step: 11310   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:44:51,033-Speed 3383.76 samples/sec   Loss 12.3049   LearningRate 0.0911   Epoch: 0   Global Step: 11320   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:44:54,106-Speed 3333.02 samples/sec   Loss 12.4169   LearningRate 0.0911   Epoch: 0   Global Step: 11330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:44:57,149-Speed 3366.11 samples/sec   Loss 12.3095   LearningRate 0.0911   Epoch: 0   Global Step: 11340   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:45:00,233-Speed 3320.86 samples/sec   Loss 12.4101   LearningRate 0.0911   Epoch: 0   Global Step: 11350   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:45:03,325-Speed 3312.75 samples/sec   Loss 12.3854   LearningRate 0.0911   Epoch: 0   Global Step: 11360   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:45:06,358-Speed 3376.82 samples/sec   Loss 12.1836   LearningRate 0.0911   Epoch: 0   Global Step: 11370   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:45:09,391-Speed 3378.43 samples/sec   Loss 12.2166   LearningRate 0.0910   Epoch: 0   Global Step: 11380   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:45:12,451-Speed 3347.58 samples/sec   Loss 12.2006   LearningRate 0.0910   Epoch: 0   Global Step: 11390   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:45:15,511-Speed 3347.56 samples/sec   Loss 12.1845   LearningRate 0.0910   Epoch: 0   Global Step: 11400   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:45:18,514-Speed 3411.11 samples/sec   Loss 12.1684   LearningRate 0.0910   Epoch: 0   Global Step: 11410   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:45:21,535-Speed 3390.41 samples/sec   Loss 12.3038   LearningRate 0.0910   Epoch: 0   Global Step: 11420   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:45:24,626-Speed 3313.02 samples/sec   Loss 12.3296   LearningRate 0.0910   Epoch: 0   Global Step: 11430   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:45:27,709-Speed 3323.26 samples/sec   Loss 12.1004   LearningRate 0.0910   Epoch: 0   Global Step: 11440   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:45:30,745-Speed 3374.10 samples/sec   Loss 12.2602   LearningRate 0.0910   Epoch: 0   Global Step: 11450   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:45:33,788-Speed 3366.01 samples/sec   Loss 12.1567   LearningRate 0.0910   Epoch: 0   Global Step: 11460   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:45:36,867-Speed 3326.31 samples/sec   Loss 12.1310   LearningRate 0.0910   Epoch: 0   Global Step: 11470   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:45:39,904-Speed 3372.68 samples/sec   Loss 12.0516   LearningRate 0.0910   Epoch: 0   Global Step: 11480   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:45:43,009-Speed 3299.21 samples/sec   Loss 12.2255   LearningRate 0.0910   Epoch: 0   Global Step: 11490   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:45:46,017-Speed 3405.61 samples/sec   Loss 12.2035   LearningRate 0.0910   Epoch: 0   Global Step: 11500   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:45:49,058-Speed 3368.50 samples/sec   Loss 12.1446   LearningRate 0.0909   Epoch: 0   Global Step: 11510   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:45:52,088-Speed 3380.93 samples/sec   Loss 12.1153   LearningRate 0.0909   Epoch: 0   Global Step: 11520   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:45:55,142-Speed 3353.83 samples/sec   Loss 12.1489   LearningRate 0.0909   Epoch: 0   Global Step: 11530   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:45:58,153-Speed 3401.88 samples/sec   Loss 12.0952   LearningRate 0.0909   Epoch: 0   Global Step: 11540   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:46:01,244-Speed 3313.29 samples/sec   Loss 12.0489   LearningRate 0.0909   Epoch: 0   Global Step: 11550   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:46:04,274-Speed 3381.19 samples/sec   Loss 12.0744   LearningRate 0.0909   Epoch: 0   Global Step: 11560   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:46:07,354-Speed 3326.33 samples/sec   Loss 12.2023   LearningRate 0.0909   Epoch: 0   Global Step: 11570   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 02:46:10,339-Speed 3430.80 samples/sec   Loss 12.0662   LearningRate 0.0909   Epoch: 0   Global Step: 11580   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:46:13,351-Speed 3401.11 samples/sec   Loss 12.1832   LearningRate 0.0909   Epoch: 0   Global Step: 11590   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:46:16,357-Speed 3408.05 samples/sec   Loss 12.1418   LearningRate 0.0909   Epoch: 0   Global Step: 11600   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:46:19,383-Speed 3384.40 samples/sec   Loss 12.1346   LearningRate 0.0909   Epoch: 0   Global Step: 11610   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:46:22,411-Speed 3382.97 samples/sec   Loss 12.0716   LearningRate 0.0909   Epoch: 0   Global Step: 11620   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:46:25,459-Speed 3360.92 samples/sec   Loss 12.0258   LearningRate 0.0909   Epoch: 0   Global Step: 11630   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:46:28,502-Speed 3366.11 samples/sec   Loss 12.3459   LearningRate 0.0908   Epoch: 0   Global Step: 11640   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:46:31,508-Speed 3408.08 samples/sec   Loss 12.0037   LearningRate 0.0908   Epoch: 0   Global Step: 11650   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:46:34,516-Speed 3404.67 samples/sec   Loss 12.1782   LearningRate 0.0908   Epoch: 0   Global Step: 11660   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:46:37,563-Speed 3362.80 samples/sec   Loss 12.2566   LearningRate 0.0908   Epoch: 0   Global Step: 11670   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:46:40,592-Speed 3381.45 samples/sec   Loss 12.0585   LearningRate 0.0908   Epoch: 0   Global Step: 11680   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:46:43,687-Speed 3310.09 samples/sec   Loss 11.9055   LearningRate 0.0908   Epoch: 0   Global Step: 11690   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:46:46,748-Speed 3346.42 samples/sec   Loss 12.2612   LearningRate 0.0908   Epoch: 0   Global Step: 11700   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:46:49,773-Speed 3385.11 samples/sec   Loss 11.9351   LearningRate 0.0908   Epoch: 0   Global Step: 11710   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:46:52,810-Speed 3373.49 samples/sec   Loss 12.1648   LearningRate 0.0908   Epoch: 0   Global Step: 11720   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:46:55,828-Speed 3393.86 samples/sec   Loss 11.9914   LearningRate 0.0908   Epoch: 0   Global Step: 11730   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:46:58,868-Speed 3370.10 samples/sec   Loss 12.0221   LearningRate 0.0908   Epoch: 0   Global Step: 11740   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:47:01,963-Speed 3309.10 samples/sec   Loss 12.0515   LearningRate 0.0908   Epoch: 0   Global Step: 11750   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:47:05,032-Speed 3337.51 samples/sec   Loss 12.0796   LearningRate 0.0908   Epoch: 0   Global Step: 11760   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:47:08,082-Speed 3359.39 samples/sec   Loss 12.2045   LearningRate 0.0907   Epoch: 0   Global Step: 11770   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:47:11,133-Speed 3356.45 samples/sec   Loss 12.0043   LearningRate 0.0907   Epoch: 0   Global Step: 11780   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:47:14,164-Speed 3379.95 samples/sec   Loss 11.9407   LearningRate 0.0907   Epoch: 0   Global Step: 11790   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:47:17,209-Speed 3363.67 samples/sec   Loss 11.9971   LearningRate 0.0907   Epoch: 0   Global Step: 11800   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:47:20,278-Speed 3337.35 samples/sec   Loss 12.1802   LearningRate 0.0907   Epoch: 0   Global Step: 11810   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:47:23,319-Speed 3368.72 samples/sec   Loss 11.9067   LearningRate 0.0907   Epoch: 0   Global Step: 11820   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:47:26,366-Speed 3361.75 samples/sec   Loss 11.9783   LearningRate 0.0907   Epoch: 0   Global Step: 11830   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:47:29,443-Speed 3329.44 samples/sec   Loss 12.1310   LearningRate 0.0907   Epoch: 0   Global Step: 11840   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:47:32,480-Speed 3373.01 samples/sec   Loss 12.0366   LearningRate 0.0907   Epoch: 0   Global Step: 11850   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:47:35,516-Speed 3373.08 samples/sec   Loss 11.9550   LearningRate 0.0907   Epoch: 0   Global Step: 11860   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:47:38,555-Speed 3370.63 samples/sec   Loss 11.9759   LearningRate 0.0907   Epoch: 0   Global Step: 11870   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:47:41,588-Speed 3377.23 samples/sec   Loss 11.8795   LearningRate 0.0907   Epoch: 0   Global Step: 11880   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:47:44,627-Speed 3370.97 samples/sec   Loss 11.9290   LearningRate 0.0907   Epoch: 0   Global Step: 11890   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:47:47,659-Speed 3378.92 samples/sec   Loss 12.0518   LearningRate 0.0906   Epoch: 0   Global Step: 11900   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:47:50,666-Speed 3406.65 samples/sec   Loss 11.8489   LearningRate 0.0906   Epoch: 0   Global Step: 11910   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:47:53,726-Speed 3346.99 samples/sec   Loss 11.9167   LearningRate 0.0906   Epoch: 0   Global Step: 11920   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:47:56,787-Speed 3346.46 samples/sec   Loss 12.0988   LearningRate 0.0906   Epoch: 0   Global Step: 11930   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:47:59,820-Speed 3377.09 samples/sec   Loss 11.9657   LearningRate 0.0906   Epoch: 0   Global Step: 11940   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:48:02,845-Speed 3386.63 samples/sec   Loss 12.0401   LearningRate 0.0906   Epoch: 0   Global Step: 11950   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:48:05,882-Speed 3373.10 samples/sec   Loss 11.7161   LearningRate 0.0906   Epoch: 0   Global Step: 11960   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:48:08,888-Speed 3406.98 samples/sec   Loss 12.0802   LearningRate 0.0906   Epoch: 0   Global Step: 11970   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:48:11,953-Speed 3342.34 samples/sec   Loss 11.8197   LearningRate 0.0906   Epoch: 0   Global Step: 11980   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:48:15,050-Speed 3307.49 samples/sec   Loss 11.9196   LearningRate 0.0906   Epoch: 0   Global Step: 11990   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:48:18,078-Speed 3383.16 samples/sec   Loss 11.8651   LearningRate 0.0906   Epoch: 0   Global Step: 12000   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:48:21,105-Speed 3383.58 samples/sec   Loss 11.8956   LearningRate 0.0906   Epoch: 0   Global Step: 12010   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:48:24,130-Speed 3387.05 samples/sec   Loss 11.9605   LearningRate 0.0906   Epoch: 0   Global Step: 12020   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:48:27,172-Speed 3366.11 samples/sec   Loss 11.9452   LearningRate 0.0905   Epoch: 0   Global Step: 12030   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:48:30,246-Speed 3332.77 samples/sec   Loss 11.9377   LearningRate 0.0905   Epoch: 0   Global Step: 12040   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:48:33,271-Speed 3386.66 samples/sec   Loss 11.8886   LearningRate 0.0905   Epoch: 0   Global Step: 12050   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:48:36,324-Speed 3354.79 samples/sec   Loss 11.9496   LearningRate 0.0905   Epoch: 0   Global Step: 12060   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:48:39,357-Speed 3377.53 samples/sec   Loss 11.9651   LearningRate 0.0905   Epoch: 0   Global Step: 12070   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:48:42,366-Speed 3403.56 samples/sec   Loss 11.8474   LearningRate 0.0905   Epoch: 0   Global Step: 12080   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:48:45,398-Speed 3378.75 samples/sec   Loss 11.7866   LearningRate 0.0905   Epoch: 0   Global Step: 12090   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 02:48:48,470-Speed 3334.56 samples/sec   Loss 11.7046   LearningRate 0.0905   Epoch: 0   Global Step: 12100   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:48:51,518-Speed 3360.55 samples/sec   Loss 11.9334   LearningRate 0.0905   Epoch: 0   Global Step: 12110   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:48:54,580-Speed 3344.76 samples/sec   Loss 11.9109   LearningRate 0.0905   Epoch: 0   Global Step: 12120   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:48:57,612-Speed 3378.63 samples/sec   Loss 11.7555   LearningRate 0.0905   Epoch: 0   Global Step: 12130   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:49:00,663-Speed 3357.97 samples/sec   Loss 11.9257   LearningRate 0.0905   Epoch: 0   Global Step: 12140   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:49:03,715-Speed 3355.21 samples/sec   Loss 11.9934   LearningRate 0.0905   Epoch: 0   Global Step: 12150   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:49:06,749-Speed 3376.27 samples/sec   Loss 11.9373   LearningRate 0.0904   Epoch: 0   Global Step: 12160   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:49:09,748-Speed 3416.18 samples/sec   Loss 11.8597   LearningRate 0.0904   Epoch: 0   Global Step: 12170   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:49:12,842-Speed 3310.80 samples/sec   Loss 11.8656   LearningRate 0.0904   Epoch: 0   Global Step: 12180   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:49:15,909-Speed 3339.77 samples/sec   Loss 11.8597   LearningRate 0.0904   Epoch: 0   Global Step: 12190   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:49:19,000-Speed 3313.59 samples/sec   Loss 11.8205   LearningRate 0.0904   Epoch: 0   Global Step: 12200   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:49:22,012-Speed 3400.94 samples/sec   Loss 11.7629   LearningRate 0.0904   Epoch: 0   Global Step: 12210   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:49:25,035-Speed 3388.70 samples/sec   Loss 11.8096   LearningRate 0.0904   Epoch: 0   Global Step: 12220   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:49:28,068-Speed 3376.71 samples/sec   Loss 11.7176   LearningRate 0.0904   Epoch: 0   Global Step: 12230   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:49:31,143-Speed 3331.50 samples/sec   Loss 11.8382   LearningRate 0.0904   Epoch: 0   Global Step: 12240   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:49:34,200-Speed 3351.10 samples/sec   Loss 11.7799   LearningRate 0.0904   Epoch: 0   Global Step: 12250   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:49:37,271-Speed 3335.46 samples/sec   Loss 11.7010   LearningRate 0.0904   Epoch: 0   Global Step: 12260   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:49:40,357-Speed 3319.09 samples/sec   Loss 11.8429   LearningRate 0.0904   Epoch: 0   Global Step: 12270   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:49:43,396-Speed 3371.44 samples/sec   Loss 11.7596   LearningRate 0.0904   Epoch: 0   Global Step: 12280   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:49:46,414-Speed 3393.78 samples/sec   Loss 11.8649   LearningRate 0.0904   Epoch: 0   Global Step: 12290   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:49:49,487-Speed 3332.79 samples/sec   Loss 11.7960   LearningRate 0.0903   Epoch: 0   Global Step: 12300   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:49:52,518-Speed 3380.27 samples/sec   Loss 11.8904   LearningRate 0.0903   Epoch: 0   Global Step: 12310   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:49:55,535-Speed 3394.87 samples/sec   Loss 11.7275   LearningRate 0.0903   Epoch: 0   Global Step: 12320   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:49:58,551-Speed 3396.24 samples/sec   Loss 11.7850   LearningRate 0.0903   Epoch: 0   Global Step: 12330   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:50:01,617-Speed 3340.69 samples/sec   Loss 11.7975   LearningRate 0.0903   Epoch: 0   Global Step: 12340   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:50:04,670-Speed 3355.73 samples/sec   Loss 11.8215   LearningRate 0.0903   Epoch: 0   Global Step: 12350   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:50:07,708-Speed 3371.72 samples/sec   Loss 11.7039   LearningRate 0.0903   Epoch: 0   Global Step: 12360   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:50:10,769-Speed 3346.32 samples/sec   Loss 11.7376   LearningRate 0.0903   Epoch: 0   Global Step: 12370   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:50:13,815-Speed 3362.89 samples/sec   Loss 11.8371   LearningRate 0.0903   Epoch: 0   Global Step: 12380   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:50:16,867-Speed 3356.19 samples/sec   Loss 11.7800   LearningRate 0.0903   Epoch: 0   Global Step: 12390   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:50:19,908-Speed 3368.64 samples/sec   Loss 11.8435   LearningRate 0.0903   Epoch: 0   Global Step: 12400   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:50:23,127-Speed 3182.06 samples/sec   Loss 11.8139   LearningRate 0.0903   Epoch: 0   Global Step: 12410   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:50:26,157-Speed 3381.18 samples/sec   Loss 11.6703   LearningRate 0.0903   Epoch: 0   Global Step: 12420   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:50:57,550-Speed 326.20 samples/sec   Loss 10.1249   LearningRate 0.0902   Epoch: 1   Global Step: 12430   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:51:00,758-Speed 3193.70 samples/sec   Loss 9.9571   LearningRate 0.0902   Epoch: 1   Global Step: 12440   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:51:03,785-Speed 3384.11 samples/sec   Loss 9.7992   LearningRate 0.0902   Epoch: 1   Global Step: 12450   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:51:06,813-Speed 3382.44 samples/sec   Loss 9.9050   LearningRate 0.0902   Epoch: 1   Global Step: 12460   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:51:09,825-Speed 3400.78 samples/sec   Loss 9.7789   LearningRate 0.0902   Epoch: 1   Global Step: 12470   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:51:12,920-Speed 3309.75 samples/sec   Loss 9.7056   LearningRate 0.0902   Epoch: 1   Global Step: 12480   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:51:15,987-Speed 3340.69 samples/sec   Loss 9.7670   LearningRate 0.0902   Epoch: 1   Global Step: 12490   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:51:19,075-Speed 3316.73 samples/sec   Loss 9.7535   LearningRate 0.0902   Epoch: 1   Global Step: 12500   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:51:22,107-Speed 3378.81 samples/sec   Loss 9.8595   LearningRate 0.0902   Epoch: 1   Global Step: 12510   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:51:25,171-Speed 3342.85 samples/sec   Loss 9.8027   LearningRate 0.0902   Epoch: 1   Global Step: 12520   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:51:28,256-Speed 3321.07 samples/sec   Loss 9.7478   LearningRate 0.0902   Epoch: 1   Global Step: 12530   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:51:31,264-Speed 3404.48 samples/sec   Loss 9.8406   LearningRate 0.0902   Epoch: 1   Global Step: 12540   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:51:34,266-Speed 3412.53 samples/sec   Loss 9.9273   LearningRate 0.0902   Epoch: 1   Global Step: 12550   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:51:37,365-Speed 3305.45 samples/sec   Loss 9.9473   LearningRate 0.0901   Epoch: 1   Global Step: 12560   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:51:40,426-Speed 3347.02 samples/sec   Loss 9.8740   LearningRate 0.0901   Epoch: 1   Global Step: 12570   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:51:43,514-Speed 3317.74 samples/sec   Loss 9.8959   LearningRate 0.0901   Epoch: 1   Global Step: 12580   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:51:46,562-Speed 3360.06 samples/sec   Loss 9.8226   LearningRate 0.0901   Epoch: 1   Global Step: 12590   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:51:49,598-Speed 3373.88 samples/sec   Loss 9.7825   LearningRate 0.0901   Epoch: 1   Global Step: 12600   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:51:52,653-Speed 3352.84 samples/sec   Loss 9.8327   LearningRate 0.0901   Epoch: 1   Global Step: 12610   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:51:55,693-Speed 3369.33 samples/sec   Loss 9.8523   LearningRate 0.0901   Epoch: 1   Global Step: 12620   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:51:59,319-Speed 2825.06 samples/sec   Loss 9.8580   LearningRate 0.0901   Epoch: 1   Global Step: 12630   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:52:02,352-Speed 3377.66 samples/sec   Loss 9.8868   LearningRate 0.0901   Epoch: 1   Global Step: 12640   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:52:05,418-Speed 3340.67 samples/sec   Loss 9.8720   LearningRate 0.0901   Epoch: 1   Global Step: 12650   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:52:08,424-Speed 3407.16 samples/sec   Loss 9.6508   LearningRate 0.0901   Epoch: 1   Global Step: 12660   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:52:11,523-Speed 3305.78 samples/sec   Loss 9.8061   LearningRate 0.0901   Epoch: 1   Global Step: 12670   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:52:14,531-Speed 3405.04 samples/sec   Loss 10.0826   LearningRate 0.0901   Epoch: 1   Global Step: 12680   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:52:17,590-Speed 3348.53 samples/sec   Loss 9.8642   LearningRate 0.0900   Epoch: 1   Global Step: 12690   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:52:20,615-Speed 3385.92 samples/sec   Loss 9.9068   LearningRate 0.0900   Epoch: 1   Global Step: 12700   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:52:23,654-Speed 3370.65 samples/sec   Loss 9.8238   LearningRate 0.0900   Epoch: 1   Global Step: 12710   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:52:26,812-Speed 3244.17 samples/sec   Loss 9.8204   LearningRate 0.0900   Epoch: 1   Global Step: 12720   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:52:29,864-Speed 3356.48 samples/sec   Loss 9.6903   LearningRate 0.0900   Epoch: 1   Global Step: 12730   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:52:32,939-Speed 3330.52 samples/sec   Loss 9.8839   LearningRate 0.0900   Epoch: 1   Global Step: 12740   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:52:36,040-Speed 3303.82 samples/sec   Loss 9.9688   LearningRate 0.0900   Epoch: 1   Global Step: 12750   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 02:52:39,103-Speed 3343.91 samples/sec   Loss 9.9751   LearningRate 0.0900   Epoch: 1   Global Step: 12760   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 02:52:42,173-Speed 3337.37 samples/sec   Loss 9.9271   LearningRate 0.0900   Epoch: 1   Global Step: 12770   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 02:52:45,183-Speed 3402.74 samples/sec   Loss 9.7691   LearningRate 0.0900   Epoch: 1   Global Step: 12780   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:52:48,227-Speed 3364.67 samples/sec   Loss 9.8620   LearningRate 0.0900   Epoch: 1   Global Step: 12790   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:52:51,331-Speed 3300.34 samples/sec   Loss 9.9402   LearningRate 0.0900   Epoch: 1   Global Step: 12800   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:52:54,383-Speed 3356.28 samples/sec   Loss 9.9675   LearningRate 0.0900   Epoch: 1   Global Step: 12810   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:52:57,407-Speed 3387.43 samples/sec   Loss 9.9601   LearningRate 0.0899   Epoch: 1   Global Step: 12820   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:53:00,433-Speed 3384.97 samples/sec   Loss 9.9712   LearningRate 0.0899   Epoch: 1   Global Step: 12830   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:53:03,497-Speed 3343.85 samples/sec   Loss 9.9455   LearningRate 0.0899   Epoch: 1   Global Step: 12840   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:53:06,547-Speed 3358.47 samples/sec   Loss 9.9507   LearningRate 0.0899   Epoch: 1   Global Step: 12850   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:53:09,561-Speed 3397.84 samples/sec   Loss 9.9747   LearningRate 0.0899   Epoch: 1   Global Step: 12860   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:53:12,610-Speed 3360.11 samples/sec   Loss 9.9862   LearningRate 0.0899   Epoch: 1   Global Step: 12870   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:53:15,640-Speed 3380.17 samples/sec   Loss 9.9928   LearningRate 0.0899   Epoch: 1   Global Step: 12880   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:53:18,666-Speed 3385.48 samples/sec   Loss 10.0239   LearningRate 0.0899   Epoch: 1   Global Step: 12890   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:53:21,673-Speed 3406.39 samples/sec   Loss 9.7993   LearningRate 0.0899   Epoch: 1   Global Step: 12900   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:53:24,717-Speed 3365.66 samples/sec   Loss 9.9540   LearningRate 0.0899   Epoch: 1   Global Step: 12910   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:53:27,742-Speed 3385.11 samples/sec   Loss 9.8881   LearningRate 0.0899   Epoch: 1   Global Step: 12920   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:53:30,798-Speed 3352.39 samples/sec   Loss 9.8813   LearningRate 0.0899   Epoch: 1   Global Step: 12930   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:53:33,826-Speed 3382.92 samples/sec   Loss 9.9039   LearningRate 0.0899   Epoch: 1   Global Step: 12940   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:53:36,938-Speed 3291.31 samples/sec   Loss 9.8866   LearningRate 0.0898   Epoch: 1   Global Step: 12950   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:53:40,059-Speed 3282.62 samples/sec   Loss 9.9019   LearningRate 0.0898   Epoch: 1   Global Step: 12960   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:53:43,166-Speed 3295.72 samples/sec   Loss 9.8631   LearningRate 0.0898   Epoch: 1   Global Step: 12970   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:53:46,215-Speed 3359.49 samples/sec   Loss 9.9203   LearningRate 0.0898   Epoch: 1   Global Step: 12980   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:53:49,298-Speed 3323.14 samples/sec   Loss 10.0089   LearningRate 0.0898   Epoch: 1   Global Step: 12990   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:53:52,381-Speed 3322.98 samples/sec   Loss 9.9257   LearningRate 0.0898   Epoch: 1   Global Step: 13000   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:53:55,436-Speed 3352.56 samples/sec   Loss 9.9727   LearningRate 0.0898   Epoch: 1   Global Step: 13010   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:53:58,523-Speed 3317.68 samples/sec   Loss 9.9306   LearningRate 0.0898   Epoch: 1   Global Step: 13020   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:54:01,577-Speed 3354.08 samples/sec   Loss 10.0393   LearningRate 0.0898   Epoch: 1   Global Step: 13030   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:54:04,642-Speed 3342.25 samples/sec   Loss 9.9127   LearningRate 0.0898   Epoch: 1   Global Step: 13040   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:54:07,670-Speed 3383.82 samples/sec   Loss 10.0274   LearningRate 0.0898   Epoch: 1   Global Step: 13050   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:54:10,691-Speed 3389.87 samples/sec   Loss 9.9538   LearningRate 0.0898   Epoch: 1   Global Step: 13060   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:54:13,760-Speed 3337.58 samples/sec   Loss 10.0113   LearningRate 0.0898   Epoch: 1   Global Step: 13070   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:54:16,842-Speed 3323.67 samples/sec   Loss 9.9796   LearningRate 0.0897   Epoch: 1   Global Step: 13080   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:54:19,872-Speed 3380.35 samples/sec   Loss 9.8919   LearningRate 0.0897   Epoch: 1   Global Step: 13090   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:54:22,919-Speed 3362.96 samples/sec   Loss 10.0215   LearningRate 0.0897   Epoch: 1   Global Step: 13100   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:54:25,997-Speed 3327.43 samples/sec   Loss 10.1007   LearningRate 0.0897   Epoch: 1   Global Step: 13110   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:54:29,051-Speed 3354.00 samples/sec   Loss 10.0498   LearningRate 0.0897   Epoch: 1   Global Step: 13120   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:54:32,058-Speed 3406.84 samples/sec   Loss 9.9810   LearningRate 0.0897   Epoch: 1   Global Step: 13130   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 02:54:35,067-Speed 3404.18 samples/sec   Loss 10.1064   LearningRate 0.0897   Epoch: 1   Global Step: 13140   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 02:54:38,075-Speed 3404.64 samples/sec   Loss 10.0257   LearningRate 0.0897   Epoch: 1   Global Step: 13150   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:54:41,127-Speed 3356.23 samples/sec   Loss 10.0386   LearningRate 0.0897   Epoch: 1   Global Step: 13160   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:54:44,150-Speed 3389.12 samples/sec   Loss 10.0075   LearningRate 0.0897   Epoch: 1   Global Step: 13170   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:54:47,147-Speed 3417.32 samples/sec   Loss 10.0902   LearningRate 0.0897   Epoch: 1   Global Step: 13180   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:54:50,171-Speed 3387.53 samples/sec   Loss 9.9751   LearningRate 0.0897   Epoch: 1   Global Step: 13190   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:54:53,221-Speed 3358.39 samples/sec   Loss 10.0921   LearningRate 0.0897   Epoch: 1   Global Step: 13200   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:54:56,237-Speed 3395.53 samples/sec   Loss 10.0470   LearningRate 0.0896   Epoch: 1   Global Step: 13210   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:54:59,247-Speed 3403.93 samples/sec   Loss 10.0618   LearningRate 0.0896   Epoch: 1   Global Step: 13220   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:55:02,259-Speed 3400.63 samples/sec   Loss 10.1569   LearningRate 0.0896   Epoch: 1   Global Step: 13230   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:55:05,272-Speed 3399.37 samples/sec   Loss 10.1316   LearningRate 0.0896   Epoch: 1   Global Step: 13240   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:55:08,324-Speed 3356.98 samples/sec   Loss 10.0519   LearningRate 0.0896   Epoch: 1   Global Step: 13250   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:55:11,430-Speed 3297.48 samples/sec   Loss 10.0208   LearningRate 0.0896   Epoch: 1   Global Step: 13260   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:55:14,471-Speed 3368.08 samples/sec   Loss 10.2316   LearningRate 0.0896   Epoch: 1   Global Step: 13270   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:55:17,535-Speed 3343.50 samples/sec   Loss 10.1318   LearningRate 0.0896   Epoch: 1   Global Step: 13280   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:55:20,558-Speed 3388.36 samples/sec   Loss 10.0415   LearningRate 0.0896   Epoch: 1   Global Step: 13290   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:55:23,626-Speed 3338.25 samples/sec   Loss 9.9797   LearningRate 0.0896   Epoch: 1   Global Step: 13300   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:55:26,624-Speed 3417.35 samples/sec   Loss 10.0599   LearningRate 0.0896   Epoch: 1   Global Step: 13310   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:55:29,750-Speed 3276.57 samples/sec   Loss 10.1217   LearningRate 0.0896   Epoch: 1   Global Step: 13320   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:55:32,815-Speed 3341.77 samples/sec   Loss 10.1054   LearningRate 0.0896   Epoch: 1   Global Step: 13330   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:55:35,874-Speed 3349.17 samples/sec   Loss 10.0688   LearningRate 0.0895   Epoch: 1   Global Step: 13340   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:55:38,911-Speed 3372.20 samples/sec   Loss 9.9507   LearningRate 0.0895   Epoch: 1   Global Step: 13350   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:55:41,971-Speed 3348.23 samples/sec   Loss 10.1096   LearningRate 0.0895   Epoch: 1   Global Step: 13360   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:55:44,982-Speed 3401.92 samples/sec   Loss 10.0131   LearningRate 0.0895   Epoch: 1   Global Step: 13370   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:55:48,064-Speed 3323.84 samples/sec   Loss 10.1405   LearningRate 0.0895   Epoch: 1   Global Step: 13380   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:55:51,114-Speed 3358.07 samples/sec   Loss 10.1100   LearningRate 0.0895   Epoch: 1   Global Step: 13390   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:55:54,215-Speed 3303.13 samples/sec   Loss 10.1454   LearningRate 0.0895   Epoch: 1   Global Step: 13400   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:55:57,234-Speed 3393.44 samples/sec   Loss 10.1947   LearningRate 0.0895   Epoch: 1   Global Step: 13410   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:56:00,254-Speed 3391.83 samples/sec   Loss 10.0970   LearningRate 0.0895   Epoch: 1   Global Step: 13420   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:56:03,360-Speed 3297.42 samples/sec   Loss 10.1928   LearningRate 0.0895   Epoch: 1   Global Step: 13430   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:56:06,461-Speed 3303.85 samples/sec   Loss 10.1126   LearningRate 0.0895   Epoch: 1   Global Step: 13440   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:56:09,477-Speed 3395.43 samples/sec   Loss 10.1083   LearningRate 0.0895   Epoch: 1   Global Step: 13450   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:56:12,532-Speed 3353.66 samples/sec   Loss 10.0576   LearningRate 0.0895   Epoch: 1   Global Step: 13460   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:56:15,639-Speed 3295.91 samples/sec   Loss 10.1203   LearningRate 0.0894   Epoch: 1   Global Step: 13470   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:56:18,678-Speed 3370.51 samples/sec   Loss 10.0982   LearningRate 0.0894   Epoch: 1   Global Step: 13480   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:56:21,724-Speed 3363.24 samples/sec   Loss 10.2324   LearningRate 0.0894   Epoch: 1   Global Step: 13490   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:56:24,755-Speed 3379.06 samples/sec   Loss 9.9822   LearningRate 0.0894   Epoch: 1   Global Step: 13500   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:56:27,768-Speed 3400.05 samples/sec   Loss 9.9879   LearningRate 0.0894   Epoch: 1   Global Step: 13510   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:56:30,853-Speed 3319.75 samples/sec   Loss 10.1077   LearningRate 0.0894   Epoch: 1   Global Step: 13520   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:56:33,855-Speed 3411.70 samples/sec   Loss 10.1903   LearningRate 0.0894   Epoch: 1   Global Step: 13530   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:56:36,858-Speed 3411.33 samples/sec   Loss 10.1178   LearningRate 0.0894   Epoch: 1   Global Step: 13540   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:56:39,904-Speed 3363.15 samples/sec   Loss 10.0807   LearningRate 0.0894   Epoch: 1   Global Step: 13550   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:56:42,956-Speed 3355.93 samples/sec   Loss 10.1764   LearningRate 0.0894   Epoch: 1   Global Step: 13560   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:56:45,994-Speed 3372.15 samples/sec   Loss 10.1327   LearningRate 0.0894   Epoch: 1   Global Step: 13570   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:56:48,996-Speed 3412.23 samples/sec   Loss 10.1625   LearningRate 0.0894   Epoch: 1   Global Step: 13580   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:56:52,068-Speed 3334.83 samples/sec   Loss 10.1301   LearningRate 0.0894   Epoch: 1   Global Step: 13590   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:56:55,140-Speed 3334.12 samples/sec   Loss 10.1555   LearningRate 0.0894   Epoch: 1   Global Step: 13600   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:56:58,157-Speed 3395.28 samples/sec   Loss 10.1935   LearningRate 0.0893   Epoch: 1   Global Step: 13610   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:57:01,164-Speed 3406.14 samples/sec   Loss 10.0903   LearningRate 0.0893   Epoch: 1   Global Step: 13620   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:57:04,248-Speed 3321.76 samples/sec   Loss 10.0839   LearningRate 0.0893   Epoch: 1   Global Step: 13630   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:57:07,256-Speed 3405.43 samples/sec   Loss 10.3108   LearningRate 0.0893   Epoch: 1   Global Step: 13640   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:57:10,254-Speed 3416.13 samples/sec   Loss 10.1591   LearningRate 0.0893   Epoch: 1   Global Step: 13650   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:57:13,332-Speed 3327.90 samples/sec   Loss 10.1506   LearningRate 0.0893   Epoch: 1   Global Step: 13660   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:57:16,403-Speed 3335.99 samples/sec   Loss 10.1664   LearningRate 0.0893   Epoch: 1   Global Step: 13670   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:57:19,437-Speed 3375.28 samples/sec   Loss 10.1646   LearningRate 0.0893   Epoch: 1   Global Step: 13680   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:57:22,440-Speed 3411.23 samples/sec   Loss 10.0410   LearningRate 0.0893   Epoch: 1   Global Step: 13690   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:57:25,548-Speed 3296.27 samples/sec   Loss 10.1659   LearningRate 0.0893   Epoch: 1   Global Step: 13700   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:57:28,592-Speed 3364.81 samples/sec   Loss 10.1605   LearningRate 0.0893   Epoch: 1   Global Step: 13710   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:57:31,658-Speed 3341.01 samples/sec   Loss 10.0468   LearningRate 0.0893   Epoch: 1   Global Step: 13720   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:57:34,658-Speed 3414.76 samples/sec   Loss 10.2328   LearningRate 0.0893   Epoch: 1   Global Step: 13730   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:57:37,725-Speed 3339.92 samples/sec   Loss 10.2348   LearningRate 0.0892   Epoch: 1   Global Step: 13740   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:57:40,835-Speed 3292.57 samples/sec   Loss 10.2378   LearningRate 0.0892   Epoch: 1   Global Step: 13750   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:57:43,867-Speed 3378.87 samples/sec   Loss 10.0615   LearningRate 0.0892   Epoch: 1   Global Step: 13760   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:57:46,889-Speed 3389.43 samples/sec   Loss 10.1832   LearningRate 0.0892   Epoch: 1   Global Step: 13770   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:57:49,970-Speed 3324.38 samples/sec   Loss 10.2257   LearningRate 0.0892   Epoch: 1   Global Step: 13780   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:57:53,057-Speed 3318.80 samples/sec   Loss 10.2686   LearningRate 0.0892   Epoch: 1   Global Step: 13790   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:57:56,160-Speed 3301.58 samples/sec   Loss 10.1043   LearningRate 0.0892   Epoch: 1   Global Step: 13800   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:57:59,154-Speed 3420.48 samples/sec   Loss 10.1802   LearningRate 0.0892   Epoch: 1   Global Step: 13810   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:58:02,170-Speed 3396.93 samples/sec   Loss 10.2677   LearningRate 0.0892   Epoch: 1   Global Step: 13820   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:58:05,240-Speed 3336.99 samples/sec   Loss 10.1921   LearningRate 0.0892   Epoch: 1   Global Step: 13830   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:58:08,249-Speed 3403.66 samples/sec   Loss 10.2963   LearningRate 0.0892   Epoch: 1   Global Step: 13840   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:58:11,279-Speed 3381.39 samples/sec   Loss 10.1482   LearningRate 0.0892   Epoch: 1   Global Step: 13850   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:58:14,308-Speed 3382.07 samples/sec   Loss 10.2864   LearningRate 0.0892   Epoch: 1   Global Step: 13860   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:58:17,317-Speed 3404.14 samples/sec   Loss 10.1644   LearningRate 0.0891   Epoch: 1   Global Step: 13870   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:58:20,330-Speed 3399.54 samples/sec   Loss 10.3866   LearningRate 0.0891   Epoch: 1   Global Step: 13880   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:58:23,334-Speed 3409.83 samples/sec   Loss 10.2615   LearningRate 0.0891   Epoch: 1   Global Step: 13890   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:58:26,367-Speed 3377.89 samples/sec   Loss 10.2310   LearningRate 0.0891   Epoch: 1   Global Step: 13900   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:58:29,386-Speed 3393.16 samples/sec   Loss 10.2300   LearningRate 0.0891   Epoch: 1   Global Step: 13910   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:58:32,417-Speed 3379.98 samples/sec   Loss 10.1876   LearningRate 0.0891   Epoch: 1   Global Step: 13920   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:58:35,420-Speed 3411.09 samples/sec   Loss 10.0571   LearningRate 0.0891   Epoch: 1   Global Step: 13930   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:58:38,451-Speed 3379.41 samples/sec   Loss 10.2094   LearningRate 0.0891   Epoch: 1   Global Step: 13940   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:58:41,467-Speed 3396.01 samples/sec   Loss 10.2267   LearningRate 0.0891   Epoch: 1   Global Step: 13950   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:58:44,476-Speed 3404.52 samples/sec   Loss 10.2696   LearningRate 0.0891   Epoch: 1   Global Step: 13960   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:58:47,496-Speed 3391.88 samples/sec   Loss 10.1447   LearningRate 0.0891   Epoch: 1   Global Step: 13970   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:58:50,611-Speed 3288.77 samples/sec   Loss 10.1094   LearningRate 0.0891   Epoch: 1   Global Step: 13980   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:58:53,674-Speed 3344.33 samples/sec   Loss 10.1038   LearningRate 0.0891   Epoch: 1   Global Step: 13990   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:58:56,667-Speed 3422.11 samples/sec   Loss 10.1262   LearningRate 0.0890   Epoch: 1   Global Step: 14000   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:58:59,688-Speed 3390.88 samples/sec   Loss 10.3112   LearningRate 0.0890   Epoch: 1   Global Step: 14010   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:59:02,692-Speed 3409.81 samples/sec   Loss 10.2551   LearningRate 0.0890   Epoch: 1   Global Step: 14020   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:59:05,781-Speed 3315.65 samples/sec   Loss 10.1968   LearningRate 0.0890   Epoch: 1   Global Step: 14030   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:59:08,790-Speed 3404.89 samples/sec   Loss 10.1082   LearningRate 0.0890   Epoch: 1   Global Step: 14040   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:59:11,920-Speed 3272.74 samples/sec   Loss 10.3176   LearningRate 0.0890   Epoch: 1   Global Step: 14050   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:59:14,943-Speed 3387.79 samples/sec   Loss 10.2013   LearningRate 0.0890   Epoch: 1   Global Step: 14060   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:59:18,034-Speed 3314.72 samples/sec   Loss 10.1706   LearningRate 0.0890   Epoch: 1   Global Step: 14070   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:59:21,042-Speed 3404.86 samples/sec   Loss 10.0774   LearningRate 0.0890   Epoch: 1   Global Step: 14080   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:59:24,136-Speed 3310.83 samples/sec   Loss 10.1528   LearningRate 0.0890   Epoch: 1   Global Step: 14090   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 02:59:27,178-Speed 3366.62 samples/sec   Loss 10.2464   LearningRate 0.0890   Epoch: 1   Global Step: 14100   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 02:59:30,223-Speed 3364.88 samples/sec   Loss 10.1824   LearningRate 0.0890   Epoch: 1   Global Step: 14110   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:59:33,236-Speed 3398.98 samples/sec   Loss 10.2032   LearningRate 0.0890   Epoch: 1   Global Step: 14120   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:59:36,284-Speed 3361.31 samples/sec   Loss 10.1896   LearningRate 0.0889   Epoch: 1   Global Step: 14130   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:59:39,342-Speed 3349.05 samples/sec   Loss 10.0750   LearningRate 0.0889   Epoch: 1   Global Step: 14140   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:59:42,445-Speed 3301.10 samples/sec   Loss 10.1409   LearningRate 0.0889   Epoch: 1   Global Step: 14150   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:59:45,473-Speed 3383.31 samples/sec   Loss 10.3200   LearningRate 0.0889   Epoch: 1   Global Step: 14160   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:59:48,616-Speed 3259.18 samples/sec   Loss 10.2157   LearningRate 0.0889   Epoch: 1   Global Step: 14170   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:59:51,728-Speed 3291.04 samples/sec   Loss 10.2167   LearningRate 0.0889   Epoch: 1   Global Step: 14180   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:59:54,777-Speed 3359.99 samples/sec   Loss 10.2105   LearningRate 0.0889   Epoch: 1   Global Step: 14190   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 02:59:57,828-Speed 3356.78 samples/sec   Loss 10.3305   LearningRate 0.0889   Epoch: 1   Global Step: 14200   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:00:00,905-Speed 3329.59 samples/sec   Loss 10.2161   LearningRate 0.0889   Epoch: 1   Global Step: 14210   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:00:03,922-Speed 3394.95 samples/sec   Loss 10.3057   LearningRate 0.0889   Epoch: 1   Global Step: 14220   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:00:06,921-Speed 3415.45 samples/sec   Loss 10.3241   LearningRate 0.0889   Epoch: 1   Global Step: 14230   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:00:09,917-Speed 3418.60 samples/sec   Loss 10.3498   LearningRate 0.0889   Epoch: 1   Global Step: 14240   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:00:12,981-Speed 3342.99 samples/sec   Loss 10.2199   LearningRate 0.0889   Epoch: 1   Global Step: 14250   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:00:16,056-Speed 3331.90 samples/sec   Loss 10.2588   LearningRate 0.0888   Epoch: 1   Global Step: 14260   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:00:19,076-Speed 3391.02 samples/sec   Loss 10.2217   LearningRate 0.0888   Epoch: 1   Global Step: 14270   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:00:22,096-Speed 3392.85 samples/sec   Loss 10.3678   LearningRate 0.0888   Epoch: 1   Global Step: 14280   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:00:25,137-Speed 3368.19 samples/sec   Loss 10.1835   LearningRate 0.0888   Epoch: 1   Global Step: 14290   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:00:28,173-Speed 3373.32 samples/sec   Loss 10.1455   LearningRate 0.0888   Epoch: 1   Global Step: 14300   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:00:31,209-Speed 3374.16 samples/sec   Loss 10.2944   LearningRate 0.0888   Epoch: 1   Global Step: 14310   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:00:34,226-Speed 3395.95 samples/sec   Loss 10.1883   LearningRate 0.0888   Epoch: 1   Global Step: 14320   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:00:37,272-Speed 3361.77 samples/sec   Loss 10.1424   LearningRate 0.0888   Epoch: 1   Global Step: 14330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:00:40,288-Speed 3397.44 samples/sec   Loss 10.1891   LearningRate 0.0888   Epoch: 1   Global Step: 14340   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:00:43,333-Speed 3363.62 samples/sec   Loss 10.3423   LearningRate 0.0888   Epoch: 1   Global Step: 14350   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:00:46,417-Speed 3321.79 samples/sec   Loss 10.2224   LearningRate 0.0888   Epoch: 1   Global Step: 14360   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:00:49,512-Speed 3309.48 samples/sec   Loss 10.2652   LearningRate 0.0888   Epoch: 1   Global Step: 14370   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:00:52,534-Speed 3390.10 samples/sec   Loss 10.2227   LearningRate 0.0888   Epoch: 1   Global Step: 14380   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:00:55,651-Speed 3286.27 samples/sec   Loss 10.1913   LearningRate 0.0888   Epoch: 1   Global Step: 14390   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:00:58,660-Speed 3403.89 samples/sec   Loss 10.2150   LearningRate 0.0887   Epoch: 1   Global Step: 14400   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:01:01,687-Speed 3383.85 samples/sec   Loss 10.2038   LearningRate 0.0887   Epoch: 1   Global Step: 14410   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:01:04,766-Speed 3327.09 samples/sec   Loss 10.4094   LearningRate 0.0887   Epoch: 1   Global Step: 14420   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:01:07,846-Speed 3326.11 samples/sec   Loss 10.1751   LearningRate 0.0887   Epoch: 1   Global Step: 14430   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:01:10,841-Speed 3420.27 samples/sec   Loss 10.2826   LearningRate 0.0887   Epoch: 1   Global Step: 14440   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:01:13,900-Speed 3348.09 samples/sec   Loss 10.1649   LearningRate 0.0887   Epoch: 1   Global Step: 14450   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:01:16,994-Speed 3310.77 samples/sec   Loss 10.2125   LearningRate 0.0887   Epoch: 1   Global Step: 14460   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:01:19,987-Speed 3423.06 samples/sec   Loss 10.2554   LearningRate 0.0887   Epoch: 1   Global Step: 14470   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:01:23,021-Speed 3375.59 samples/sec   Loss 10.2210   LearningRate 0.0887   Epoch: 1   Global Step: 14480   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:01:26,063-Speed 3367.47 samples/sec   Loss 10.1647   LearningRate 0.0887   Epoch: 1   Global Step: 14490   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:01:29,094-Speed 3380.05 samples/sec   Loss 10.2298   LearningRate 0.0887   Epoch: 1   Global Step: 14500   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:01:32,116-Speed 3389.76 samples/sec   Loss 10.1696   LearningRate 0.0887   Epoch: 1   Global Step: 14510   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:01:35,137-Speed 3390.40 samples/sec   Loss 10.2873   LearningRate 0.0887   Epoch: 1   Global Step: 14520   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:01:38,167-Speed 3380.32 samples/sec   Loss 10.3623   LearningRate 0.0886   Epoch: 1   Global Step: 14530   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:01:41,171-Speed 3409.96 samples/sec   Loss 10.2610   LearningRate 0.0886   Epoch: 1   Global Step: 14540   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:01:44,231-Speed 3348.05 samples/sec   Loss 10.2780   LearningRate 0.0886   Epoch: 1   Global Step: 14550   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:01:47,263-Speed 3378.74 samples/sec   Loss 10.2845   LearningRate 0.0886   Epoch: 1   Global Step: 14560   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:01:50,339-Speed 3329.45 samples/sec   Loss 10.2873   LearningRate 0.0886   Epoch: 1   Global Step: 14570   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:01:53,418-Speed 3327.02 samples/sec   Loss 10.3081   LearningRate 0.0886   Epoch: 1   Global Step: 14580   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:01:56,478-Speed 3347.10 samples/sec   Loss 10.2989   LearningRate 0.0886   Epoch: 1   Global Step: 14590   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:01:59,506-Speed 3382.80 samples/sec   Loss 10.3583   LearningRate 0.0886   Epoch: 1   Global Step: 14600   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:02:02,562-Speed 3352.76 samples/sec   Loss 10.3106   LearningRate 0.0886   Epoch: 1   Global Step: 14610   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:02:05,644-Speed 3322.64 samples/sec   Loss 10.1864   LearningRate 0.0886   Epoch: 1   Global Step: 14620   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:02:08,687-Speed 3366.37 samples/sec   Loss 10.2108   LearningRate 0.0886   Epoch: 1   Global Step: 14630   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:02:11,749-Speed 3346.02 samples/sec   Loss 10.3418   LearningRate 0.0886   Epoch: 1   Global Step: 14640   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:02:14,805-Speed 3351.87 samples/sec   Loss 10.2877   LearningRate 0.0886   Epoch: 1   Global Step: 14650   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:02:17,887-Speed 3323.03 samples/sec   Loss 10.3846   LearningRate 0.0885   Epoch: 1   Global Step: 14660   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:02:20,927-Speed 3370.38 samples/sec   Loss 10.2861   LearningRate 0.0885   Epoch: 1   Global Step: 14670   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:02:23,968-Speed 3367.57 samples/sec   Loss 10.3300   LearningRate 0.0885   Epoch: 1   Global Step: 14680   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:02:27,044-Speed 3329.92 samples/sec   Loss 10.1424   LearningRate 0.0885   Epoch: 1   Global Step: 14690   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:02:30,056-Speed 3400.65 samples/sec   Loss 10.2704   LearningRate 0.0885   Epoch: 1   Global Step: 14700   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:02:33,108-Speed 3356.88 samples/sec   Loss 10.3005   LearningRate 0.0885   Epoch: 1   Global Step: 14710   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:02:36,145-Speed 3373.23 samples/sec   Loss 10.1777   LearningRate 0.0885   Epoch: 1   Global Step: 14720   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:02:39,191-Speed 3362.10 samples/sec   Loss 10.2784   LearningRate 0.0885   Epoch: 1   Global Step: 14730   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:02:42,259-Speed 3339.43 samples/sec   Loss 10.2599   LearningRate 0.0885   Epoch: 1   Global Step: 14740   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:02:45,288-Speed 3381.08 samples/sec   Loss 10.1637   LearningRate 0.0885   Epoch: 1   Global Step: 14750   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:02:48,365-Speed 3328.73 samples/sec   Loss 10.2520   LearningRate 0.0885   Epoch: 1   Global Step: 14760   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:02:51,476-Speed 3292.89 samples/sec   Loss 10.2530   LearningRate 0.0885   Epoch: 1   Global Step: 14770   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:02:54,530-Speed 3353.63 samples/sec   Loss 10.2616   LearningRate 0.0885   Epoch: 1   Global Step: 14780   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:02:57,533-Speed 3411.61 samples/sec   Loss 10.3973   LearningRate 0.0884   Epoch: 1   Global Step: 14790   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:03:00,557-Speed 3388.06 samples/sec   Loss 10.2556   LearningRate 0.0884   Epoch: 1   Global Step: 14800   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:03:03,602-Speed 3364.21 samples/sec   Loss 10.1874   LearningRate 0.0884   Epoch: 1   Global Step: 14810   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:03:06,662-Speed 3347.49 samples/sec   Loss 10.3055   LearningRate 0.0884   Epoch: 1   Global Step: 14820   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:03:09,686-Speed 3387.45 samples/sec   Loss 10.3328   LearningRate 0.0884   Epoch: 1   Global Step: 14830   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:03:12,756-Speed 3336.45 samples/sec   Loss 10.2133   LearningRate 0.0884   Epoch: 1   Global Step: 14840   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:03:15,839-Speed 3322.37 samples/sec   Loss 10.1728   LearningRate 0.0884   Epoch: 1   Global Step: 14850   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:03:18,893-Speed 3354.33 samples/sec   Loss 10.3326   LearningRate 0.0884   Epoch: 1   Global Step: 14860   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:03:21,906-Speed 3399.79 samples/sec   Loss 10.2999   LearningRate 0.0884   Epoch: 1   Global Step: 14870   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:03:24,935-Speed 3381.55 samples/sec   Loss 10.3785   LearningRate 0.0884   Epoch: 1   Global Step: 14880   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:03:27,993-Speed 3349.84 samples/sec   Loss 10.1486   LearningRate 0.0884   Epoch: 1   Global Step: 14890   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:03:31,061-Speed 3338.71 samples/sec   Loss 10.2041   LearningRate 0.0884   Epoch: 1   Global Step: 14900   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:03:34,104-Speed 3365.91 samples/sec   Loss 10.2141   LearningRate 0.0884   Epoch: 1   Global Step: 14910   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:03:37,162-Speed 3350.49 samples/sec   Loss 10.3192   LearningRate 0.0883   Epoch: 1   Global Step: 14920   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:03:40,257-Speed 3311.30 samples/sec   Loss 10.2153   LearningRate 0.0883   Epoch: 1   Global Step: 14930   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:03:43,326-Speed 3337.97 samples/sec   Loss 10.2284   LearningRate 0.0883   Epoch: 1   Global Step: 14940   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:03:46,342-Speed 3395.38 samples/sec   Loss 10.2223   LearningRate 0.0883   Epoch: 1   Global Step: 14950   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:03:49,388-Speed 3363.51 samples/sec   Loss 10.2949   LearningRate 0.0883   Epoch: 1   Global Step: 14960   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:03:52,498-Speed 3293.48 samples/sec   Loss 10.2119   LearningRate 0.0883   Epoch: 1   Global Step: 14970   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:03:55,551-Speed 3355.76 samples/sec   Loss 10.3233   LearningRate 0.0883   Epoch: 1   Global Step: 14980   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:03:58,597-Speed 3362.76 samples/sec   Loss 10.2305   LearningRate 0.0883   Epoch: 1   Global Step: 14990   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:04:01,689-Speed 3312.54 samples/sec   Loss 10.2978   LearningRate 0.0883   Epoch: 1   Global Step: 15000   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:04:04,728-Speed 3370.44 samples/sec   Loss 10.1676   LearningRate 0.0883   Epoch: 1   Global Step: 15010   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:04:07,759-Speed 3379.83 samples/sec   Loss 10.2325   LearningRate 0.0883   Epoch: 1   Global Step: 15020   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:04:10,803-Speed 3365.47 samples/sec   Loss 10.1828   LearningRate 0.0883   Epoch: 1   Global Step: 15030   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:04:13,845-Speed 3367.44 samples/sec   Loss 10.1812   LearningRate 0.0883   Epoch: 1   Global Step: 15040   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:04:16,852-Speed 3406.19 samples/sec   Loss 10.2228   LearningRate 0.0883   Epoch: 1   Global Step: 15050   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:04:19,900-Speed 3360.94 samples/sec   Loss 10.2770   LearningRate 0.0882   Epoch: 1   Global Step: 15060   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:04:22,903-Speed 3411.30 samples/sec   Loss 10.4429   LearningRate 0.0882   Epoch: 1   Global Step: 15070   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:04:25,912-Speed 3403.77 samples/sec   Loss 10.2028   LearningRate 0.0882   Epoch: 1   Global Step: 15080   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:04:28,966-Speed 3353.88 samples/sec   Loss 10.2797   LearningRate 0.0882   Epoch: 1   Global Step: 15090   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:04:32,046-Speed 3326.45 samples/sec   Loss 10.1818   LearningRate 0.0882   Epoch: 1   Global Step: 15100   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:04:35,038-Speed 3423.32 samples/sec   Loss 10.1534   LearningRate 0.0882   Epoch: 1   Global Step: 15110   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:04:38,085-Speed 3362.10 samples/sec   Loss 10.2732   LearningRate 0.0882   Epoch: 1   Global Step: 15120   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:04:41,119-Speed 3375.29 samples/sec   Loss 10.1318   LearningRate 0.0882   Epoch: 1   Global Step: 15130   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:04:44,116-Speed 3418.36 samples/sec   Loss 10.2502   LearningRate 0.0882   Epoch: 1   Global Step: 15140   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:04:47,109-Speed 3422.20 samples/sec   Loss 10.3078   LearningRate 0.0882   Epoch: 1   Global Step: 15150   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:04:50,170-Speed 3346.99 samples/sec   Loss 10.4196   LearningRate 0.0882   Epoch: 1   Global Step: 15160   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:04:53,196-Speed 3384.78 samples/sec   Loss 10.2420   LearningRate 0.0882   Epoch: 1   Global Step: 15170   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:04:56,237-Speed 3367.76 samples/sec   Loss 10.2390   LearningRate 0.0882   Epoch: 1   Global Step: 15180   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:04:59,268-Speed 3379.69 samples/sec   Loss 10.1995   LearningRate 0.0881   Epoch: 1   Global Step: 15190   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:05:02,312-Speed 3365.91 samples/sec   Loss 10.3231   LearningRate 0.0881   Epoch: 1   Global Step: 15200   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:05:05,339-Speed 3383.69 samples/sec   Loss 10.3229   LearningRate 0.0881   Epoch: 1   Global Step: 15210   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:05:08,370-Speed 3378.87 samples/sec   Loss 10.3240   LearningRate 0.0881   Epoch: 1   Global Step: 15220   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:05:11,394-Speed 3387.76 samples/sec   Loss 10.2410   LearningRate 0.0881   Epoch: 1   Global Step: 15230   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:05:14,457-Speed 3344.76 samples/sec   Loss 10.3423   LearningRate 0.0881   Epoch: 1   Global Step: 15240   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:05:17,499-Speed 3366.56 samples/sec   Loss 10.2559   LearningRate 0.0881   Epoch: 1   Global Step: 15250   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:05:20,536-Speed 3373.23 samples/sec   Loss 10.2011   LearningRate 0.0881   Epoch: 1   Global Step: 15260   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:05:23,562-Speed 3385.10 samples/sec   Loss 10.1502   LearningRate 0.0881   Epoch: 1   Global Step: 15270   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:05:26,566-Speed 3409.22 samples/sec   Loss 10.1602   LearningRate 0.0881   Epoch: 1   Global Step: 15280   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:05:29,583-Speed 3395.47 samples/sec   Loss 10.1859   LearningRate 0.0881   Epoch: 1   Global Step: 15290   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:05:32,640-Speed 3350.96 samples/sec   Loss 10.2330   LearningRate 0.0881   Epoch: 1   Global Step: 15300   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:05:35,687-Speed 3362.28 samples/sec   Loss 10.3067   LearningRate 0.0881   Epoch: 1   Global Step: 15310   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:05:38,730-Speed 3365.81 samples/sec   Loss 10.3078   LearningRate 0.0880   Epoch: 1   Global Step: 15320   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:05:41,737-Speed 3406.06 samples/sec   Loss 10.2875   LearningRate 0.0880   Epoch: 1   Global Step: 15330   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:05:44,753-Speed 3396.34 samples/sec   Loss 10.2802   LearningRate 0.0880   Epoch: 1   Global Step: 15340   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:05:47,756-Speed 3410.59 samples/sec   Loss 10.2629   LearningRate 0.0880   Epoch: 1   Global Step: 15350   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:05:50,770-Speed 3399.19 samples/sec   Loss 10.3866   LearningRate 0.0880   Epoch: 1   Global Step: 15360   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:05:53,783-Speed 3399.35 samples/sec   Loss 10.1965   LearningRate 0.0880   Epoch: 1   Global Step: 15370   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:05:56,789-Speed 3407.70 samples/sec   Loss 10.2950   LearningRate 0.0880   Epoch: 1   Global Step: 15380   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:05:59,781-Speed 3423.15 samples/sec   Loss 10.2959   LearningRate 0.0880   Epoch: 1   Global Step: 15390   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:06:02,821-Speed 3369.43 samples/sec   Loss 10.1918   LearningRate 0.0880   Epoch: 1   Global Step: 15400   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:06:05,862-Speed 3369.39 samples/sec   Loss 10.2551   LearningRate 0.0880   Epoch: 1   Global Step: 15410   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:06:08,879-Speed 3395.04 samples/sec   Loss 10.3523   LearningRate 0.0880   Epoch: 1   Global Step: 15420   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:06:11,905-Speed 3384.44 samples/sec   Loss 10.1363   LearningRate 0.0880   Epoch: 1   Global Step: 15430   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:06:14,973-Speed 3339.66 samples/sec   Loss 10.3114   LearningRate 0.0880   Epoch: 1   Global Step: 15440   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:06:18,051-Speed 3327.83 samples/sec   Loss 10.2895   LearningRate 0.0879   Epoch: 1   Global Step: 15450   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:06:21,060-Speed 3403.89 samples/sec   Loss 10.1743   LearningRate 0.0879   Epoch: 1   Global Step: 15460   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:06:24,136-Speed 3330.47 samples/sec   Loss 10.2273   LearningRate 0.0879   Epoch: 1   Global Step: 15470   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:06:27,221-Speed 3319.99 samples/sec   Loss 10.1743   LearningRate 0.0879   Epoch: 1   Global Step: 15480   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:06:30,359-Speed 3264.50 samples/sec   Loss 10.1871   LearningRate 0.0879   Epoch: 1   Global Step: 15490   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:06:33,362-Speed 3411.43 samples/sec   Loss 10.2363   LearningRate 0.0879   Epoch: 1   Global Step: 15500   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:06:36,413-Speed 3357.45 samples/sec   Loss 10.4052   LearningRate 0.0879   Epoch: 1   Global Step: 15510   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:06:39,430-Speed 3394.79 samples/sec   Loss 10.2591   LearningRate 0.0879   Epoch: 1   Global Step: 15520   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:06:42,496-Speed 3341.70 samples/sec   Loss 10.2976   LearningRate 0.0879   Epoch: 1   Global Step: 15530   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:06:45,543-Speed 3360.96 samples/sec   Loss 10.2305   LearningRate 0.0879   Epoch: 1   Global Step: 15540   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:06:48,714-Speed 3230.60 samples/sec   Loss 10.3352   LearningRate 0.0879   Epoch: 1   Global Step: 15550   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:06:51,743-Speed 3382.10 samples/sec   Loss 10.1730   LearningRate 0.0879   Epoch: 1   Global Step: 15560   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:06:54,796-Speed 3354.56 samples/sec   Loss 10.3211   LearningRate 0.0879   Epoch: 1   Global Step: 15570   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:06:57,789-Speed 3422.94 samples/sec   Loss 10.3363   LearningRate 0.0879   Epoch: 1   Global Step: 15580   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:07:00,851-Speed 3344.71 samples/sec   Loss 10.3006   LearningRate 0.0878   Epoch: 1   Global Step: 15590   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:07:03,932-Speed 3324.89 samples/sec   Loss 10.1831   LearningRate 0.0878   Epoch: 1   Global Step: 15600   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:07:07,008-Speed 3330.25 samples/sec   Loss 10.2971   LearningRate 0.0878   Epoch: 1   Global Step: 15610   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:07:10,016-Speed 3404.83 samples/sec   Loss 10.1859   LearningRate 0.0878   Epoch: 1   Global Step: 15620   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:07:13,083-Speed 3339.91 samples/sec   Loss 10.2661   LearningRate 0.0878   Epoch: 1   Global Step: 15630   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:07:16,156-Speed 3334.29 samples/sec   Loss 10.2641   LearningRate 0.0878   Epoch: 1   Global Step: 15640   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:07:19,169-Speed 3398.88 samples/sec   Loss 10.3933   LearningRate 0.0878   Epoch: 1   Global Step: 15650   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:07:22,177-Speed 3405.17 samples/sec   Loss 10.2045   LearningRate 0.0878   Epoch: 1   Global Step: 15660   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:07:25,189-Speed 3400.90 samples/sec   Loss 10.1289   LearningRate 0.0878   Epoch: 1   Global Step: 15670   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:07:28,254-Speed 3342.32 samples/sec   Loss 10.3381   LearningRate 0.0878   Epoch: 1   Global Step: 15680   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:07:31,286-Speed 3378.93 samples/sec   Loss 10.1024   LearningRate 0.0878   Epoch: 1   Global Step: 15690   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:07:34,347-Speed 3345.86 samples/sec   Loss 10.2080   LearningRate 0.0878   Epoch: 1   Global Step: 15700   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:07:37,373-Speed 3385.49 samples/sec   Loss 10.2565   LearningRate 0.0878   Epoch: 1   Global Step: 15710   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:07:40,400-Speed 3384.04 samples/sec   Loss 10.3189   LearningRate 0.0877   Epoch: 1   Global Step: 15720   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:07:43,409-Speed 3403.56 samples/sec   Loss 10.1642   LearningRate 0.0877   Epoch: 1   Global Step: 15730   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:07:46,410-Speed 3413.24 samples/sec   Loss 10.2109   LearningRate 0.0877   Epoch: 1   Global Step: 15740   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:07:49,478-Speed 3339.18 samples/sec   Loss 10.1041   LearningRate 0.0877   Epoch: 1   Global Step: 15750   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:07:52,613-Speed 3267.55 samples/sec   Loss 10.3169   LearningRate 0.0877   Epoch: 1   Global Step: 15760   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:07:55,643-Speed 3380.86 samples/sec   Loss 10.1156   LearningRate 0.0877   Epoch: 1   Global Step: 15770   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:07:58,653-Speed 3402.40 samples/sec   Loss 10.0387   LearningRate 0.0877   Epoch: 1   Global Step: 15780   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:08:01,735-Speed 3323.78 samples/sec   Loss 10.2053   LearningRate 0.0877   Epoch: 1   Global Step: 15790   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:08:04,750-Speed 3397.91 samples/sec   Loss 10.2614   LearningRate 0.0877   Epoch: 1   Global Step: 15800   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:08:07,787-Speed 3372.88 samples/sec   Loss 10.2569   LearningRate 0.0877   Epoch: 1   Global Step: 15810   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:08:10,805-Speed 3393.27 samples/sec   Loss 10.2902   LearningRate 0.0877   Epoch: 1   Global Step: 15820   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:08:13,867-Speed 3346.22 samples/sec   Loss 10.1049   LearningRate 0.0877   Epoch: 1   Global Step: 15830   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:08:16,866-Speed 3415.57 samples/sec   Loss 10.2507   LearningRate 0.0877   Epoch: 1   Global Step: 15840   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:08:19,881-Speed 3396.50 samples/sec   Loss 10.3121   LearningRate 0.0876   Epoch: 1   Global Step: 15850   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:08:22,897-Speed 3396.48 samples/sec   Loss 10.2625   LearningRate 0.0876   Epoch: 1   Global Step: 15860   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:08:25,912-Speed 3397.33 samples/sec   Loss 10.1614   LearningRate 0.0876   Epoch: 1   Global Step: 15870   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:08:29,017-Speed 3299.04 samples/sec   Loss 10.3668   LearningRate 0.0876   Epoch: 1   Global Step: 15880   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:08:32,052-Speed 3375.45 samples/sec   Loss 10.0989   LearningRate 0.0876   Epoch: 1   Global Step: 15890   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:08:35,057-Speed 3408.56 samples/sec   Loss 10.1582   LearningRate 0.0876   Epoch: 1   Global Step: 15900   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:08:38,065-Speed 3405.60 samples/sec   Loss 10.4122   LearningRate 0.0876   Epoch: 1   Global Step: 15910   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:08:41,095-Speed 3379.88 samples/sec   Loss 10.2916   LearningRate 0.0876   Epoch: 1   Global Step: 15920   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:08:44,137-Speed 3368.15 samples/sec   Loss 10.2205   LearningRate 0.0876   Epoch: 1   Global Step: 15930   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:08:47,186-Speed 3358.69 samples/sec   Loss 10.1996   LearningRate 0.0876   Epoch: 1   Global Step: 15940   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:08:50,197-Speed 3402.57 samples/sec   Loss 10.2098   LearningRate 0.0876   Epoch: 1   Global Step: 15950   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:08:53,199-Speed 3411.79 samples/sec   Loss 10.3190   LearningRate 0.0876   Epoch: 1   Global Step: 15960   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:08:56,242-Speed 3366.61 samples/sec   Loss 10.3057   LearningRate 0.0876   Epoch: 1   Global Step: 15970   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:08:59,292-Speed 3358.91 samples/sec   Loss 10.2055   LearningRate 0.0875   Epoch: 1   Global Step: 15980   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:09:02,317-Speed 3386.58 samples/sec   Loss 10.3268   LearningRate 0.0875   Epoch: 1   Global Step: 15990   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:09:05,350-Speed 3376.29 samples/sec   Loss 10.2637   LearningRate 0.0875   Epoch: 1   Global Step: 16000   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:09:08,368-Speed 3394.90 samples/sec   Loss 10.1933   LearningRate 0.0875   Epoch: 1   Global Step: 16010   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:09:11,418-Speed 3358.16 samples/sec   Loss 10.3167   LearningRate 0.0875   Epoch: 1   Global Step: 16020   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:09:14,440-Speed 3389.58 samples/sec   Loss 10.3710   LearningRate 0.0875   Epoch: 1   Global Step: 16030   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:09:17,485-Speed 3364.42 samples/sec   Loss 10.3780   LearningRate 0.0875   Epoch: 1   Global Step: 16040   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:09:20,486-Speed 3412.83 samples/sec   Loss 10.2568   LearningRate 0.0875   Epoch: 1   Global Step: 16050   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:09:23,497-Speed 3402.36 samples/sec   Loss 10.3349   LearningRate 0.0875   Epoch: 1   Global Step: 16060   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:09:26,570-Speed 3332.89 samples/sec   Loss 10.2384   LearningRate 0.0875   Epoch: 1   Global Step: 16070   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:09:29,607-Speed 3372.84 samples/sec   Loss 10.2774   LearningRate 0.0875   Epoch: 1   Global Step: 16080   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:09:32,691-Speed 3321.02 samples/sec   Loss 10.2271   LearningRate 0.0875   Epoch: 1   Global Step: 16090   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:09:35,734-Speed 3366.30 samples/sec   Loss 10.2352   LearningRate 0.0875   Epoch: 1   Global Step: 16100   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:09:38,790-Speed 3352.48 samples/sec   Loss 10.1782   LearningRate 0.0875   Epoch: 1   Global Step: 16110   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:09:41,808-Speed 3393.89 samples/sec   Loss 10.2330   LearningRate 0.0874   Epoch: 1   Global Step: 16120   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:09:44,841-Speed 3377.41 samples/sec   Loss 10.1738   LearningRate 0.0874   Epoch: 1   Global Step: 16130   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:09:47,886-Speed 3363.67 samples/sec   Loss 10.2105   LearningRate 0.0874   Epoch: 1   Global Step: 16140   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:09:50,884-Speed 3416.35 samples/sec   Loss 10.2745   LearningRate 0.0874   Epoch: 1   Global Step: 16150   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:09:53,913-Speed 3382.26 samples/sec   Loss 10.2114   LearningRate 0.0874   Epoch: 1   Global Step: 16160   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:09:56,916-Speed 3410.37 samples/sec   Loss 10.1402   LearningRate 0.0874   Epoch: 1   Global Step: 16170   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:09:59,933-Speed 3395.91 samples/sec   Loss 10.3800   LearningRate 0.0874   Epoch: 1   Global Step: 16180   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:10:02,996-Speed 3343.77 samples/sec   Loss 10.0298   LearningRate 0.0874   Epoch: 1   Global Step: 16190   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:10:06,071-Speed 3331.48 samples/sec   Loss 10.2791   LearningRate 0.0874   Epoch: 1   Global Step: 16200   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:10:09,084-Speed 3399.53 samples/sec   Loss 10.2183   LearningRate 0.0874   Epoch: 1   Global Step: 16210   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:10:12,098-Speed 3398.46 samples/sec   Loss 10.2767   LearningRate 0.0874   Epoch: 1   Global Step: 16220   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:10:15,131-Speed 3377.53 samples/sec   Loss 10.1965   LearningRate 0.0874   Epoch: 1   Global Step: 16230   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:10:18,196-Speed 3341.57 samples/sec   Loss 10.2425   LearningRate 0.0874   Epoch: 1   Global Step: 16240   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:10:21,241-Speed 3364.47 samples/sec   Loss 10.3411   LearningRate 0.0873   Epoch: 1   Global Step: 16250   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:10:24,273-Speed 3378.13 samples/sec   Loss 10.2384   LearningRate 0.0873   Epoch: 1   Global Step: 16260   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:10:27,302-Speed 3382.13 samples/sec   Loss 10.4143   LearningRate 0.0873   Epoch: 1   Global Step: 16270   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:10:30,394-Speed 3312.55 samples/sec   Loss 10.2856   LearningRate 0.0873   Epoch: 1   Global Step: 16280   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:10:33,444-Speed 3358.39 samples/sec   Loss 10.2315   LearningRate 0.0873   Epoch: 1   Global Step: 16290   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:10:36,480-Speed 3374.48 samples/sec   Loss 10.1894   LearningRate 0.0873   Epoch: 1   Global Step: 16300   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:10:39,499-Speed 3392.39 samples/sec   Loss 10.1750   LearningRate 0.0873   Epoch: 1   Global Step: 16310   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:10:42,613-Speed 3289.40 samples/sec   Loss 10.1946   LearningRate 0.0873   Epoch: 1   Global Step: 16320   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:10:45,662-Speed 3359.91 samples/sec   Loss 10.2631   LearningRate 0.0873   Epoch: 1   Global Step: 16330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:10:48,790-Speed 3273.75 samples/sec   Loss 10.1608   LearningRate 0.0873   Epoch: 1   Global Step: 16340   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:10:51,824-Speed 3376.25 samples/sec   Loss 10.3667   LearningRate 0.0873   Epoch: 1   Global Step: 16350   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:10:54,835-Speed 3401.68 samples/sec   Loss 10.2183   LearningRate 0.0873   Epoch: 1   Global Step: 16360   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:10:57,841-Speed 3407.85 samples/sec   Loss 10.1603   LearningRate 0.0873   Epoch: 1   Global Step: 16370   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:11:00,878-Speed 3372.92 samples/sec   Loss 10.1680   LearningRate 0.0872   Epoch: 1   Global Step: 16380   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:11:03,939-Speed 3346.31 samples/sec   Loss 10.1815   LearningRate 0.0872   Epoch: 1   Global Step: 16390   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:11:06,975-Speed 3374.05 samples/sec   Loss 10.0545   LearningRate 0.0872   Epoch: 1   Global Step: 16400   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:11:09,997-Speed 3390.44 samples/sec   Loss 10.1089   LearningRate 0.0872   Epoch: 1   Global Step: 16410   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:11:13,024-Speed 3383.21 samples/sec   Loss 10.2090   LearningRate 0.0872   Epoch: 1   Global Step: 16420   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:11:16,116-Speed 3312.96 samples/sec   Loss 10.1896   LearningRate 0.0872   Epoch: 1   Global Step: 16430   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:11:19,147-Speed 3379.41 samples/sec   Loss 10.2167   LearningRate 0.0872   Epoch: 1   Global Step: 16440   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:11:22,174-Speed 3384.32 samples/sec   Loss 10.0794   LearningRate 0.0872   Epoch: 1   Global Step: 16450   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:11:25,269-Speed 3309.30 samples/sec   Loss 10.0740   LearningRate 0.0872   Epoch: 1   Global Step: 16460   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:11:28,320-Speed 3356.75 samples/sec   Loss 10.1772   LearningRate 0.0872   Epoch: 1   Global Step: 16470   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:11:31,404-Speed 3321.54 samples/sec   Loss 10.0980   LearningRate 0.0872   Epoch: 1   Global Step: 16480   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:11:34,406-Speed 3412.79 samples/sec   Loss 10.0872   LearningRate 0.0872   Epoch: 1   Global Step: 16490   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:11:37,425-Speed 3393.12 samples/sec   Loss 10.1119   LearningRate 0.0872   Epoch: 1   Global Step: 16500   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:11:40,440-Speed 3396.97 samples/sec   Loss 10.2606   LearningRate 0.0871   Epoch: 1   Global Step: 16510   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:11:43,434-Speed 3421.20 samples/sec   Loss 10.0382   LearningRate 0.0871   Epoch: 1   Global Step: 16520   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:11:46,459-Speed 3387.23 samples/sec   Loss 10.2496   LearningRate 0.0871   Epoch: 1   Global Step: 16530   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:11:49,647-Speed 3212.35 samples/sec   Loss 10.1857   LearningRate 0.0871   Epoch: 1   Global Step: 16540   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:11:52,658-Speed 3402.03 samples/sec   Loss 10.1537   LearningRate 0.0871   Epoch: 1   Global Step: 16550   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:11:55,681-Speed 3389.02 samples/sec   Loss 10.1564   LearningRate 0.0871   Epoch: 1   Global Step: 16560   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:11:58,680-Speed 3416.04 samples/sec   Loss 10.1784   LearningRate 0.0871   Epoch: 1   Global Step: 16570   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:12:01,679-Speed 3414.39 samples/sec   Loss 10.1423   LearningRate 0.0871   Epoch: 1   Global Step: 16580   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:12:04,739-Speed 3347.77 samples/sec   Loss 10.1887   LearningRate 0.0871   Epoch: 1   Global Step: 16590   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:12:07,740-Speed 3414.13 samples/sec   Loss 10.2438   LearningRate 0.0871   Epoch: 1   Global Step: 16600   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:12:10,792-Speed 3355.91 samples/sec   Loss 10.2063   LearningRate 0.0871   Epoch: 1   Global Step: 16610   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:12:13,867-Speed 3331.63 samples/sec   Loss 10.1636   LearningRate 0.0871   Epoch: 1   Global Step: 16620   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:12:16,899-Speed 3378.00 samples/sec   Loss 10.1860   LearningRate 0.0871   Epoch: 1   Global Step: 16630   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:12:19,905-Speed 3407.52 samples/sec   Loss 9.9315   LearningRate 0.0871   Epoch: 1   Global Step: 16640   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:12:22,929-Speed 3389.00 samples/sec   Loss 10.1416   LearningRate 0.0870   Epoch: 1   Global Step: 16650   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:12:26,005-Speed 3329.80 samples/sec   Loss 10.1254   LearningRate 0.0870   Epoch: 1   Global Step: 16660   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:12:29,011-Speed 3407.83 samples/sec   Loss 10.1142   LearningRate 0.0870   Epoch: 1   Global Step: 16670   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:12:32,029-Speed 3394.53 samples/sec   Loss 10.2145   LearningRate 0.0870   Epoch: 1   Global Step: 16680   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:12:35,042-Speed 3399.34 samples/sec   Loss 10.2189   LearningRate 0.0870   Epoch: 1   Global Step: 16690   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:12:38,066-Speed 3387.52 samples/sec   Loss 10.2548   LearningRate 0.0870   Epoch: 1   Global Step: 16700   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:12:41,064-Speed 3416.90 samples/sec   Loss 10.1511   LearningRate 0.0870   Epoch: 1   Global Step: 16710   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:12:44,071-Speed 3406.43 samples/sec   Loss 10.2510   LearningRate 0.0870   Epoch: 1   Global Step: 16720   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:12:47,088-Speed 3395.68 samples/sec   Loss 10.1472   LearningRate 0.0870   Epoch: 1   Global Step: 16730   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:12:50,091-Speed 3411.21 samples/sec   Loss 10.0625   LearningRate 0.0870   Epoch: 1   Global Step: 16740   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:12:53,127-Speed 3372.84 samples/sec   Loss 10.0392   LearningRate 0.0870   Epoch: 1   Global Step: 16750   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:12:56,181-Speed 3354.97 samples/sec   Loss 9.9947   LearningRate 0.0870   Epoch: 1   Global Step: 16760   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:12:59,178-Speed 3417.23 samples/sec   Loss 10.2434   LearningRate 0.0870   Epoch: 1   Global Step: 16770   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:13:02,234-Speed 3352.76 samples/sec   Loss 10.1393   LearningRate 0.0869   Epoch: 1   Global Step: 16780   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:13:05,233-Speed 3415.14 samples/sec   Loss 10.2316   LearningRate 0.0869   Epoch: 1   Global Step: 16790   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:13:08,243-Speed 3402.62 samples/sec   Loss 10.1629   LearningRate 0.0869   Epoch: 1   Global Step: 16800   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:13:11,309-Speed 3341.16 samples/sec   Loss 10.0271   LearningRate 0.0869   Epoch: 1   Global Step: 16810   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:13:14,398-Speed 3316.14 samples/sec   Loss 10.2524   LearningRate 0.0869   Epoch: 1   Global Step: 16820   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:13:17,404-Speed 3408.28 samples/sec   Loss 10.2111   LearningRate 0.0869   Epoch: 1   Global Step: 16830   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:13:20,433-Speed 3381.99 samples/sec   Loss 10.1042   LearningRate 0.0869   Epoch: 1   Global Step: 16840   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:13:23,475-Speed 3367.04 samples/sec   Loss 10.1821   LearningRate 0.0869   Epoch: 1   Global Step: 16850   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:13:26,570-Speed 3309.89 samples/sec   Loss 10.0361   LearningRate 0.0869   Epoch: 1   Global Step: 16860   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:13:29,618-Speed 3359.51 samples/sec   Loss 10.2556   LearningRate 0.0869   Epoch: 1   Global Step: 16870   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:13:32,649-Speed 3380.53 samples/sec   Loss 10.1263   LearningRate 0.0869   Epoch: 1   Global Step: 16880   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:13:35,674-Speed 3385.74 samples/sec   Loss 10.2395   LearningRate 0.0869   Epoch: 1   Global Step: 16890   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:13:38,706-Speed 3378.58 samples/sec   Loss 10.1644   LearningRate 0.0869   Epoch: 1   Global Step: 16900   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:13:41,718-Speed 3400.99 samples/sec   Loss 10.1823   LearningRate 0.0868   Epoch: 1   Global Step: 16910   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:13:44,758-Speed 3369.06 samples/sec   Loss 10.2699   LearningRate 0.0868   Epoch: 1   Global Step: 16920   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:13:47,791-Speed 3377.81 samples/sec   Loss 9.9648   LearningRate 0.0868   Epoch: 1   Global Step: 16930   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:13:50,893-Speed 3301.98 samples/sec   Loss 10.3175   LearningRate 0.0868   Epoch: 1   Global Step: 16940   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:13:53,974-Speed 3324.68 samples/sec   Loss 10.2144   LearningRate 0.0868   Epoch: 1   Global Step: 16950   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:13:56,972-Speed 3417.54 samples/sec   Loss 10.2591   LearningRate 0.0868   Epoch: 1   Global Step: 16960   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:14:00,011-Speed 3370.62 samples/sec   Loss 10.1349   LearningRate 0.0868   Epoch: 1   Global Step: 16970   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:14:03,022-Speed 3401.79 samples/sec   Loss 9.9934   LearningRate 0.0868   Epoch: 1   Global Step: 16980   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:14:06,083-Speed 3346.91 samples/sec   Loss 10.1389   LearningRate 0.0868   Epoch: 1   Global Step: 16990   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:14:09,075-Speed 3422.80 samples/sec   Loss 10.0603   LearningRate 0.0868   Epoch: 1   Global Step: 17000   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:14:12,121-Speed 3362.94 samples/sec   Loss 10.1678   LearningRate 0.0868   Epoch: 1   Global Step: 17010   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:14:15,189-Speed 3339.10 samples/sec   Loss 10.1441   LearningRate 0.0868   Epoch: 1   Global Step: 17020   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-04-27 03:14:18,229-Speed 3369.86 samples/sec   Loss 10.2337   LearningRate 0.0868   Epoch: 1   Global Step: 17030   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:14:21,269-Speed 3369.05 samples/sec   Loss 10.1044   LearningRate 0.0868   Epoch: 1   Global Step: 17040   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:14:24,324-Speed 3353.08 samples/sec   Loss 10.2387   LearningRate 0.0867   Epoch: 1   Global Step: 17050   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:14:27,362-Speed 3371.16 samples/sec   Loss 10.0375   LearningRate 0.0867   Epoch: 1   Global Step: 17060   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:14:30,466-Speed 3301.01 samples/sec   Loss 10.0855   LearningRate 0.0867   Epoch: 1   Global Step: 17070   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:14:33,459-Speed 3421.91 samples/sec   Loss 10.1725   LearningRate 0.0867   Epoch: 1   Global Step: 17080   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:14:36,508-Speed 3359.77 samples/sec   Loss 10.1870   LearningRate 0.0867   Epoch: 1   Global Step: 17090   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:14:39,573-Speed 3341.99 samples/sec   Loss 10.0379   LearningRate 0.0867   Epoch: 1   Global Step: 17100   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:14:42,608-Speed 3375.32 samples/sec   Loss 10.2953   LearningRate 0.0867   Epoch: 1   Global Step: 17110   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:14:45,631-Speed 3388.42 samples/sec   Loss 10.1755   LearningRate 0.0867   Epoch: 1   Global Step: 17120   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:14:48,709-Speed 3328.37 samples/sec   Loss 10.2428   LearningRate 0.0867   Epoch: 1   Global Step: 17130   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:14:51,761-Speed 3356.26 samples/sec   Loss 9.9815   LearningRate 0.0867   Epoch: 1   Global Step: 17140   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:14:54,783-Speed 3390.19 samples/sec   Loss 10.1877   LearningRate 0.0867   Epoch: 1   Global Step: 17150   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:14:57,796-Speed 3398.97 samples/sec   Loss 10.1217   LearningRate 0.0867   Epoch: 1   Global Step: 17160   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:15:00,846-Speed 3359.03 samples/sec   Loss 10.2963   LearningRate 0.0867   Epoch: 1   Global Step: 17170   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:15:04,016-Speed 3230.79 samples/sec   Loss 10.1578   LearningRate 0.0866   Epoch: 1   Global Step: 17180   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:15:07,131-Speed 3288.76 samples/sec   Loss 10.2161   LearningRate 0.0866   Epoch: 1   Global Step: 17190   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:15:10,194-Speed 3343.97 samples/sec   Loss 10.0435   LearningRate 0.0866   Epoch: 1   Global Step: 17200   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:15:13,273-Speed 3327.61 samples/sec   Loss 10.1979   LearningRate 0.0866   Epoch: 1   Global Step: 17210   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:15:16,320-Speed 3362.01 samples/sec   Loss 10.2494   LearningRate 0.0866   Epoch: 1   Global Step: 17220   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:15:19,390-Speed 3335.70 samples/sec   Loss 10.1030   LearningRate 0.0866   Epoch: 1   Global Step: 17230   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:15:22,452-Speed 3345.87 samples/sec   Loss 10.0727   LearningRate 0.0866   Epoch: 1   Global Step: 17240   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:15:25,536-Speed 3320.84 samples/sec   Loss 10.1156   LearningRate 0.0866   Epoch: 1   Global Step: 17250   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:15:28,596-Speed 3347.84 samples/sec   Loss 10.1823   LearningRate 0.0866   Epoch: 1   Global Step: 17260   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:15:31,682-Speed 3319.61 samples/sec   Loss 10.1097   LearningRate 0.0866   Epoch: 1   Global Step: 17270   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:15:34,756-Speed 3332.47 samples/sec   Loss 10.0852   LearningRate 0.0866   Epoch: 1   Global Step: 17280   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:15:37,817-Speed 3345.57 samples/sec   Loss 10.1215   LearningRate 0.0866   Epoch: 1   Global Step: 17290   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:15:40,922-Speed 3299.24 samples/sec   Loss 10.2565   LearningRate 0.0866   Epoch: 1   Global Step: 17300   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:15:44,001-Speed 3327.09 samples/sec   Loss 10.1275   LearningRate 0.0865   Epoch: 1   Global Step: 17310   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:15:47,040-Speed 3370.85 samples/sec   Loss 10.0248   LearningRate 0.0865   Epoch: 1   Global Step: 17320   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:15:50,063-Speed 3388.78 samples/sec   Loss 10.0777   LearningRate 0.0865   Epoch: 1   Global Step: 17330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:15:53,133-Speed 3336.31 samples/sec   Loss 10.1636   LearningRate 0.0865   Epoch: 1   Global Step: 17340   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:15:56,149-Speed 3396.14 samples/sec   Loss 10.0974   LearningRate 0.0865   Epoch: 1   Global Step: 17350   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:15:59,198-Speed 3359.64 samples/sec   Loss 10.0220   LearningRate 0.0865   Epoch: 1   Global Step: 17360   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:16:02,249-Speed 3357.25 samples/sec   Loss 10.0676   LearningRate 0.0865   Epoch: 1   Global Step: 17370   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:16:05,366-Speed 3286.45 samples/sec   Loss 10.0001   LearningRate 0.0865   Epoch: 1   Global Step: 17380   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:16:08,402-Speed 3374.10 samples/sec   Loss 10.1135   LearningRate 0.0865   Epoch: 1   Global Step: 17390   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:16:11,404-Speed 3411.24 samples/sec   Loss 10.0485   LearningRate 0.0865   Epoch: 1   Global Step: 17400   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:16:14,454-Speed 3358.74 samples/sec   Loss 10.1104   LearningRate 0.0865   Epoch: 1   Global Step: 17410   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:16:17,491-Speed 3372.95 samples/sec   Loss 10.0434   LearningRate 0.0865   Epoch: 1   Global Step: 17420   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:16:20,557-Speed 3340.58 samples/sec   Loss 10.1177   LearningRate 0.0865   Epoch: 1   Global Step: 17430   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:16:23,578-Speed 3391.43 samples/sec   Loss 10.3234   LearningRate 0.0865   Epoch: 1   Global Step: 17440   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:16:26,712-Speed 3268.15 samples/sec   Loss 10.1068   LearningRate 0.0864   Epoch: 1   Global Step: 17450   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:16:29,727-Speed 3396.83 samples/sec   Loss 10.1340   LearningRate 0.0864   Epoch: 1   Global Step: 17460   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:16:32,755-Speed 3383.21 samples/sec   Loss 10.1564   LearningRate 0.0864   Epoch: 1   Global Step: 17470   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:16:35,766-Speed 3402.56 samples/sec   Loss 9.9787   LearningRate 0.0864   Epoch: 1   Global Step: 17480   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:16:38,778-Speed 3400.88 samples/sec   Loss 10.1254   LearningRate 0.0864   Epoch: 1   Global Step: 17490   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:16:41,786-Speed 3405.04 samples/sec   Loss 10.1153   LearningRate 0.0864   Epoch: 1   Global Step: 17500   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:16:44,800-Speed 3398.78 samples/sec   Loss 10.1368   LearningRate 0.0864   Epoch: 1   Global Step: 17510   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:16:47,880-Speed 3325.44 samples/sec   Loss 10.2076   LearningRate 0.0864   Epoch: 1   Global Step: 17520   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:16:50,920-Speed 3369.59 samples/sec   Loss 10.1378   LearningRate 0.0864   Epoch: 1   Global Step: 17530   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:16:53,989-Speed 3337.49 samples/sec   Loss 10.2293   LearningRate 0.0864   Epoch: 1   Global Step: 17540   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:16:57,034-Speed 3364.17 samples/sec   Loss 10.1363   LearningRate 0.0864   Epoch: 1   Global Step: 17550   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:17:00,038-Speed 3409.88 samples/sec   Loss 10.1949   LearningRate 0.0864   Epoch: 1   Global Step: 17560   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:17:03,101-Speed 3344.83 samples/sec   Loss 10.0561   LearningRate 0.0864   Epoch: 1   Global Step: 17570   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:17:06,119-Speed 3393.74 samples/sec   Loss 10.1149   LearningRate 0.0863   Epoch: 1   Global Step: 17580   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:17:09,122-Speed 3411.15 samples/sec   Loss 10.1137   LearningRate 0.0863   Epoch: 1   Global Step: 17590   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:17:12,150-Speed 3382.86 samples/sec   Loss 9.9956   LearningRate 0.0863   Epoch: 1   Global Step: 17600   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:17:15,150-Speed 3413.28 samples/sec   Loss 9.9887   LearningRate 0.0863   Epoch: 1   Global Step: 17610   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:17:18,228-Speed 3328.40 samples/sec   Loss 10.0884   LearningRate 0.0863   Epoch: 1   Global Step: 17620   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:17:21,251-Speed 3387.85 samples/sec   Loss 10.1135   LearningRate 0.0863   Epoch: 1   Global Step: 17630   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:17:24,316-Speed 3342.70 samples/sec   Loss 9.9540   LearningRate 0.0863   Epoch: 1   Global Step: 17640   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:17:27,334-Speed 3393.97 samples/sec   Loss 10.1263   LearningRate 0.0863   Epoch: 1   Global Step: 17650   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:17:30,392-Speed 3349.35 samples/sec   Loss 10.1259   LearningRate 0.0863   Epoch: 1   Global Step: 17660   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:17:33,412-Speed 3392.36 samples/sec   Loss 10.0707   LearningRate 0.0863   Epoch: 1   Global Step: 17670   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:17:36,431-Speed 3392.89 samples/sec   Loss 9.9566   LearningRate 0.0863   Epoch: 1   Global Step: 17680   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:17:39,433-Speed 3411.67 samples/sec   Loss 10.0962   LearningRate 0.0863   Epoch: 1   Global Step: 17690   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:17:42,470-Speed 3372.85 samples/sec   Loss 10.1136   LearningRate 0.0863   Epoch: 1   Global Step: 17700   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:17:45,480-Speed 3403.95 samples/sec   Loss 9.9566   LearningRate 0.0863   Epoch: 1   Global Step: 17710   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:17:48,472-Speed 3422.89 samples/sec   Loss 10.0330   LearningRate 0.0862   Epoch: 1   Global Step: 17720   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:17:51,534-Speed 3345.45 samples/sec   Loss 10.1347   LearningRate 0.0862   Epoch: 1   Global Step: 17730   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:17:54,565-Speed 3379.96 samples/sec   Loss 10.1062   LearningRate 0.0862   Epoch: 1   Global Step: 17740   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:17:57,586-Speed 3390.31 samples/sec   Loss 9.9686   LearningRate 0.0862   Epoch: 1   Global Step: 17750   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:18:00,583-Speed 3418.45 samples/sec   Loss 9.9944   LearningRate 0.0862   Epoch: 1   Global Step: 17760   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:18:03,625-Speed 3367.24 samples/sec   Loss 10.1130   LearningRate 0.0862   Epoch: 1   Global Step: 17770   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:18:06,663-Speed 3371.05 samples/sec   Loss 10.1266   LearningRate 0.0862   Epoch: 1   Global Step: 17780   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:18:09,647-Speed 3432.82 samples/sec   Loss 10.1633   LearningRate 0.0862   Epoch: 1   Global Step: 17790   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:18:12,680-Speed 3377.98 samples/sec   Loss 10.2494   LearningRate 0.0862   Epoch: 1   Global Step: 17800   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:18:15,755-Speed 3331.13 samples/sec   Loss 10.0498   LearningRate 0.0862   Epoch: 1   Global Step: 17810   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:18:18,809-Speed 3353.32 samples/sec   Loss 9.8818   LearningRate 0.0862   Epoch: 1   Global Step: 17820   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:18:21,811-Speed 3412.47 samples/sec   Loss 10.0440   LearningRate 0.0862   Epoch: 1   Global Step: 17830   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:18:24,861-Speed 3358.44 samples/sec   Loss 10.0072   LearningRate 0.0862   Epoch: 1   Global Step: 17840   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:18:27,891-Speed 3380.83 samples/sec   Loss 10.1235   LearningRate 0.0861   Epoch: 1   Global Step: 17850   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:18:30,967-Speed 3330.24 samples/sec   Loss 10.0564   LearningRate 0.0861   Epoch: 1   Global Step: 17860   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:18:34,025-Speed 3349.73 samples/sec   Loss 10.0060   LearningRate 0.0861   Epoch: 1   Global Step: 17870   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:18:37,090-Speed 3341.76 samples/sec   Loss 10.0399   LearningRate 0.0861   Epoch: 1   Global Step: 17880   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:18:40,151-Speed 3346.02 samples/sec   Loss 10.0216   LearningRate 0.0861   Epoch: 1   Global Step: 17890   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:18:43,193-Speed 3367.73 samples/sec   Loss 10.0870   LearningRate 0.0861   Epoch: 1   Global Step: 17900   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:18:46,251-Speed 3349.65 samples/sec   Loss 10.0556   LearningRate 0.0861   Epoch: 1   Global Step: 17910   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:18:49,344-Speed 3311.76 samples/sec   Loss 9.9864   LearningRate 0.0861   Epoch: 1   Global Step: 17920   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:18:52,396-Speed 3356.54 samples/sec   Loss 10.0052   LearningRate 0.0861   Epoch: 1   Global Step: 17930   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:18:55,425-Speed 3381.92 samples/sec   Loss 10.0760   LearningRate 0.0861   Epoch: 1   Global Step: 17940   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:18:58,415-Speed 3425.13 samples/sec   Loss 10.0788   LearningRate 0.0861   Epoch: 1   Global Step: 17950   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:19:01,465-Speed 3358.62 samples/sec   Loss 9.9897   LearningRate 0.0861   Epoch: 1   Global Step: 17960   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:19:04,484-Speed 3392.96 samples/sec   Loss 10.1063   LearningRate 0.0861   Epoch: 1   Global Step: 17970   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:19:07,544-Speed 3348.38 samples/sec   Loss 9.9881   LearningRate 0.0860   Epoch: 1   Global Step: 17980   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:19:10,549-Speed 3408.51 samples/sec   Loss 9.9067   LearningRate 0.0860   Epoch: 1   Global Step: 17990   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:19:13,613-Speed 3342.63 samples/sec   Loss 10.0180   LearningRate 0.0860   Epoch: 1   Global Step: 18000   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:19:16,655-Speed 3367.93 samples/sec   Loss 10.0647   LearningRate 0.0860   Epoch: 1   Global Step: 18010   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:19:19,651-Speed 3418.70 samples/sec   Loss 9.9634   LearningRate 0.0860   Epoch: 1   Global Step: 18020   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:19:22,667-Speed 3396.36 samples/sec   Loss 10.0078   LearningRate 0.0860   Epoch: 1   Global Step: 18030   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:19:25,680-Speed 3398.89 samples/sec   Loss 10.0720   LearningRate 0.0860   Epoch: 1   Global Step: 18040   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:19:28,676-Speed 3420.12 samples/sec   Loss 10.0371   LearningRate 0.0860   Epoch: 1   Global Step: 18050   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:19:31,687-Speed 3400.89 samples/sec   Loss 9.9894   LearningRate 0.0860   Epoch: 1   Global Step: 18060   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:19:34,747-Speed 3348.22 samples/sec   Loss 10.0576   LearningRate 0.0860   Epoch: 1   Global Step: 18070   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:19:37,819-Speed 3333.90 samples/sec   Loss 10.0521   LearningRate 0.0860   Epoch: 1   Global Step: 18080   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:19:40,847-Speed 3383.59 samples/sec   Loss 10.0149   LearningRate 0.0860   Epoch: 1   Global Step: 18090   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:19:43,905-Speed 3349.10 samples/sec   Loss 9.9215   LearningRate 0.0860   Epoch: 1   Global Step: 18100   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:19:46,984-Speed 3327.53 samples/sec   Loss 10.1061   LearningRate 0.0860   Epoch: 1   Global Step: 18110   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:19:50,029-Speed 3363.77 samples/sec   Loss 10.0600   LearningRate 0.0859   Epoch: 1   Global Step: 18120   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:19:53,132-Speed 3300.88 samples/sec   Loss 10.1057   LearningRate 0.0859   Epoch: 1   Global Step: 18130   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:19:56,153-Speed 3390.51 samples/sec   Loss 9.9427   LearningRate 0.0859   Epoch: 1   Global Step: 18140   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:19:59,220-Speed 3340.19 samples/sec   Loss 9.9322   LearningRate 0.0859   Epoch: 1   Global Step: 18150   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:20:02,319-Speed 3305.05 samples/sec   Loss 9.9200   LearningRate 0.0859   Epoch: 1   Global Step: 18160   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:20:05,329-Speed 3402.87 samples/sec   Loss 9.9621   LearningRate 0.0859   Epoch: 1   Global Step: 18170   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:20:08,394-Speed 3342.10 samples/sec   Loss 9.9423   LearningRate 0.0859   Epoch: 1   Global Step: 18180   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:20:11,462-Speed 3339.14 samples/sec   Loss 9.9244   LearningRate 0.0859   Epoch: 1   Global Step: 18190   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:20:14,519-Speed 3350.99 samples/sec   Loss 10.0227   LearningRate 0.0859   Epoch: 1   Global Step: 18200   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:20:17,611-Speed 3313.09 samples/sec   Loss 10.0029   LearningRate 0.0859   Epoch: 1   Global Step: 18210   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:20:20,624-Speed 3399.29 samples/sec   Loss 9.8568   LearningRate 0.0859   Epoch: 1   Global Step: 18220   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:20:23,643-Speed 3393.09 samples/sec   Loss 9.9295   LearningRate 0.0859   Epoch: 1   Global Step: 18230   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:20:26,671-Speed 3382.81 samples/sec   Loss 10.0609   LearningRate 0.0859   Epoch: 1   Global Step: 18240   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:20:29,774-Speed 3301.47 samples/sec   Loss 10.1152   LearningRate 0.0858   Epoch: 1   Global Step: 18250   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:20:32,811-Speed 3372.50 samples/sec   Loss 10.0440   LearningRate 0.0858   Epoch: 1   Global Step: 18260   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:20:35,881-Speed 3336.16 samples/sec   Loss 10.0602   LearningRate 0.0858   Epoch: 1   Global Step: 18270   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:20:38,954-Speed 3333.92 samples/sec   Loss 9.9329   LearningRate 0.0858   Epoch: 1   Global Step: 18280   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:20:41,980-Speed 3384.90 samples/sec   Loss 9.9360   LearningRate 0.0858   Epoch: 1   Global Step: 18290   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:20:45,003-Speed 3389.03 samples/sec   Loss 10.0579   LearningRate 0.0858   Epoch: 1   Global Step: 18300   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:20:48,059-Speed 3351.22 samples/sec   Loss 9.8425   LearningRate 0.0858   Epoch: 1   Global Step: 18310   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:20:51,123-Speed 3343.33 samples/sec   Loss 9.9733   LearningRate 0.0858   Epoch: 1   Global Step: 18320   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:20:54,157-Speed 3376.01 samples/sec   Loss 9.9135   LearningRate 0.0858   Epoch: 1   Global Step: 18330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:20:57,190-Speed 3377.53 samples/sec   Loss 10.1286   LearningRate 0.0858   Epoch: 1   Global Step: 18340   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:21:00,227-Speed 3373.80 samples/sec   Loss 9.9761   LearningRate 0.0858   Epoch: 1   Global Step: 18350   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:21:03,330-Speed 3300.41 samples/sec   Loss 9.9737   LearningRate 0.0858   Epoch: 1   Global Step: 18360   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:21:06,452-Speed 3280.78 samples/sec   Loss 10.0341   LearningRate 0.0858   Epoch: 1   Global Step: 18370   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:21:09,464-Speed 3400.67 samples/sec   Loss 9.9491   LearningRate 0.0857   Epoch: 1   Global Step: 18380   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:21:12,536-Speed 3335.02 samples/sec   Loss 9.8478   LearningRate 0.0857   Epoch: 1   Global Step: 18390   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:21:15,601-Speed 3341.45 samples/sec   Loss 9.9802   LearningRate 0.0857   Epoch: 1   Global Step: 18400   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:21:18,658-Speed 3351.55 samples/sec   Loss 9.9196   LearningRate 0.0857   Epoch: 1   Global Step: 18410   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:21:21,689-Speed 3379.03 samples/sec   Loss 10.0160   LearningRate 0.0857   Epoch: 1   Global Step: 18420   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:21:24,747-Speed 3349.86 samples/sec   Loss 10.0355   LearningRate 0.0857   Epoch: 1   Global Step: 18430   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:21:27,763-Speed 3396.53 samples/sec   Loss 10.0131   LearningRate 0.0857   Epoch: 1   Global Step: 18440   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:21:30,832-Speed 3337.53 samples/sec   Loss 10.0332   LearningRate 0.0857   Epoch: 1   Global Step: 18450   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:21:33,856-Speed 3386.79 samples/sec   Loss 10.0562   LearningRate 0.0857   Epoch: 1   Global Step: 18460   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:21:36,878-Speed 3390.21 samples/sec   Loss 9.8407   LearningRate 0.0857   Epoch: 1   Global Step: 18470   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:21:39,907-Speed 3381.20 samples/sec   Loss 10.0058   LearningRate 0.0857   Epoch: 1   Global Step: 18480   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:21:42,967-Speed 3347.25 samples/sec   Loss 9.9474   LearningRate 0.0857   Epoch: 1   Global Step: 18490   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:21:45,997-Speed 3381.40 samples/sec   Loss 10.0551   LearningRate 0.0857   Epoch: 1   Global Step: 18500   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:21:49,016-Speed 3391.87 samples/sec   Loss 9.9833   LearningRate 0.0857   Epoch: 1   Global Step: 18510   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:21:52,069-Speed 3355.59 samples/sec   Loss 10.0417   LearningRate 0.0856   Epoch: 1   Global Step: 18520   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:21:55,143-Speed 3332.52 samples/sec   Loss 9.9416   LearningRate 0.0856   Epoch: 1   Global Step: 18530   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:21:58,224-Speed 3324.24 samples/sec   Loss 9.9348   LearningRate 0.0856   Epoch: 1   Global Step: 18540   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:22:01,273-Speed 3359.70 samples/sec   Loss 10.1019   LearningRate 0.0856   Epoch: 1   Global Step: 18550   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:22:04,343-Speed 3336.74 samples/sec   Loss 10.0594   LearningRate 0.0856   Epoch: 1   Global Step: 18560   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:22:07,354-Speed 3402.80 samples/sec   Loss 9.9035   LearningRate 0.0856   Epoch: 1   Global Step: 18570   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:22:10,359-Speed 3408.49 samples/sec   Loss 9.9072   LearningRate 0.0856   Epoch: 1   Global Step: 18580   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:22:13,432-Speed 3332.62 samples/sec   Loss 10.1570   LearningRate 0.0856   Epoch: 1   Global Step: 18590   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:22:16,489-Speed 3351.29 samples/sec   Loss 9.9714   LearningRate 0.0856   Epoch: 1   Global Step: 18600   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:22:19,523-Speed 3376.27 samples/sec   Loss 9.9254   LearningRate 0.0856   Epoch: 1   Global Step: 18610   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:22:22,600-Speed 3329.38 samples/sec   Loss 10.0446   LearningRate 0.0856   Epoch: 1   Global Step: 18620   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:22:25,762-Speed 3239.52 samples/sec   Loss 10.0463   LearningRate 0.0856   Epoch: 1   Global Step: 18630   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:22:28,771-Speed 3403.26 samples/sec   Loss 9.9387   LearningRate 0.0856   Epoch: 1   Global Step: 18640   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:22:31,827-Speed 3352.42 samples/sec   Loss 10.0871   LearningRate 0.0855   Epoch: 1   Global Step: 18650   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:22:34,851-Speed 3387.60 samples/sec   Loss 10.0546   LearningRate 0.0855   Epoch: 1   Global Step: 18660   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:22:37,867-Speed 3397.12 samples/sec   Loss 10.0488   LearningRate 0.0855   Epoch: 1   Global Step: 18670   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:22:40,913-Speed 3362.67 samples/sec   Loss 9.9695   LearningRate 0.0855   Epoch: 1   Global Step: 18680   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:22:43,935-Speed 3389.19 samples/sec   Loss 9.9022   LearningRate 0.0855   Epoch: 1   Global Step: 18690   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:22:46,988-Speed 3355.69 samples/sec   Loss 9.9957   LearningRate 0.0855   Epoch: 1   Global Step: 18700   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:22:50,057-Speed 3337.09 samples/sec   Loss 9.9642   LearningRate 0.0855   Epoch: 1   Global Step: 18710   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:22:53,105-Speed 3360.70 samples/sec   Loss 10.0152   LearningRate 0.0855   Epoch: 1   Global Step: 18720   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:22:56,121-Speed 3397.39 samples/sec   Loss 9.9912   LearningRate 0.0855   Epoch: 1   Global Step: 18730   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:22:59,150-Speed 3381.73 samples/sec   Loss 9.9961   LearningRate 0.0855   Epoch: 1   Global Step: 18740   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:23:02,214-Speed 3343.64 samples/sec   Loss 10.0501   LearningRate 0.0855   Epoch: 1   Global Step: 18750   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:23:05,255-Speed 3368.04 samples/sec   Loss 9.9226   LearningRate 0.0855   Epoch: 1   Global Step: 18760   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:23:08,306-Speed 3357.37 samples/sec   Loss 9.8695   LearningRate 0.0855   Epoch: 1   Global Step: 18770   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:23:11,363-Speed 3350.88 samples/sec   Loss 9.9575   LearningRate 0.0855   Epoch: 1   Global Step: 18780   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:23:14,443-Speed 3326.45 samples/sec   Loss 9.9790   LearningRate 0.0854   Epoch: 1   Global Step: 18790   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:23:17,509-Speed 3339.96 samples/sec   Loss 9.9603   LearningRate 0.0854   Epoch: 1   Global Step: 18800   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:23:20,562-Speed 3355.19 samples/sec   Loss 9.8801   LearningRate 0.0854   Epoch: 1   Global Step: 18810   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:23:23,639-Speed 3329.94 samples/sec   Loss 9.9572   LearningRate 0.0854   Epoch: 1   Global Step: 18820   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:23:26,738-Speed 3305.34 samples/sec   Loss 9.8359   LearningRate 0.0854   Epoch: 1   Global Step: 18830   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:23:29,783-Speed 3363.72 samples/sec   Loss 9.8734   LearningRate 0.0854   Epoch: 1   Global Step: 18840   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:23:32,839-Speed 3351.23 samples/sec   Loss 9.8655   LearningRate 0.0854   Epoch: 1   Global Step: 18850   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:23:35,874-Speed 3374.96 samples/sec   Loss 9.9487   LearningRate 0.0854   Epoch: 1   Global Step: 18860   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:23:38,904-Speed 3381.24 samples/sec   Loss 9.8889   LearningRate 0.0854   Epoch: 1   Global Step: 18870   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:23:41,925-Speed 3389.91 samples/sec   Loss 10.0002   LearningRate 0.0854   Epoch: 1   Global Step: 18880   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:23:44,962-Speed 3373.28 samples/sec   Loss 9.9063   LearningRate 0.0854   Epoch: 1   Global Step: 18890   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:23:47,976-Speed 3398.06 samples/sec   Loss 10.0038   LearningRate 0.0854   Epoch: 1   Global Step: 18900   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:23:51,065-Speed 3316.02 samples/sec   Loss 9.8384   LearningRate 0.0854   Epoch: 1   Global Step: 18910   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-04-27 03:23:54,122-Speed 3351.44 samples/sec   Loss 10.0180   LearningRate 0.0853   Epoch: 1   Global Step: 18920   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:23:57,154-Speed 3378.41 samples/sec   Loss 9.8570   LearningRate 0.0853   Epoch: 1   Global Step: 18930   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:24:00,171-Speed 3394.94 samples/sec   Loss 9.8264   LearningRate 0.0853   Epoch: 1   Global Step: 18940   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:24:03,312-Speed 3260.78 samples/sec   Loss 9.8925   LearningRate 0.0853   Epoch: 1   Global Step: 18950   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:24:06,374-Speed 3345.53 samples/sec   Loss 9.9048   LearningRate 0.0853   Epoch: 1   Global Step: 18960   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:24:09,382-Speed 3405.27 samples/sec   Loss 9.9722   LearningRate 0.0853   Epoch: 1   Global Step: 18970   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:24:13,122-Speed 2738.71 samples/sec   Loss 10.0353   LearningRate 0.0853   Epoch: 1   Global Step: 18980   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:24:16,145-Speed 3387.68 samples/sec   Loss 9.8756   LearningRate 0.0853   Epoch: 1   Global Step: 18990   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:24:19,238-Speed 3311.75 samples/sec   Loss 9.8542   LearningRate 0.0853   Epoch: 1   Global Step: 19000   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:24:22,250-Speed 3401.16 samples/sec   Loss 9.7604   LearningRate 0.0853   Epoch: 1   Global Step: 19010   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:24:25,289-Speed 3370.69 samples/sec   Loss 9.8783   LearningRate 0.0853   Epoch: 1   Global Step: 19020   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:24:28,331-Speed 3367.23 samples/sec   Loss 10.0290   LearningRate 0.0853   Epoch: 1   Global Step: 19030   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:24:31,421-Speed 3315.07 samples/sec   Loss 9.9001   LearningRate 0.0853   Epoch: 1   Global Step: 19040   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:24:34,476-Speed 3352.69 samples/sec   Loss 9.9511   LearningRate 0.0853   Epoch: 1   Global Step: 19050   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:24:37,495-Speed 3393.54 samples/sec   Loss 9.9342   LearningRate 0.0852   Epoch: 1   Global Step: 19060   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:24:40,538-Speed 3365.80 samples/sec   Loss 10.0683   LearningRate 0.0852   Epoch: 1   Global Step: 19070   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:24:43,596-Speed 3349.71 samples/sec   Loss 9.8898   LearningRate 0.0852   Epoch: 1   Global Step: 19080   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:24:46,668-Speed 3334.25 samples/sec   Loss 9.9270   LearningRate 0.0852   Epoch: 1   Global Step: 19090   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:24:49,695-Speed 3383.61 samples/sec   Loss 9.9671   LearningRate 0.0852   Epoch: 1   Global Step: 19100   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:24:52,740-Speed 3364.57 samples/sec   Loss 9.8891   LearningRate 0.0852   Epoch: 1   Global Step: 19110   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:24:55,780-Speed 3369.24 samples/sec   Loss 9.9272   LearningRate 0.0852   Epoch: 1   Global Step: 19120   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:24:58,787-Speed 3406.79 samples/sec   Loss 9.7440   LearningRate 0.0852   Epoch: 1   Global Step: 19130   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:25:01,834-Speed 3361.72 samples/sec   Loss 9.9429   LearningRate 0.0852   Epoch: 1   Global Step: 19140   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:25:04,865-Speed 3379.23 samples/sec   Loss 9.9204   LearningRate 0.0852   Epoch: 1   Global Step: 19150   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:25:07,890-Speed 3386.29 samples/sec   Loss 9.8290   LearningRate 0.0852   Epoch: 1   Global Step: 19160   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:25:10,929-Speed 3370.33 samples/sec   Loss 9.8875   LearningRate 0.0852   Epoch: 1   Global Step: 19170   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:25:14,035-Speed 3298.79 samples/sec   Loss 9.9927   LearningRate 0.0852   Epoch: 1   Global Step: 19180   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:25:17,116-Speed 3324.01 samples/sec   Loss 9.8329   LearningRate 0.0851   Epoch: 1   Global Step: 19190   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:25:20,191-Speed 3330.96 samples/sec   Loss 9.9485   LearningRate 0.0851   Epoch: 1   Global Step: 19200   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:25:23,221-Speed 3380.59 samples/sec   Loss 9.8067   LearningRate 0.0851   Epoch: 1   Global Step: 19210   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:25:26,268-Speed 3362.29 samples/sec   Loss 10.0051   LearningRate 0.0851   Epoch: 1   Global Step: 19220   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:25:29,314-Speed 3363.20 samples/sec   Loss 9.8698   LearningRate 0.0851   Epoch: 1   Global Step: 19230   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:25:32,369-Speed 3352.76 samples/sec   Loss 9.7963   LearningRate 0.0851   Epoch: 1   Global Step: 19240   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:25:35,413-Speed 3365.27 samples/sec   Loss 9.9718   LearningRate 0.0851   Epoch: 1   Global Step: 19250   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:25:38,513-Speed 3304.68 samples/sec   Loss 9.8285   LearningRate 0.0851   Epoch: 1   Global Step: 19260   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:25:41,631-Speed 3284.65 samples/sec   Loss 9.8284   LearningRate 0.0851   Epoch: 1   Global Step: 19270   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:25:44,668-Speed 3372.70 samples/sec   Loss 9.8537   LearningRate 0.0851   Epoch: 1   Global Step: 19280   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:25:47,735-Speed 3340.46 samples/sec   Loss 9.8322   LearningRate 0.0851   Epoch: 1   Global Step: 19290   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:25:50,847-Speed 3290.90 samples/sec   Loss 9.8516   LearningRate 0.0851   Epoch: 1   Global Step: 19300   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:25:53,932-Speed 3319.72 samples/sec   Loss 9.8191   LearningRate 0.0851   Epoch: 1   Global Step: 19310   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:25:56,989-Speed 3351.08 samples/sec   Loss 9.9994   LearningRate 0.0851   Epoch: 1   Global Step: 19320   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:26:00,018-Speed 3381.84 samples/sec   Loss 9.8855   LearningRate 0.0850   Epoch: 1   Global Step: 19330   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:26:03,084-Speed 3341.17 samples/sec   Loss 9.9254   LearningRate 0.0850   Epoch: 1   Global Step: 19340   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:26:06,158-Speed 3331.84 samples/sec   Loss 9.8325   LearningRate 0.0850   Epoch: 1   Global Step: 19350   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:26:09,162-Speed 3409.77 samples/sec   Loss 9.8702   LearningRate 0.0850   Epoch: 1   Global Step: 19360   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:26:12,197-Speed 3374.78 samples/sec   Loss 9.7409   LearningRate 0.0850   Epoch: 1   Global Step: 19370   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:26:15,259-Speed 3345.43 samples/sec   Loss 9.8127   LearningRate 0.0850   Epoch: 1   Global Step: 19380   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:26:18,331-Speed 3334.83 samples/sec   Loss 9.8615   LearningRate 0.0850   Epoch: 1   Global Step: 19390   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:26:21,379-Speed 3360.75 samples/sec   Loss 9.9673   LearningRate 0.0850   Epoch: 1   Global Step: 19400   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:26:24,429-Speed 3358.86 samples/sec   Loss 9.7779   LearningRate 0.0850   Epoch: 1   Global Step: 19410   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:26:27,528-Speed 3305.40 samples/sec   Loss 9.7883   LearningRate 0.0850   Epoch: 1   Global Step: 19420   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:26:30,628-Speed 3303.20 samples/sec   Loss 9.8580   LearningRate 0.0850   Epoch: 1   Global Step: 19430   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:26:33,662-Speed 3376.39 samples/sec   Loss 9.8907   LearningRate 0.0850   Epoch: 1   Global Step: 19440   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:26:36,718-Speed 3352.53 samples/sec   Loss 9.7914   LearningRate 0.0850   Epoch: 1   Global Step: 19450   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:26:39,773-Speed 3352.40 samples/sec   Loss 9.9305   LearningRate 0.0849   Epoch: 1   Global Step: 19460   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:26:42,854-Speed 3324.61 samples/sec   Loss 9.7989   LearningRate 0.0849   Epoch: 1   Global Step: 19470   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:26:45,851-Speed 3417.65 samples/sec   Loss 9.7893   LearningRate 0.0849   Epoch: 1   Global Step: 19480   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:26:48,865-Speed 3399.34 samples/sec   Loss 9.8533   LearningRate 0.0849   Epoch: 1   Global Step: 19490   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:26:51,924-Speed 3348.82 samples/sec   Loss 9.8819   LearningRate 0.0849   Epoch: 1   Global Step: 19500   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:26:54,966-Speed 3367.14 samples/sec   Loss 9.7386   LearningRate 0.0849   Epoch: 1   Global Step: 19510   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:26:57,976-Speed 3403.28 samples/sec   Loss 9.8307   LearningRate 0.0849   Epoch: 1   Global Step: 19520   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:27:01,018-Speed 3367.25 samples/sec   Loss 9.9171   LearningRate 0.0849   Epoch: 1   Global Step: 19530   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:27:04,123-Speed 3298.11 samples/sec   Loss 9.8031   LearningRate 0.0849   Epoch: 1   Global Step: 19540   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:27:07,192-Speed 3338.38 samples/sec   Loss 9.8169   LearningRate 0.0849   Epoch: 1   Global Step: 19550   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:27:10,216-Speed 3387.55 samples/sec   Loss 9.7139   LearningRate 0.0849   Epoch: 1   Global Step: 19560   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:27:13,258-Speed 3367.11 samples/sec   Loss 9.8778   LearningRate 0.0849   Epoch: 1   Global Step: 19570   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:27:16,326-Speed 3339.02 samples/sec   Loss 9.7767   LearningRate 0.0849   Epoch: 1   Global Step: 19580   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:27:19,386-Speed 3347.28 samples/sec   Loss 9.8335   LearningRate 0.0849   Epoch: 1   Global Step: 19590   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:27:22,418-Speed 3377.91 samples/sec   Loss 9.7364   LearningRate 0.0848   Epoch: 1   Global Step: 19600   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:27:25,497-Speed 3327.47 samples/sec   Loss 9.7512   LearningRate 0.0848   Epoch: 1   Global Step: 19610   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:27:28,529-Speed 3377.82 samples/sec   Loss 9.8222   LearningRate 0.0848   Epoch: 1   Global Step: 19620   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:27:31,580-Speed 3357.58 samples/sec   Loss 9.8132   LearningRate 0.0848   Epoch: 1   Global Step: 19630   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:27:34,598-Speed 3393.92 samples/sec   Loss 9.8801   LearningRate 0.0848   Epoch: 1   Global Step: 19640   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:27:37,621-Speed 3388.43 samples/sec   Loss 9.8943   LearningRate 0.0848   Epoch: 1   Global Step: 19650   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:27:40,670-Speed 3358.75 samples/sec   Loss 9.8779   LearningRate 0.0848   Epoch: 1   Global Step: 19660   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:27:43,697-Speed 3384.45 samples/sec   Loss 9.8127   LearningRate 0.0848   Epoch: 1   Global Step: 19670   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:27:46,745-Speed 3360.36 samples/sec   Loss 9.9897   LearningRate 0.0848   Epoch: 1   Global Step: 19680   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:27:49,773-Speed 3383.53 samples/sec   Loss 9.7835   LearningRate 0.0848   Epoch: 1   Global Step: 19690   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:27:52,807-Speed 3375.93 samples/sec   Loss 9.7623   LearningRate 0.0848   Epoch: 1   Global Step: 19700   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:27:55,858-Speed 3358.11 samples/sec   Loss 9.7805   LearningRate 0.0848   Epoch: 1   Global Step: 19710   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:27:58,873-Speed 3396.43 samples/sec   Loss 9.9253   LearningRate 0.0848   Epoch: 1   Global Step: 19720   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:28:01,891-Speed 3394.02 samples/sec   Loss 9.9009   LearningRate 0.0847   Epoch: 1   Global Step: 19730   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:28:04,919-Speed 3382.94 samples/sec   Loss 9.7844   LearningRate 0.0847   Epoch: 1   Global Step: 19740   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:28:07,927-Speed 3405.56 samples/sec   Loss 9.8012   LearningRate 0.0847   Epoch: 1   Global Step: 19750   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:28:10,966-Speed 3370.60 samples/sec   Loss 9.8882   LearningRate 0.0847   Epoch: 1   Global Step: 19760   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:28:13,990-Speed 3386.86 samples/sec   Loss 9.7378   LearningRate 0.0847   Epoch: 1   Global Step: 19770   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:28:17,022-Speed 3379.17 samples/sec   Loss 9.7228   LearningRate 0.0847   Epoch: 1   Global Step: 19780   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:28:20,046-Speed 3387.51 samples/sec   Loss 9.8186   LearningRate 0.0847   Epoch: 1   Global Step: 19790   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:28:23,121-Speed 3330.89 samples/sec   Loss 9.8252   LearningRate 0.0847   Epoch: 1   Global Step: 19800   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:28:26,147-Speed 3385.43 samples/sec   Loss 9.7737   LearningRate 0.0847   Epoch: 1   Global Step: 19810   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:28:29,228-Speed 3324.26 samples/sec   Loss 9.8384   LearningRate 0.0847   Epoch: 1   Global Step: 19820   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:28:32,253-Speed 3386.13 samples/sec   Loss 10.0563   LearningRate 0.0847   Epoch: 1   Global Step: 19830   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:28:35,296-Speed 3366.26 samples/sec   Loss 9.7242   LearningRate 0.0847   Epoch: 1   Global Step: 19840   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:28:38,353-Speed 3350.56 samples/sec   Loss 9.8071   LearningRate 0.0847   Epoch: 1   Global Step: 19850   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:28:41,392-Speed 3371.14 samples/sec   Loss 9.9053   LearningRate 0.0847   Epoch: 1   Global Step: 19860   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:28:44,433-Speed 3367.99 samples/sec   Loss 9.8142   LearningRate 0.0846   Epoch: 1   Global Step: 19870   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:28:47,513-Speed 3326.05 samples/sec   Loss 9.7451   LearningRate 0.0846   Epoch: 1   Global Step: 19880   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:28:50,582-Speed 3337.54 samples/sec   Loss 9.7990   LearningRate 0.0846   Epoch: 1   Global Step: 19890   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:28:53,681-Speed 3305.47 samples/sec   Loss 9.4872   LearningRate 0.0846   Epoch: 1   Global Step: 19900   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:28:56,719-Speed 3371.53 samples/sec   Loss 9.9015   LearningRate 0.0846   Epoch: 1   Global Step: 19910   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:28:59,796-Speed 3328.38 samples/sec   Loss 9.8751   LearningRate 0.0846   Epoch: 1   Global Step: 19920   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:29:02,814-Speed 3394.33 samples/sec   Loss 9.8005   LearningRate 0.0846   Epoch: 1   Global Step: 19930   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:29:05,854-Speed 3369.65 samples/sec   Loss 9.7338   LearningRate 0.0846   Epoch: 1   Global Step: 19940   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:29:08,875-Speed 3390.34 samples/sec   Loss 9.7452   LearningRate 0.0846   Epoch: 1   Global Step: 19950   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:29:11,919-Speed 3365.08 samples/sec   Loss 9.6677   LearningRate 0.0846   Epoch: 1   Global Step: 19960   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:29:14,963-Speed 3365.70 samples/sec   Loss 9.7825   LearningRate 0.0846   Epoch: 1   Global Step: 19970   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:29:17,991-Speed 3382.71 samples/sec   Loss 9.6926   LearningRate 0.0846   Epoch: 1   Global Step: 19980   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:29:20,997-Speed 3407.67 samples/sec   Loss 9.7032   LearningRate 0.0846   Epoch: 1   Global Step: 19990   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:29:24,045-Speed 3360.74 samples/sec   Loss 9.7517   LearningRate 0.0845   Epoch: 1   Global Step: 20000   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:29:27,087-Speed 3367.53 samples/sec   Loss 9.8456   LearningRate 0.0845   Epoch: 1   Global Step: 20010   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:29:30,152-Speed 3342.22 samples/sec   Loss 9.8367   LearningRate 0.0845   Epoch: 1   Global Step: 20020   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:29:33,208-Speed 3351.09 samples/sec   Loss 9.7571   LearningRate 0.0845   Epoch: 1   Global Step: 20030   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:29:36,253-Speed 3363.92 samples/sec   Loss 9.7880   LearningRate 0.0845   Epoch: 1   Global Step: 20040   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:29:39,267-Speed 3398.62 samples/sec   Loss 9.6983   LearningRate 0.0845   Epoch: 1   Global Step: 20050   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:29:42,285-Speed 3394.69 samples/sec   Loss 9.7116   LearningRate 0.0845   Epoch: 1   Global Step: 20060   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:29:45,297-Speed 3400.72 samples/sec   Loss 9.9285   LearningRate 0.0845   Epoch: 1   Global Step: 20070   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-04-27 03:29:48,302-Speed 3408.65 samples/sec   Loss 9.7954   LearningRate 0.0845   Epoch: 1   Global Step: 20080   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:29:51,319-Speed 3395.44 samples/sec   Loss 9.7055   LearningRate 0.0845   Epoch: 1   Global Step: 20090   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:29:54,348-Speed 3381.73 samples/sec   Loss 9.8115   LearningRate 0.0845   Epoch: 1   Global Step: 20100   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-04-27 03:29:57,398-Speed 3358.83 samples/sec   Loss 9.7083   LearningRate 0.0845   Epoch: 1   Global Step: 20110   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:30:00,459-Speed 3345.50 samples/sec   Loss 9.8866   LearningRate 0.0845   Epoch: 1   Global Step: 20120   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:30:03,515-Speed 3352.14 samples/sec   Loss 9.7676   LearningRate 0.0845   Epoch: 1   Global Step: 20130   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:30:06,659-Speed 3258.37 samples/sec   Loss 9.7614   LearningRate 0.0844   Epoch: 1   Global Step: 20140   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:30:09,682-Speed 3387.59 samples/sec   Loss 9.6366   LearningRate 0.0844   Epoch: 1   Global Step: 20150   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:30:12,713-Speed 3380.13 samples/sec   Loss 9.7687   LearningRate 0.0844   Epoch: 1   Global Step: 20160   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:30:15,728-Speed 3396.55 samples/sec   Loss 9.7319   LearningRate 0.0844   Epoch: 1   Global Step: 20170   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:30:18,792-Speed 3343.37 samples/sec   Loss 9.7148   LearningRate 0.0844   Epoch: 1   Global Step: 20180   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:30:21,821-Speed 3381.95 samples/sec   Loss 9.7883   LearningRate 0.0844   Epoch: 1   Global Step: 20190   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:30:24,857-Speed 3373.95 samples/sec   Loss 9.6684   LearningRate 0.0844   Epoch: 1   Global Step: 20200   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:30:27,893-Speed 3373.95 samples/sec   Loss 9.8348   LearningRate 0.0844   Epoch: 1   Global Step: 20210   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:30:30,918-Speed 3385.48 samples/sec   Loss 9.7084   LearningRate 0.0844   Epoch: 1   Global Step: 20220   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:30:33,930-Speed 3401.32 samples/sec   Loss 9.8988   LearningRate 0.0844   Epoch: 1   Global Step: 20230   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:30:37,037-Speed 3296.33 samples/sec   Loss 9.8081   LearningRate 0.0844   Epoch: 1   Global Step: 20240   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:30:40,092-Speed 3352.75 samples/sec   Loss 9.7662   LearningRate 0.0844   Epoch: 1   Global Step: 20250   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:30:43,153-Speed 3346.21 samples/sec   Loss 9.6791   LearningRate 0.0844   Epoch: 1   Global Step: 20260   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:30:46,194-Speed 3369.34 samples/sec   Loss 9.7482   LearningRate 0.0843   Epoch: 1   Global Step: 20270   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:30:49,206-Speed 3400.58 samples/sec   Loss 9.6429   LearningRate 0.0843   Epoch: 1   Global Step: 20280   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:30:52,246-Speed 3368.89 samples/sec   Loss 9.6434   LearningRate 0.0843   Epoch: 1   Global Step: 20290   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:30:55,310-Speed 3343.87 samples/sec   Loss 9.7203   LearningRate 0.0843   Epoch: 1   Global Step: 20300   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:30:58,316-Speed 3407.57 samples/sec   Loss 9.7720   LearningRate 0.0843   Epoch: 1   Global Step: 20310   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:31:01,396-Speed 3325.47 samples/sec   Loss 9.7293   LearningRate 0.0843   Epoch: 1   Global Step: 20320   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:31:04,431-Speed 3375.04 samples/sec   Loss 9.7501   LearningRate 0.0843   Epoch: 1   Global Step: 20330   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:31:07,472-Speed 3369.32 samples/sec   Loss 9.7925   LearningRate 0.0843   Epoch: 1   Global Step: 20340   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:31:10,492-Speed 3391.90 samples/sec   Loss 9.7635   LearningRate 0.0843   Epoch: 1   Global Step: 20350   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:31:13,553-Speed 3345.95 samples/sec   Loss 9.6419   LearningRate 0.0843   Epoch: 1   Global Step: 20360   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:31:16,626-Speed 3334.02 samples/sec   Loss 9.7165   LearningRate 0.0843   Epoch: 1   Global Step: 20370   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:31:19,633-Speed 3405.34 samples/sec   Loss 9.7335   LearningRate 0.0843   Epoch: 1   Global Step: 20380   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:31:22,673-Speed 3369.86 samples/sec   Loss 9.6554   LearningRate 0.0843   Epoch: 1   Global Step: 20390   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:31:25,783-Speed 3293.63 samples/sec   Loss 9.7861   LearningRate 0.0843   Epoch: 1   Global Step: 20400   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:31:28,844-Speed 3347.06 samples/sec   Loss 9.7667   LearningRate 0.0842   Epoch: 1   Global Step: 20410   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:31:31,870-Speed 3384.01 samples/sec   Loss 9.9012   LearningRate 0.0842   Epoch: 1   Global Step: 20420   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:31:34,899-Speed 3382.98 samples/sec   Loss 9.6543   LearningRate 0.0842   Epoch: 1   Global Step: 20430   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:31:37,951-Speed 3355.18 samples/sec   Loss 9.7171   LearningRate 0.0842   Epoch: 1   Global Step: 20440   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:31:41,792-Speed 2666.45 samples/sec   Loss 9.7737   LearningRate 0.0842   Epoch: 1   Global Step: 20450   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:31:44,821-Speed 3382.72 samples/sec   Loss 9.8036   LearningRate 0.0842   Epoch: 1   Global Step: 20460   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:31:47,833-Speed 3401.10 samples/sec   Loss 9.6985   LearningRate 0.0842   Epoch: 1   Global Step: 20470   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 03:31:50,855-Speed 3389.07 samples/sec   Loss 9.6096   LearningRate 0.0842   Epoch: 1   Global Step: 20480   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:31:53,911-Speed 3351.77 samples/sec   Loss 9.7947   LearningRate 0.0842   Epoch: 1   Global Step: 20490   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:31:56,926-Speed 3396.87 samples/sec   Loss 9.7214   LearningRate 0.0842   Epoch: 1   Global Step: 20500   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:31:59,984-Speed 3350.22 samples/sec   Loss 9.7352   LearningRate 0.0842   Epoch: 1   Global Step: 20510   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:32:03,040-Speed 3351.18 samples/sec   Loss 9.8811   LearningRate 0.0842   Epoch: 1   Global Step: 20520   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:32:06,111-Speed 3336.14 samples/sec   Loss 9.6212   LearningRate 0.0842   Epoch: 1   Global Step: 20530   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:32:09,138-Speed 3383.43 samples/sec   Loss 9.7484   LearningRate 0.0841   Epoch: 1   Global Step: 20540   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:32:12,174-Speed 3373.79 samples/sec   Loss 9.6378   LearningRate 0.0841   Epoch: 1   Global Step: 20550   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:32:15,214-Speed 3369.95 samples/sec   Loss 9.6903   LearningRate 0.0841   Epoch: 1   Global Step: 20560   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:32:18,245-Speed 3378.90 samples/sec   Loss 9.6341   LearningRate 0.0841   Epoch: 1   Global Step: 20570   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:32:21,294-Speed 3360.22 samples/sec   Loss 9.7708   LearningRate 0.0841   Epoch: 1   Global Step: 20580   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 03:32:24,323-Speed 3381.82 samples/sec   Loss 9.6693   LearningRate 0.0841   Epoch: 1   Global Step: 20590   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:32:27,392-Speed 3336.91 samples/sec   Loss 9.7790   LearningRate 0.0841   Epoch: 1   Global Step: 20600   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:32:30,423-Speed 3380.03 samples/sec   Loss 9.6942   LearningRate 0.0841   Epoch: 1   Global Step: 20610   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:32:33,452-Speed 3381.91 samples/sec   Loss 9.6339   LearningRate 0.0841   Epoch: 1   Global Step: 20620   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:32:36,534-Speed 3323.76 samples/sec   Loss 9.7631   LearningRate 0.0841   Epoch: 1   Global Step: 20630   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:32:39,571-Speed 3372.55 samples/sec   Loss 9.7252   LearningRate 0.0841   Epoch: 1   Global Step: 20640   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:32:42,642-Speed 3335.00 samples/sec   Loss 9.5847   LearningRate 0.0841   Epoch: 1   Global Step: 20650   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:32:45,689-Speed 3362.06 samples/sec   Loss 9.6997   LearningRate 0.0841   Epoch: 1   Global Step: 20660   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:32:48,745-Speed 3352.01 samples/sec   Loss 9.5930   LearningRate 0.0841   Epoch: 1   Global Step: 20670   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:32:51,763-Speed 3393.65 samples/sec   Loss 9.8413   LearningRate 0.0840   Epoch: 1   Global Step: 20680   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:32:54,840-Speed 3329.54 samples/sec   Loss 9.5015   LearningRate 0.0840   Epoch: 1   Global Step: 20690   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 03:32:57,872-Speed 3378.91 samples/sec   Loss 9.7550   LearningRate 0.0840   Epoch: 1   Global Step: 20700   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 03:33:00,919-Speed 3361.35 samples/sec   Loss 9.6440   LearningRate 0.0840   Epoch: 1   Global Step: 20710   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:33:03,958-Speed 3371.02 samples/sec   Loss 9.7190   LearningRate 0.0840   Epoch: 1   Global Step: 20720   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:33:06,996-Speed 3371.86 samples/sec   Loss 9.5887   LearningRate 0.0840   Epoch: 1   Global Step: 20730   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:33:10,031-Speed 3374.74 samples/sec   Loss 9.8286   LearningRate 0.0840   Epoch: 1   Global Step: 20740   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:33:13,073-Speed 3366.62 samples/sec   Loss 9.8405   LearningRate 0.0840   Epoch: 1   Global Step: 20750   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:33:16,166-Speed 3312.26 samples/sec   Loss 9.7278   LearningRate 0.0840   Epoch: 1   Global Step: 20760   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:33:21,771-Speed 1827.36 samples/sec   Loss 9.6914   LearningRate 0.0840   Epoch: 1   Global Step: 20770   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:33:24,795-Speed 3386.81 samples/sec   Loss 9.7600   LearningRate 0.0840   Epoch: 1   Global Step: 20780   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:33:27,820-Speed 3386.17 samples/sec   Loss 9.8534   LearningRate 0.0840   Epoch: 1   Global Step: 20790   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:33:30,872-Speed 3356.07 samples/sec   Loss 9.5342   LearningRate 0.0840   Epoch: 1   Global Step: 20800   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:33:33,905-Speed 3377.48 samples/sec   Loss 9.7635   LearningRate 0.0839   Epoch: 1   Global Step: 20810   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:33:36,975-Speed 3336.60 samples/sec   Loss 9.6927   LearningRate 0.0839   Epoch: 1   Global Step: 20820   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:33:40,022-Speed 3361.31 samples/sec   Loss 9.7463   LearningRate 0.0839   Epoch: 1   Global Step: 20830   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:33:43,073-Speed 3357.87 samples/sec   Loss 9.5037   LearningRate 0.0839   Epoch: 1   Global Step: 20840   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:33:46,108-Speed 3374.92 samples/sec   Loss 9.6863   LearningRate 0.0839   Epoch: 1   Global Step: 20850   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:33:49,141-Speed 3377.41 samples/sec   Loss 9.6127   LearningRate 0.0839   Epoch: 1   Global Step: 20860   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:33:52,211-Speed 3335.85 samples/sec   Loss 9.6980   LearningRate 0.0839   Epoch: 1   Global Step: 20870   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:33:55,321-Speed 3294.35 samples/sec   Loss 9.5664   LearningRate 0.0839   Epoch: 1   Global Step: 20880   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:33:58,354-Speed 3377.26 samples/sec   Loss 9.5572   LearningRate 0.0839   Epoch: 1   Global Step: 20890   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:34:01,412-Speed 3349.70 samples/sec   Loss 9.5651   LearningRate 0.0839   Epoch: 1   Global Step: 20900   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:34:04,489-Speed 3329.05 samples/sec   Loss 9.7397   LearningRate 0.0839   Epoch: 1   Global Step: 20910   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:34:07,563-Speed 3331.62 samples/sec   Loss 9.7247   LearningRate 0.0839   Epoch: 1   Global Step: 20920   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:34:10,642-Speed 3327.01 samples/sec   Loss 9.7357   LearningRate 0.0839   Epoch: 1   Global Step: 20930   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:34:13,691-Speed 3359.64 samples/sec   Loss 9.7500   LearningRate 0.0839   Epoch: 1   Global Step: 20940   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:34:16,733-Speed 3368.12 samples/sec   Loss 9.6818   LearningRate 0.0838   Epoch: 1   Global Step: 20950   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:34:19,797-Speed 3342.91 samples/sec   Loss 9.7207   LearningRate 0.0838   Epoch: 1   Global Step: 20960   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:34:22,834-Speed 3372.95 samples/sec   Loss 9.6895   LearningRate 0.0838   Epoch: 1   Global Step: 20970   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:34:25,928-Speed 3311.00 samples/sec   Loss 9.7228   LearningRate 0.0838   Epoch: 1   Global Step: 20980   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:34:28,982-Speed 3353.36 samples/sec   Loss 9.5730   LearningRate 0.0838   Epoch: 1   Global Step: 20990   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:34:32,035-Speed 3355.17 samples/sec   Loss 9.6888   LearningRate 0.0838   Epoch: 1   Global Step: 21000   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:34:35,090-Speed 3353.69 samples/sec   Loss 9.7709   LearningRate 0.0838   Epoch: 1   Global Step: 21010   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:34:38,163-Speed 3333.40 samples/sec   Loss 9.6421   LearningRate 0.0838   Epoch: 1   Global Step: 21020   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:34:41,221-Speed 3349.93 samples/sec   Loss 9.5346   LearningRate 0.0838   Epoch: 1   Global Step: 21030   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:34:44,247-Speed 3384.08 samples/sec   Loss 9.5561   LearningRate 0.0838   Epoch: 1   Global Step: 21040   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:34:47,289-Speed 3367.92 samples/sec   Loss 9.6852   LearningRate 0.0838   Epoch: 1   Global Step: 21050   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:34:50,388-Speed 3305.67 samples/sec   Loss 9.6878   LearningRate 0.0838   Epoch: 1   Global Step: 21060   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:34:53,444-Speed 3351.60 samples/sec   Loss 9.6879   LearningRate 0.0838   Epoch: 1   Global Step: 21070   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:34:56,496-Speed 3355.99 samples/sec   Loss 9.8461   LearningRate 0.0837   Epoch: 1   Global Step: 21080   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:34:59,562-Speed 3341.34 samples/sec   Loss 9.6009   LearningRate 0.0837   Epoch: 1   Global Step: 21090   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:35:02,618-Speed 3351.51 samples/sec   Loss 9.5545   LearningRate 0.0837   Epoch: 1   Global Step: 21100   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:35:05,684-Speed 3341.03 samples/sec   Loss 9.6187   LearningRate 0.0837   Epoch: 1   Global Step: 21110   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:35:08,735-Speed 3357.92 samples/sec   Loss 9.5771   LearningRate 0.0837   Epoch: 1   Global Step: 21120   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:35:11,751-Speed 3396.24 samples/sec   Loss 9.6018   LearningRate 0.0837   Epoch: 1   Global Step: 21130   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:35:14,820-Speed 3337.63 samples/sec   Loss 9.6188   LearningRate 0.0837   Epoch: 1   Global Step: 21140   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:35:17,881-Speed 3346.20 samples/sec   Loss 9.6791   LearningRate 0.0837   Epoch: 1   Global Step: 21150   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:35:20,898-Speed 3394.93 samples/sec   Loss 9.6446   LearningRate 0.0837   Epoch: 1   Global Step: 21160   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:35:23,942-Speed 3364.79 samples/sec   Loss 9.6708   LearningRate 0.0837   Epoch: 1   Global Step: 21170   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:35:26,951-Speed 3404.62 samples/sec   Loss 9.5929   LearningRate 0.0837   Epoch: 1   Global Step: 21180   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:35:29,998-Speed 3362.11 samples/sec   Loss 9.6711   LearningRate 0.0837   Epoch: 1   Global Step: 21190   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:35:33,002-Speed 3409.49 samples/sec   Loss 9.5532   LearningRate 0.0837   Epoch: 1   Global Step: 21200   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:35:36,078-Speed 3331.10 samples/sec   Loss 9.6638   LearningRate 0.0837   Epoch: 1   Global Step: 21210   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:35:39,115-Speed 3372.60 samples/sec   Loss 9.5368   LearningRate 0.0836   Epoch: 1   Global Step: 21220   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:35:42,170-Speed 3352.72 samples/sec   Loss 9.5374   LearningRate 0.0836   Epoch: 1   Global Step: 21230   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:35:45,181-Speed 3402.58 samples/sec   Loss 9.6566   LearningRate 0.0836   Epoch: 1   Global Step: 21240   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:35:48,181-Speed 3413.51 samples/sec   Loss 9.5991   LearningRate 0.0836   Epoch: 1   Global Step: 21250   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:35:51,232-Speed 3357.38 samples/sec   Loss 9.5736   LearningRate 0.0836   Epoch: 1   Global Step: 21260   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:35:54,287-Speed 3353.52 samples/sec   Loss 9.5370   LearningRate 0.0836   Epoch: 1   Global Step: 21270   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:35:57,313-Speed 3385.20 samples/sec   Loss 9.6887   LearningRate 0.0836   Epoch: 1   Global Step: 21280   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:36:00,331-Speed 3393.99 samples/sec   Loss 9.4805   LearningRate 0.0836   Epoch: 1   Global Step: 21290   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:36:03,339-Speed 3404.96 samples/sec   Loss 9.4110   LearningRate 0.0836   Epoch: 1   Global Step: 21300   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:36:06,391-Speed 3356.20 samples/sec   Loss 9.6910   LearningRate 0.0836   Epoch: 1   Global Step: 21310   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:36:09,399-Speed 3405.21 samples/sec   Loss 9.6755   LearningRate 0.0836   Epoch: 1   Global Step: 21320   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:36:12,424-Speed 3386.13 samples/sec   Loss 9.6317   LearningRate 0.0836   Epoch: 1   Global Step: 21330   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:36:15,465-Speed 3369.59 samples/sec   Loss 9.5017   LearningRate 0.0836   Epoch: 1   Global Step: 21340   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:36:18,478-Speed 3399.40 samples/sec   Loss 9.6552   LearningRate 0.0835   Epoch: 1   Global Step: 21350   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:36:21,493-Speed 3397.71 samples/sec   Loss 9.5477   LearningRate 0.0835   Epoch: 1   Global Step: 21360   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:36:24,546-Speed 3355.09 samples/sec   Loss 9.5873   LearningRate 0.0835   Epoch: 1   Global Step: 21370   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:36:27,586-Speed 3369.29 samples/sec   Loss 9.5180   LearningRate 0.0835   Epoch: 1   Global Step: 21380   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:36:30,584-Speed 3416.69 samples/sec   Loss 9.5766   LearningRate 0.0835   Epoch: 1   Global Step: 21390   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:36:33,601-Speed 3395.02 samples/sec   Loss 9.5792   LearningRate 0.0835   Epoch: 1   Global Step: 21400   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:36:36,648-Speed 3361.53 samples/sec   Loss 9.4246   LearningRate 0.0835   Epoch: 1   Global Step: 21410   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:36:39,671-Speed 3388.68 samples/sec   Loss 9.6917   LearningRate 0.0835   Epoch: 1   Global Step: 21420   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:36:42,721-Speed 3359.11 samples/sec   Loss 9.6729   LearningRate 0.0835   Epoch: 1   Global Step: 21430   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:36:45,719-Speed 3416.88 samples/sec   Loss 9.7871   LearningRate 0.0835   Epoch: 1   Global Step: 21440   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:36:48,739-Speed 3391.31 samples/sec   Loss 9.5441   LearningRate 0.0835   Epoch: 1   Global Step: 21450   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:36:51,847-Speed 3295.92 samples/sec   Loss 9.6728   LearningRate 0.0835   Epoch: 1   Global Step: 21460   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:36:54,940-Speed 3312.07 samples/sec   Loss 9.5539   LearningRate 0.0835   Epoch: 1   Global Step: 21470   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:36:57,987-Speed 3362.24 samples/sec   Loss 9.5947   LearningRate 0.0835   Epoch: 1   Global Step: 21480   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:37:00,991-Speed 3409.81 samples/sec   Loss 9.5846   LearningRate 0.0834   Epoch: 1   Global Step: 21490   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:37:04,038-Speed 3362.17 samples/sec   Loss 9.6718   LearningRate 0.0834   Epoch: 1   Global Step: 21500   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:37:07,095-Speed 3350.53 samples/sec   Loss 9.6305   LearningRate 0.0834   Epoch: 1   Global Step: 21510   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:37:10,106-Speed 3401.51 samples/sec   Loss 9.6631   LearningRate 0.0834   Epoch: 1   Global Step: 21520   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:37:13,208-Speed 3302.22 samples/sec   Loss 9.6698   LearningRate 0.0834   Epoch: 1   Global Step: 21530   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:37:16,230-Speed 3390.49 samples/sec   Loss 9.5163   LearningRate 0.0834   Epoch: 1   Global Step: 21540   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:37:19,272-Speed 3366.64 samples/sec   Loss 9.7164   LearningRate 0.0834   Epoch: 1   Global Step: 21550   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:37:22,291-Speed 3392.86 samples/sec   Loss 9.6187   LearningRate 0.0834   Epoch: 1   Global Step: 21560   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:37:25,300-Speed 3404.87 samples/sec   Loss 9.5589   LearningRate 0.0834   Epoch: 1   Global Step: 21570   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:37:28,343-Speed 3366.36 samples/sec   Loss 9.7111   LearningRate 0.0834   Epoch: 1   Global Step: 21580   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:37:31,400-Speed 3350.77 samples/sec   Loss 9.6561   LearningRate 0.0834   Epoch: 1   Global Step: 21590   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:37:34,452-Speed 3355.85 samples/sec   Loss 9.5591   LearningRate 0.0834   Epoch: 1   Global Step: 21600   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:37:37,550-Speed 3306.72 samples/sec   Loss 9.5895   LearningRate 0.0834   Epoch: 1   Global Step: 21610   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:37:40,675-Speed 3277.99 samples/sec   Loss 9.5428   LearningRate 0.0834   Epoch: 1   Global Step: 21620   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:37:43,722-Speed 3361.80 samples/sec   Loss 9.5535   LearningRate 0.0833   Epoch: 1   Global Step: 21630   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:37:46,801-Speed 3326.18 samples/sec   Loss 9.4439   LearningRate 0.0833   Epoch: 1   Global Step: 21640   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:37:49,837-Speed 3374.15 samples/sec   Loss 9.6198   LearningRate 0.0833   Epoch: 1   Global Step: 21650   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:37:52,894-Speed 3350.99 samples/sec   Loss 9.6228   LearningRate 0.0833   Epoch: 1   Global Step: 21660   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:37:55,977-Speed 3322.38 samples/sec   Loss 9.5478   LearningRate 0.0833   Epoch: 1   Global Step: 21670   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:37:59,026-Speed 3359.86 samples/sec   Loss 9.4861   LearningRate 0.0833   Epoch: 1   Global Step: 21680   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:38:02,087-Speed 3346.56 samples/sec   Loss 9.5026   LearningRate 0.0833   Epoch: 1   Global Step: 21690   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:38:05,111-Speed 3387.24 samples/sec   Loss 9.5179   LearningRate 0.0833   Epoch: 1   Global Step: 21700   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:38:08,122-Speed 3401.50 samples/sec   Loss 9.6576   LearningRate 0.0833   Epoch: 1   Global Step: 21710   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:38:11,119-Speed 3418.06 samples/sec   Loss 9.5453   LearningRate 0.0833   Epoch: 1   Global Step: 21720   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:38:14,206-Speed 3318.37 samples/sec   Loss 9.4793   LearningRate 0.0833   Epoch: 1   Global Step: 21730   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:38:17,249-Speed 3365.36 samples/sec   Loss 9.5895   LearningRate 0.0833   Epoch: 1   Global Step: 21740   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:38:20,227-Speed 3439.68 samples/sec   Loss 9.5949   LearningRate 0.0833   Epoch: 1   Global Step: 21750   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:38:23,235-Speed 3405.37 samples/sec   Loss 9.5571   LearningRate 0.0832   Epoch: 1   Global Step: 21760   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:38:26,239-Speed 3409.77 samples/sec   Loss 9.5050   LearningRate 0.0832   Epoch: 1   Global Step: 21770   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:38:29,240-Speed 3413.27 samples/sec   Loss 9.5307   LearningRate 0.0832   Epoch: 1   Global Step: 21780   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:38:32,266-Speed 3384.97 samples/sec   Loss 9.4711   LearningRate 0.0832   Epoch: 1   Global Step: 21790   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:38:35,340-Speed 3331.74 samples/sec   Loss 9.4617   LearningRate 0.0832   Epoch: 1   Global Step: 21800   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:38:38,421-Speed 3324.65 samples/sec   Loss 9.4684   LearningRate 0.0832   Epoch: 1   Global Step: 21810   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:38:41,481-Speed 3347.87 samples/sec   Loss 9.5667   LearningRate 0.0832   Epoch: 1   Global Step: 21820   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:38:44,509-Speed 3382.65 samples/sec   Loss 9.4222   LearningRate 0.0832   Epoch: 1   Global Step: 21830   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:38:47,567-Speed 3350.22 samples/sec   Loss 9.6718   LearningRate 0.0832   Epoch: 1   Global Step: 21840   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:38:50,563-Speed 3418.87 samples/sec   Loss 9.4784   LearningRate 0.0832   Epoch: 1   Global Step: 21850   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:38:53,572-Speed 3404.38 samples/sec   Loss 9.3825   LearningRate 0.0832   Epoch: 1   Global Step: 21860   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:38:56,588-Speed 3396.18 samples/sec   Loss 9.5777   LearningRate 0.0832   Epoch: 1   Global Step: 21870   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:38:59,618-Speed 3380.44 samples/sec   Loss 9.4324   LearningRate 0.0832   Epoch: 1   Global Step: 21880   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:39:02,675-Speed 3350.95 samples/sec   Loss 9.5886   LearningRate 0.0832   Epoch: 1   Global Step: 21890   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:39:05,726-Speed 3358.27 samples/sec   Loss 9.3664   LearningRate 0.0831   Epoch: 1   Global Step: 21900   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:39:08,754-Speed 3381.71 samples/sec   Loss 9.5344   LearningRate 0.0831   Epoch: 1   Global Step: 21910   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:39:11,830-Speed 3330.23 samples/sec   Loss 9.5501   LearningRate 0.0831   Epoch: 1   Global Step: 21920   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:39:14,857-Speed 3384.88 samples/sec   Loss 9.5495   LearningRate 0.0831   Epoch: 1   Global Step: 21930   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:39:17,869-Speed 3400.18 samples/sec   Loss 9.6850   LearningRate 0.0831   Epoch: 1   Global Step: 21940   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:39:20,917-Speed 3360.96 samples/sec   Loss 9.6271   LearningRate 0.0831   Epoch: 1   Global Step: 21950   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:39:23,992-Speed 3331.56 samples/sec   Loss 9.3466   LearningRate 0.0831   Epoch: 1   Global Step: 21960   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:39:27,089-Speed 3306.89 samples/sec   Loss 9.5797   LearningRate 0.0831   Epoch: 1   Global Step: 21970   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:39:30,187-Speed 3307.49 samples/sec   Loss 9.3204   LearningRate 0.0831   Epoch: 1   Global Step: 21980   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:39:33,239-Speed 3355.71 samples/sec   Loss 9.5271   LearningRate 0.0831   Epoch: 1   Global Step: 21990   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:39:36,269-Speed 3380.76 samples/sec   Loss 9.5265   LearningRate 0.0831   Epoch: 1   Global Step: 22000   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:39:39,341-Speed 3335.19 samples/sec   Loss 9.4576   LearningRate 0.0831   Epoch: 1   Global Step: 22010   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:39:42,350-Speed 3403.73 samples/sec   Loss 9.5201   LearningRate 0.0831   Epoch: 1   Global Step: 22020   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:39:45,380-Speed 3381.11 samples/sec   Loss 9.4934   LearningRate 0.0831   Epoch: 1   Global Step: 22030   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:39:48,455-Speed 3331.35 samples/sec   Loss 9.4905   LearningRate 0.0830   Epoch: 1   Global Step: 22040   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:39:51,505-Speed 3358.78 samples/sec   Loss 9.4880   LearningRate 0.0830   Epoch: 1   Global Step: 22050   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:39:54,551-Speed 3362.21 samples/sec   Loss 9.5122   LearningRate 0.0830   Epoch: 1   Global Step: 22060   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:39:57,580-Speed 3382.81 samples/sec   Loss 9.5267   LearningRate 0.0830   Epoch: 1   Global Step: 22070   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:40:00,677-Speed 3307.26 samples/sec   Loss 9.5646   LearningRate 0.0830   Epoch: 1   Global Step: 22080   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:40:03,722-Speed 3363.46 samples/sec   Loss 9.4707   LearningRate 0.0830   Epoch: 1   Global Step: 22090   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:40:06,747-Speed 3386.52 samples/sec   Loss 9.5233   LearningRate 0.0830   Epoch: 1   Global Step: 22100   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:40:09,783-Speed 3374.57 samples/sec   Loss 9.5019   LearningRate 0.0830   Epoch: 1   Global Step: 22110   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:40:12,795-Speed 3399.94 samples/sec   Loss 9.4528   LearningRate 0.0830   Epoch: 1   Global Step: 22120   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:40:15,830-Speed 3375.04 samples/sec   Loss 9.4026   LearningRate 0.0830   Epoch: 1   Global Step: 22130   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:40:18,929-Speed 3305.90 samples/sec   Loss 9.4669   LearningRate 0.0830   Epoch: 1   Global Step: 22140   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:40:21,958-Speed 3381.46 samples/sec   Loss 9.4877   LearningRate 0.0830   Epoch: 1   Global Step: 22150   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:40:24,981-Speed 3388.06 samples/sec   Loss 9.4277   LearningRate 0.0830   Epoch: 1   Global Step: 22160   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:40:28,020-Speed 3371.25 samples/sec   Loss 9.4982   LearningRate 0.0829   Epoch: 1   Global Step: 22170   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:40:31,044-Speed 3386.69 samples/sec   Loss 9.5622   LearningRate 0.0829   Epoch: 1   Global Step: 22180   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:40:34,071-Speed 3384.66 samples/sec   Loss 9.5260   LearningRate 0.0829   Epoch: 1   Global Step: 22190   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:40:37,108-Speed 3372.18 samples/sec   Loss 9.4665   LearningRate 0.0829   Epoch: 1   Global Step: 22200   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:40:40,110-Speed 3412.91 samples/sec   Loss 9.5136   LearningRate 0.0829   Epoch: 1   Global Step: 22210   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:40:43,148-Speed 3371.18 samples/sec   Loss 9.5427   LearningRate 0.0829   Epoch: 1   Global Step: 22220   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:40:46,198-Speed 3359.12 samples/sec   Loss 9.3395   LearningRate 0.0829   Epoch: 1   Global Step: 22230   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:40:49,218-Speed 3391.41 samples/sec   Loss 9.4933   LearningRate 0.0829   Epoch: 1   Global Step: 22240   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:40:52,290-Speed 3334.76 samples/sec   Loss 9.5898   LearningRate 0.0829   Epoch: 1   Global Step: 22250   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:40:55,342-Speed 3356.32 samples/sec   Loss 9.4377   LearningRate 0.0829   Epoch: 1   Global Step: 22260   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:40:58,341-Speed 3415.12 samples/sec   Loss 9.4567   LearningRate 0.0829   Epoch: 1   Global Step: 22270   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:41:01,399-Speed 3349.84 samples/sec   Loss 9.6186   LearningRate 0.0829   Epoch: 1   Global Step: 22280   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:41:04,501-Speed 3301.67 samples/sec   Loss 9.5255   LearningRate 0.0829   Epoch: 1   Global Step: 22290   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:41:07,585-Speed 3321.42 samples/sec   Loss 9.5104   LearningRate 0.0829   Epoch: 1   Global Step: 22300   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:41:10,615-Speed 3381.16 samples/sec   Loss 9.4374   LearningRate 0.0828   Epoch: 1   Global Step: 22310   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:41:13,685-Speed 3336.47 samples/sec   Loss 9.3645   LearningRate 0.0828   Epoch: 1   Global Step: 22320   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:41:16,763-Speed 3327.82 samples/sec   Loss 9.3922   LearningRate 0.0828   Epoch: 1   Global Step: 22330   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:41:19,813-Speed 3358.76 samples/sec   Loss 9.4503   LearningRate 0.0828   Epoch: 1   Global Step: 22340   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:41:22,835-Speed 3389.09 samples/sec   Loss 9.5304   LearningRate 0.0828   Epoch: 1   Global Step: 22350   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:41:25,904-Speed 3337.28 samples/sec   Loss 9.5046   LearningRate 0.0828   Epoch: 1   Global Step: 22360   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:41:28,992-Speed 3317.46 samples/sec   Loss 9.4408   LearningRate 0.0828   Epoch: 1   Global Step: 22370   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:41:32,021-Speed 3381.51 samples/sec   Loss 9.4398   LearningRate 0.0828   Epoch: 1   Global Step: 22380   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:41:35,053-Speed 3378.91 samples/sec   Loss 9.2223   LearningRate 0.0828   Epoch: 1   Global Step: 22390   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:41:38,080-Speed 3383.43 samples/sec   Loss 9.5064   LearningRate 0.0828   Epoch: 1   Global Step: 22400   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:41:41,130-Speed 3359.03 samples/sec   Loss 9.5652   LearningRate 0.0828   Epoch: 1   Global Step: 22410   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:41:44,207-Speed 3329.19 samples/sec   Loss 9.5867   LearningRate 0.0828   Epoch: 1   Global Step: 22420   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:41:47,237-Speed 3380.42 samples/sec   Loss 9.3555   LearningRate 0.0828   Epoch: 1   Global Step: 22430   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:41:50,352-Speed 3288.19 samples/sec   Loss 9.4128   LearningRate 0.0827   Epoch: 1   Global Step: 22440   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:41:53,438-Speed 3319.23 samples/sec   Loss 9.5013   LearningRate 0.0827   Epoch: 1   Global Step: 22450   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:41:56,454-Speed 3396.81 samples/sec   Loss 9.4627   LearningRate 0.0827   Epoch: 1   Global Step: 22460   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:41:59,482-Speed 3382.72 samples/sec   Loss 9.4267   LearningRate 0.0827   Epoch: 1   Global Step: 22470   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:42:02,531-Speed 3359.78 samples/sec   Loss 9.4133   LearningRate 0.0827   Epoch: 1   Global Step: 22480   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:42:05,646-Speed 3288.16 samples/sec   Loss 9.4655   LearningRate 0.0827   Epoch: 1   Global Step: 22490   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:42:08,696-Speed 3358.34 samples/sec   Loss 9.4809   LearningRate 0.0827   Epoch: 1   Global Step: 22500   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:42:11,760-Speed 3343.32 samples/sec   Loss 9.4601   LearningRate 0.0827   Epoch: 1   Global Step: 22510   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:42:14,782-Speed 3389.76 samples/sec   Loss 9.2440   LearningRate 0.0827   Epoch: 1   Global Step: 22520   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:42:17,835-Speed 3355.40 samples/sec   Loss 9.3596   LearningRate 0.0827   Epoch: 1   Global Step: 22530   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:42:20,864-Speed 3381.08 samples/sec   Loss 9.4063   LearningRate 0.0827   Epoch: 1   Global Step: 22540   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:42:23,907-Speed 3366.64 samples/sec   Loss 9.3001   LearningRate 0.0827   Epoch: 1   Global Step: 22550   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:42:26,971-Speed 3342.43 samples/sec   Loss 9.5336   LearningRate 0.0827   Epoch: 1   Global Step: 22560   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:42:29,981-Speed 3403.60 samples/sec   Loss 9.4052   LearningRate 0.0827   Epoch: 1   Global Step: 22570   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:42:32,999-Speed 3393.69 samples/sec   Loss 9.3842   LearningRate 0.0826   Epoch: 1   Global Step: 22580   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:42:36,034-Speed 3375.67 samples/sec   Loss 9.5616   LearningRate 0.0826   Epoch: 1   Global Step: 22590   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:42:39,065-Speed 3378.68 samples/sec   Loss 9.6326   LearningRate 0.0826   Epoch: 1   Global Step: 22600   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:42:42,118-Speed 3356.04 samples/sec   Loss 9.5983   LearningRate 0.0826   Epoch: 1   Global Step: 22610   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:42:45,127-Speed 3404.23 samples/sec   Loss 9.4683   LearningRate 0.0826   Epoch: 1   Global Step: 22620   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:42:48,152-Speed 3385.96 samples/sec   Loss 9.5484   LearningRate 0.0826   Epoch: 1   Global Step: 22630   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:42:51,186-Speed 3375.47 samples/sec   Loss 9.4726   LearningRate 0.0826   Epoch: 1   Global Step: 22640   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:42:54,300-Speed 3290.14 samples/sec   Loss 9.4539   LearningRate 0.0826   Epoch: 1   Global Step: 22650   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:42:57,310-Speed 3402.81 samples/sec   Loss 9.2898   LearningRate 0.0826   Epoch: 1   Global Step: 22660   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:43:00,348-Speed 3371.25 samples/sec   Loss 9.4269   LearningRate 0.0826   Epoch: 1   Global Step: 22670   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:43:03,411-Speed 3343.51 samples/sec   Loss 9.3017   LearningRate 0.0826   Epoch: 1   Global Step: 22680   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:43:06,460-Speed 3360.33 samples/sec   Loss 9.3273   LearningRate 0.0826   Epoch: 1   Global Step: 22690   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:43:09,489-Speed 3381.48 samples/sec   Loss 9.2780   LearningRate 0.0826   Epoch: 1   Global Step: 22700   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:43:12,588-Speed 3305.43 samples/sec   Loss 9.4615   LearningRate 0.0826   Epoch: 1   Global Step: 22710   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:43:15,612-Speed 3387.40 samples/sec   Loss 9.5317   LearningRate 0.0825   Epoch: 1   Global Step: 22720   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:43:18,704-Speed 3312.08 samples/sec   Loss 9.4065   LearningRate 0.0825   Epoch: 1   Global Step: 22730   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:43:21,733-Speed 3382.01 samples/sec   Loss 9.3853   LearningRate 0.0825   Epoch: 1   Global Step: 22740   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:43:24,804-Speed 3335.98 samples/sec   Loss 9.2953   LearningRate 0.0825   Epoch: 1   Global Step: 22750   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:43:27,861-Speed 3350.75 samples/sec   Loss 9.4167   LearningRate 0.0825   Epoch: 1   Global Step: 22760   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:43:30,955-Speed 3310.21 samples/sec   Loss 9.3739   LearningRate 0.0825   Epoch: 1   Global Step: 22770   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:43:33,980-Speed 3385.20 samples/sec   Loss 9.4655   LearningRate 0.0825   Epoch: 1   Global Step: 22780   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:43:37,055-Speed 3332.10 samples/sec   Loss 9.4489   LearningRate 0.0825   Epoch: 1   Global Step: 22790   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:43:40,105-Speed 3358.27 samples/sec   Loss 9.2815   LearningRate 0.0825   Epoch: 1   Global Step: 22800   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:43:43,163-Speed 3349.02 samples/sec   Loss 9.4515   LearningRate 0.0825   Epoch: 1   Global Step: 22810   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:43:46,197-Speed 3376.26 samples/sec   Loss 9.3221   LearningRate 0.0825   Epoch: 1   Global Step: 22820   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:43:49,226-Speed 3382.10 samples/sec   Loss 9.4159   LearningRate 0.0825   Epoch: 1   Global Step: 22830   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:43:52,278-Speed 3356.18 samples/sec   Loss 9.3031   LearningRate 0.0825   Epoch: 1   Global Step: 22840   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:43:55,328-Speed 3358.94 samples/sec   Loss 9.3903   LearningRate 0.0824   Epoch: 1   Global Step: 22850   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:43:58,338-Speed 3402.30 samples/sec   Loss 9.3535   LearningRate 0.0824   Epoch: 1   Global Step: 22860   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:44:01,462-Speed 3279.24 samples/sec   Loss 9.4637   LearningRate 0.0824   Epoch: 1   Global Step: 22870   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:44:04,499-Speed 3372.58 samples/sec   Loss 9.3969   LearningRate 0.0824   Epoch: 1   Global Step: 22880   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:44:07,510-Speed 3401.66 samples/sec   Loss 9.3536   LearningRate 0.0824   Epoch: 1   Global Step: 22890   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:44:10,520-Speed 3403.50 samples/sec   Loss 9.3428   LearningRate 0.0824   Epoch: 1   Global Step: 22900   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:44:13,541-Speed 3390.92 samples/sec   Loss 9.2849   LearningRate 0.0824   Epoch: 1   Global Step: 22910   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:44:16,597-Speed 3351.76 samples/sec   Loss 9.2800   LearningRate 0.0824   Epoch: 1   Global Step: 22920   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:44:19,602-Speed 3408.70 samples/sec   Loss 9.3245   LearningRate 0.0824   Epoch: 1   Global Step: 22930   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:44:22,640-Speed 3371.59 samples/sec   Loss 9.3173   LearningRate 0.0824   Epoch: 1   Global Step: 22940   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:44:25,662-Speed 3389.67 samples/sec   Loss 9.5493   LearningRate 0.0824   Epoch: 1   Global Step: 22950   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:44:28,696-Speed 3376.04 samples/sec   Loss 9.4005   LearningRate 0.0824   Epoch: 1   Global Step: 22960   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:44:31,779-Speed 3322.09 samples/sec   Loss 9.4993   LearningRate 0.0824   Epoch: 1   Global Step: 22970   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:44:34,856-Speed 3329.53 samples/sec   Loss 9.3260   LearningRate 0.0824   Epoch: 1   Global Step: 22980   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:44:37,952-Speed 3307.83 samples/sec   Loss 9.3846   LearningRate 0.0823   Epoch: 1   Global Step: 22990   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:44:41,065-Speed 3290.73 samples/sec   Loss 9.4471   LearningRate 0.0823   Epoch: 1   Global Step: 23000   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:44:44,136-Speed 3335.71 samples/sec   Loss 9.4558   LearningRate 0.0823   Epoch: 1   Global Step: 23010   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:44:47,134-Speed 3416.80 samples/sec   Loss 9.5097   LearningRate 0.0823   Epoch: 1   Global Step: 23020   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:44:50,251-Speed 3286.54 samples/sec   Loss 9.3426   LearningRate 0.0823   Epoch: 1   Global Step: 23030   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:44:53,313-Speed 3345.03 samples/sec   Loss 9.2649   LearningRate 0.0823   Epoch: 1   Global Step: 23040   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:44:56,363-Speed 3359.09 samples/sec   Loss 9.4111   LearningRate 0.0823   Epoch: 1   Global Step: 23050   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:44:59,361-Speed 3415.63 samples/sec   Loss 9.3337   LearningRate 0.0823   Epoch: 1   Global Step: 23060   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:45:02,437-Speed 3330.31 samples/sec   Loss 9.3323   LearningRate 0.0823   Epoch: 1   Global Step: 23070   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:45:05,485-Speed 3360.84 samples/sec   Loss 9.3079   LearningRate 0.0823   Epoch: 1   Global Step: 23080   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:45:08,505-Speed 3391.64 samples/sec   Loss 9.3951   LearningRate 0.0823   Epoch: 1   Global Step: 23090   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:45:11,587-Speed 3324.05 samples/sec   Loss 9.5179   LearningRate 0.0823   Epoch: 1   Global Step: 23100   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:45:14,616-Speed 3381.88 samples/sec   Loss 9.3720   LearningRate 0.0823   Epoch: 1   Global Step: 23110   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:45:17,686-Speed 3335.66 samples/sec   Loss 9.4074   LearningRate 0.0823   Epoch: 1   Global Step: 23120   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:45:20,749-Speed 3344.40 samples/sec   Loss 9.5002   LearningRate 0.0822   Epoch: 1   Global Step: 23130   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:45:23,801-Speed 3356.63 samples/sec   Loss 9.3851   LearningRate 0.0822   Epoch: 1   Global Step: 23140   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:45:26,825-Speed 3386.81 samples/sec   Loss 9.4494   LearningRate 0.0822   Epoch: 1   Global Step: 23150   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:45:29,862-Speed 3372.40 samples/sec   Loss 9.3132   LearningRate 0.0822   Epoch: 1   Global Step: 23160   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:45:32,925-Speed 3345.31 samples/sec   Loss 9.3407   LearningRate 0.0822   Epoch: 1   Global Step: 23170   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:45:36,021-Speed 3308.18 samples/sec   Loss 9.3980   LearningRate 0.0822   Epoch: 1   Global Step: 23180   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:45:39,079-Speed 3349.91 samples/sec   Loss 9.3698   LearningRate 0.0822   Epoch: 1   Global Step: 23190   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:45:42,125-Speed 3362.53 samples/sec   Loss 9.3619   LearningRate 0.0822   Epoch: 1   Global Step: 23200   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:45:45,124-Speed 3416.20 samples/sec   Loss 9.3534   LearningRate 0.0822   Epoch: 1   Global Step: 23210   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:45:48,162-Speed 3372.31 samples/sec   Loss 9.4370   LearningRate 0.0822   Epoch: 1   Global Step: 23220   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:45:51,163-Speed 3413.32 samples/sec   Loss 9.3976   LearningRate 0.0822   Epoch: 1   Global Step: 23230   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:45:54,237-Speed 3332.45 samples/sec   Loss 9.3236   LearningRate 0.0822   Epoch: 1   Global Step: 23240   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:45:57,270-Speed 3376.82 samples/sec   Loss 9.4359   LearningRate 0.0822   Epoch: 1   Global Step: 23250   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:46:00,357-Speed 3318.09 samples/sec   Loss 9.2441   LearningRate 0.0822   Epoch: 1   Global Step: 23260   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:46:03,430-Speed 3333.11 samples/sec   Loss 9.3447   LearningRate 0.0821   Epoch: 1   Global Step: 23270   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:46:06,510-Speed 3326.17 samples/sec   Loss 9.2400   LearningRate 0.0821   Epoch: 1   Global Step: 23280   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:46:09,537-Speed 3384.26 samples/sec   Loss 9.4018   LearningRate 0.0821   Epoch: 1   Global Step: 23290   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:46:12,669-Speed 3270.41 samples/sec   Loss 9.4003   LearningRate 0.0821   Epoch: 1   Global Step: 23300   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:46:15,743-Speed 3331.91 samples/sec   Loss 9.3011   LearningRate 0.0821   Epoch: 1   Global Step: 23310   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:46:18,874-Speed 3271.74 samples/sec   Loss 9.2512   LearningRate 0.0821   Epoch: 1   Global Step: 23320   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:46:21,899-Speed 3386.79 samples/sec   Loss 9.3802   LearningRate 0.0821   Epoch: 1   Global Step: 23330   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:46:24,945-Speed 3362.63 samples/sec   Loss 9.2771   LearningRate 0.0821   Epoch: 1   Global Step: 23340   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:46:27,975-Speed 3380.39 samples/sec   Loss 9.3363   LearningRate 0.0821   Epoch: 1   Global Step: 23350   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:46:31,088-Speed 3290.49 samples/sec   Loss 9.2565   LearningRate 0.0821   Epoch: 1   Global Step: 23360   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:46:34,073-Speed 3431.20 samples/sec   Loss 9.4509   LearningRate 0.0821   Epoch: 1   Global Step: 23370   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:46:37,141-Speed 3339.64 samples/sec   Loss 9.3148   LearningRate 0.0821   Epoch: 1   Global Step: 23380   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:46:40,216-Speed 3330.90 samples/sec   Loss 9.3114   LearningRate 0.0821   Epoch: 1   Global Step: 23390   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:46:43,268-Speed 3356.19 samples/sec   Loss 9.3835   LearningRate 0.0820   Epoch: 1   Global Step: 23400   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:46:46,286-Speed 3395.14 samples/sec   Loss 9.3675   LearningRate 0.0820   Epoch: 1   Global Step: 23410   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:46:49,330-Speed 3364.96 samples/sec   Loss 9.2313   LearningRate 0.0820   Epoch: 1   Global Step: 23420   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:46:52,365-Speed 3374.72 samples/sec   Loss 9.3415   LearningRate 0.0820   Epoch: 1   Global Step: 23430   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:46:55,375-Speed 3403.14 samples/sec   Loss 9.2726   LearningRate 0.0820   Epoch: 1   Global Step: 23440   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:46:58,440-Speed 3342.41 samples/sec   Loss 9.3209   LearningRate 0.0820   Epoch: 1   Global Step: 23450   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:47:01,484-Speed 3364.28 samples/sec   Loss 9.2779   LearningRate 0.0820   Epoch: 1   Global Step: 23460   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:47:04,516-Speed 3378.52 samples/sec   Loss 9.3445   LearningRate 0.0820   Epoch: 1   Global Step: 23470   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:47:07,596-Speed 3325.49 samples/sec   Loss 9.2502   LearningRate 0.0820   Epoch: 1   Global Step: 23480   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:47:10,625-Speed 3382.75 samples/sec   Loss 9.4315   LearningRate 0.0820   Epoch: 1   Global Step: 23490   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:47:13,673-Speed 3360.48 samples/sec   Loss 9.2816   LearningRate 0.0820   Epoch: 1   Global Step: 23500   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:47:16,710-Speed 3372.81 samples/sec   Loss 9.3061   LearningRate 0.0820   Epoch: 1   Global Step: 23510   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:47:19,747-Speed 3372.13 samples/sec   Loss 9.3033   LearningRate 0.0820   Epoch: 1   Global Step: 23520   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:47:22,772-Speed 3387.29 samples/sec   Loss 9.3701   LearningRate 0.0820   Epoch: 1   Global Step: 23530   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:47:25,812-Speed 3368.57 samples/sec   Loss 9.2995   LearningRate 0.0819   Epoch: 1   Global Step: 23540   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:47:28,885-Speed 3333.91 samples/sec   Loss 9.4093   LearningRate 0.0819   Epoch: 1   Global Step: 23550   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:47:31,903-Speed 3394.14 samples/sec   Loss 9.2348   LearningRate 0.0819   Epoch: 1   Global Step: 23560   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:47:35,002-Speed 3304.72 samples/sec   Loss 9.4142   LearningRate 0.0819   Epoch: 1   Global Step: 23570   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:47:38,031-Speed 3381.89 samples/sec   Loss 9.2773   LearningRate 0.0819   Epoch: 1   Global Step: 23580   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:47:41,053-Speed 3389.91 samples/sec   Loss 9.2936   LearningRate 0.0819   Epoch: 1   Global Step: 23590   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:47:44,068-Speed 3398.20 samples/sec   Loss 9.3121   LearningRate 0.0819   Epoch: 1   Global Step: 23600   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:47:47,110-Speed 3367.05 samples/sec   Loss 9.4444   LearningRate 0.0819   Epoch: 1   Global Step: 23610   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:47:50,152-Speed 3366.97 samples/sec   Loss 9.3958   LearningRate 0.0819   Epoch: 1   Global Step: 23620   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:47:53,197-Speed 3364.54 samples/sec   Loss 9.2652   LearningRate 0.0819   Epoch: 1   Global Step: 23630   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:47:56,237-Speed 3369.53 samples/sec   Loss 9.2514   LearningRate 0.0819   Epoch: 1   Global Step: 23640   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:47:59,322-Speed 3320.15 samples/sec   Loss 9.4228   LearningRate 0.0819   Epoch: 1   Global Step: 23650   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:48:02,473-Speed 3250.89 samples/sec   Loss 9.3877   LearningRate 0.0819   Epoch: 1   Global Step: 23660   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:48:05,517-Speed 3364.60 samples/sec   Loss 9.2007   LearningRate 0.0819   Epoch: 1   Global Step: 23670   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:48:08,524-Speed 3406.90 samples/sec   Loss 9.1816   LearningRate 0.0818   Epoch: 1   Global Step: 23680   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:48:11,554-Speed 3379.77 samples/sec   Loss 9.2044   LearningRate 0.0818   Epoch: 1   Global Step: 23690   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:48:14,624-Speed 3336.97 samples/sec   Loss 9.2743   LearningRate 0.0818   Epoch: 1   Global Step: 23700   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:48:17,640-Speed 3395.98 samples/sec   Loss 9.4254   LearningRate 0.0818   Epoch: 1   Global Step: 23710   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:48:20,650-Speed 3403.42 samples/sec   Loss 9.3323   LearningRate 0.0818   Epoch: 1   Global Step: 23720   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:48:23,664-Speed 3398.81 samples/sec   Loss 9.2853   LearningRate 0.0818   Epoch: 1   Global Step: 23730   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:48:26,666-Speed 3411.68 samples/sec   Loss 9.2991   LearningRate 0.0818   Epoch: 1   Global Step: 23740   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:48:29,717-Speed 3357.68 samples/sec   Loss 9.3696   LearningRate 0.0818   Epoch: 1   Global Step: 23750   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:48:32,743-Speed 3385.61 samples/sec   Loss 9.4096   LearningRate 0.0818   Epoch: 1   Global Step: 23760   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:48:35,821-Speed 3327.32 samples/sec   Loss 9.3220   LearningRate 0.0818   Epoch: 1   Global Step: 23770   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:48:38,827-Speed 3407.95 samples/sec   Loss 9.2786   LearningRate 0.0818   Epoch: 1   Global Step: 23780   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:48:41,853-Speed 3385.05 samples/sec   Loss 9.3387   LearningRate 0.0818   Epoch: 1   Global Step: 23790   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:48:44,866-Speed 3399.87 samples/sec   Loss 9.3037   LearningRate 0.0818   Epoch: 1   Global Step: 23800   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:48:47,872-Speed 3407.62 samples/sec   Loss 9.1852   LearningRate 0.0817   Epoch: 1   Global Step: 23810   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:48:50,921-Speed 3359.11 samples/sec   Loss 9.3383   LearningRate 0.0817   Epoch: 1   Global Step: 23820   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:48:53,991-Speed 3336.26 samples/sec   Loss 9.1772   LearningRate 0.0817   Epoch: 1   Global Step: 23830   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:48:57,016-Speed 3386.94 samples/sec   Loss 9.2262   LearningRate 0.0817   Epoch: 1   Global Step: 23840   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:49:00,091-Speed 3330.69 samples/sec   Loss 9.3300   LearningRate 0.0817   Epoch: 1   Global Step: 23850   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:49:03,132-Speed 3368.17 samples/sec   Loss 9.2932   LearningRate 0.0817   Epoch: 1   Global Step: 23860   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:49:06,177-Speed 3364.69 samples/sec   Loss 9.3395   LearningRate 0.0817   Epoch: 1   Global Step: 23870   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:49:09,196-Speed 3392.26 samples/sec   Loss 9.3122   LearningRate 0.0817   Epoch: 1   Global Step: 23880   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:49:12,275-Speed 3327.19 samples/sec   Loss 9.1555   LearningRate 0.0817   Epoch: 1   Global Step: 23890   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:49:15,349-Speed 3332.00 samples/sec   Loss 9.2794   LearningRate 0.0817   Epoch: 1   Global Step: 23900   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:49:18,469-Speed 3283.30 samples/sec   Loss 9.3048   LearningRate 0.0817   Epoch: 1   Global Step: 23910   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:49:21,492-Speed 3388.56 samples/sec   Loss 9.3004   LearningRate 0.0817   Epoch: 1   Global Step: 23920   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:49:24,598-Speed 3297.99 samples/sec   Loss 9.3107   LearningRate 0.0817   Epoch: 1   Global Step: 23930   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:49:27,647-Speed 3359.94 samples/sec   Loss 9.1487   LearningRate 0.0817   Epoch: 1   Global Step: 23940   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:49:30,734-Speed 3317.04 samples/sec   Loss 9.1623   LearningRate 0.0816   Epoch: 1   Global Step: 23950   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:49:33,768-Speed 3376.43 samples/sec   Loss 9.1281   LearningRate 0.0816   Epoch: 1   Global Step: 23960   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:49:36,885-Speed 3286.91 samples/sec   Loss 9.3220   LearningRate 0.0816   Epoch: 1   Global Step: 23970   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:49:39,928-Speed 3365.30 samples/sec   Loss 9.3120   LearningRate 0.0816   Epoch: 1   Global Step: 23980   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:49:42,980-Speed 3357.22 samples/sec   Loss 9.2172   LearningRate 0.0816   Epoch: 1   Global Step: 23990   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:49:45,991-Speed 3401.33 samples/sec   Loss 9.1878   LearningRate 0.0816   Epoch: 1   Global Step: 24000   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:49:49,056-Speed 3342.92 samples/sec   Loss 9.3521   LearningRate 0.0816   Epoch: 1   Global Step: 24010   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:49:52,091-Speed 3375.01 samples/sec   Loss 9.2557   LearningRate 0.0816   Epoch: 1   Global Step: 24020   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:49:55,104-Speed 3400.19 samples/sec   Loss 9.1154   LearningRate 0.0816   Epoch: 1   Global Step: 24030   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:49:58,113-Speed 3403.56 samples/sec   Loss 9.3479   LearningRate 0.0816   Epoch: 1   Global Step: 24040   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:50:01,152-Speed 3371.17 samples/sec   Loss 9.2439   LearningRate 0.0816   Epoch: 1   Global Step: 24050   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:50:04,201-Speed 3359.13 samples/sec   Loss 9.2937   LearningRate 0.0816   Epoch: 1   Global Step: 24060   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:50:07,277-Speed 3329.71 samples/sec   Loss 9.3226   LearningRate 0.0816   Epoch: 1   Global Step: 24070   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:50:10,263-Speed 3430.73 samples/sec   Loss 9.1767   LearningRate 0.0816   Epoch: 1   Global Step: 24080   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:50:13,273-Speed 3403.06 samples/sec   Loss 9.2144   LearningRate 0.0815   Epoch: 1   Global Step: 24090   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:50:16,395-Speed 3281.76 samples/sec   Loss 9.2398   LearningRate 0.0815   Epoch: 1   Global Step: 24100   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:50:19,393-Speed 3415.42 samples/sec   Loss 9.2502   LearningRate 0.0815   Epoch: 1   Global Step: 24110   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:50:22,498-Speed 3300.00 samples/sec   Loss 9.1600   LearningRate 0.0815   Epoch: 1   Global Step: 24120   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:50:25,559-Speed 3346.08 samples/sec   Loss 9.1278   LearningRate 0.0815   Epoch: 1   Global Step: 24130   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:50:28,729-Speed 3231.74 samples/sec   Loss 9.2941   LearningRate 0.0815   Epoch: 1   Global Step: 24140   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:50:31,779-Speed 3357.93 samples/sec   Loss 9.3047   LearningRate 0.0815   Epoch: 1   Global Step: 24150   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:50:34,835-Speed 3352.54 samples/sec   Loss 9.2932   LearningRate 0.0815   Epoch: 1   Global Step: 24160   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:50:37,851-Speed 3396.77 samples/sec   Loss 9.2687   LearningRate 0.0815   Epoch: 1   Global Step: 24170   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:50:40,917-Speed 3339.80 samples/sec   Loss 9.2944   LearningRate 0.0815   Epoch: 1   Global Step: 24180   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:50:43,965-Speed 3360.99 samples/sec   Loss 9.2088   LearningRate 0.0815   Epoch: 1   Global Step: 24190   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:50:47,001-Speed 3374.65 samples/sec   Loss 9.2734   LearningRate 0.0815   Epoch: 1   Global Step: 24200   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:50:50,115-Speed 3289.40 samples/sec   Loss 9.2970   LearningRate 0.0815   Epoch: 1   Global Step: 24210   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:50:53,180-Speed 3341.74 samples/sec   Loss 9.1916   LearningRate 0.0815   Epoch: 1   Global Step: 24220   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:50:56,235-Speed 3352.60 samples/sec   Loss 9.2861   LearningRate 0.0814   Epoch: 1   Global Step: 24230   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:50:59,283-Speed 3361.28 samples/sec   Loss 9.2564   LearningRate 0.0814   Epoch: 1   Global Step: 24240   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:51:02,328-Speed 3363.52 samples/sec   Loss 9.1633   LearningRate 0.0814   Epoch: 1   Global Step: 24250   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:51:05,415-Speed 3318.54 samples/sec   Loss 9.2768   LearningRate 0.0814   Epoch: 1   Global Step: 24260   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:51:08,453-Speed 3371.46 samples/sec   Loss 9.1350   LearningRate 0.0814   Epoch: 1   Global Step: 24270   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:51:11,521-Speed 3338.85 samples/sec   Loss 9.1320   LearningRate 0.0814   Epoch: 1   Global Step: 24280   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:51:14,584-Speed 3343.97 samples/sec   Loss 9.2692   LearningRate 0.0814   Epoch: 1   Global Step: 24290   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:51:17,590-Speed 3408.48 samples/sec   Loss 9.1658   LearningRate 0.0814   Epoch: 1   Global Step: 24300   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:51:20,577-Speed 3428.39 samples/sec   Loss 9.2988   LearningRate 0.0814   Epoch: 1   Global Step: 24310   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:51:23,595-Speed 3394.13 samples/sec   Loss 9.2384   LearningRate 0.0814   Epoch: 1   Global Step: 24320   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:51:26,610-Speed 3399.34 samples/sec   Loss 9.1405   LearningRate 0.0814   Epoch: 1   Global Step: 24330   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:51:29,710-Speed 3303.54 samples/sec   Loss 9.0709   LearningRate 0.0814   Epoch: 1   Global Step: 24340   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:51:32,712-Speed 3412.30 samples/sec   Loss 9.2842   LearningRate 0.0814   Epoch: 1   Global Step: 24350   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:51:35,768-Speed 3351.99 samples/sec   Loss 9.1909   LearningRate 0.0813   Epoch: 1   Global Step: 24360   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:51:38,809-Speed 3368.85 samples/sec   Loss 9.2665   LearningRate 0.0813   Epoch: 1   Global Step: 24370   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:51:41,804-Speed 3419.60 samples/sec   Loss 9.2371   LearningRate 0.0813   Epoch: 1   Global Step: 24380   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:51:44,825-Speed 3390.69 samples/sec   Loss 9.1785   LearningRate 0.0813   Epoch: 1   Global Step: 24390   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:51:47,827-Speed 3412.21 samples/sec   Loss 9.2497   LearningRate 0.0813   Epoch: 1   Global Step: 24400   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:51:50,920-Speed 3312.23 samples/sec   Loss 9.2693   LearningRate 0.0813   Epoch: 1   Global Step: 24410   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:51:54,027-Speed 3297.10 samples/sec   Loss 9.0870   LearningRate 0.0813   Epoch: 1   Global Step: 24420   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:51:57,035-Speed 3405.51 samples/sec   Loss 9.2276   LearningRate 0.0813   Epoch: 1   Global Step: 24430   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:52:00,071-Speed 3373.19 samples/sec   Loss 9.2385   LearningRate 0.0813   Epoch: 1   Global Step: 24440   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:52:03,172-Speed 3304.07 samples/sec   Loss 9.1759   LearningRate 0.0813   Epoch: 1   Global Step: 24450   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:52:06,228-Speed 3351.57 samples/sec   Loss 9.1479   LearningRate 0.0813   Epoch: 1   Global Step: 24460   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:52:09,253-Speed 3386.26 samples/sec   Loss 9.2314   LearningRate 0.0813   Epoch: 1   Global Step: 24470   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:52:12,275-Speed 3388.75 samples/sec   Loss 9.0081   LearningRate 0.0813   Epoch: 1   Global Step: 24480   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:52:15,321-Speed 3363.83 samples/sec   Loss 9.2867   LearningRate 0.0813   Epoch: 1   Global Step: 24490   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:52:18,322-Speed 3413.14 samples/sec   Loss 9.1551   LearningRate 0.0812   Epoch: 1   Global Step: 24500   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:52:21,349-Speed 3384.50 samples/sec   Loss 9.2464   LearningRate 0.0812   Epoch: 1   Global Step: 24510   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:52:24,392-Speed 3365.58 samples/sec   Loss 9.1924   LearningRate 0.0812   Epoch: 1   Global Step: 24520   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:52:27,405-Speed 3399.70 samples/sec   Loss 9.0587   LearningRate 0.0812   Epoch: 1   Global Step: 24530   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:52:30,446-Speed 3368.48 samples/sec   Loss 9.2192   LearningRate 0.0812   Epoch: 1   Global Step: 24540   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:52:33,484-Speed 3372.05 samples/sec   Loss 9.2269   LearningRate 0.0812   Epoch: 1   Global Step: 24550   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:52:36,585-Speed 3302.93 samples/sec   Loss 9.2276   LearningRate 0.0812   Epoch: 1   Global Step: 24560   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:52:39,689-Speed 3299.95 samples/sec   Loss 9.3401   LearningRate 0.0812   Epoch: 1   Global Step: 24570   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:52:42,755-Speed 3341.12 samples/sec   Loss 9.2100   LearningRate 0.0812   Epoch: 1   Global Step: 24580   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:52:45,764-Speed 3404.79 samples/sec   Loss 9.1321   LearningRate 0.0812   Epoch: 1   Global Step: 24590   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:52:48,827-Speed 3344.25 samples/sec   Loss 9.0688   LearningRate 0.0812   Epoch: 1   Global Step: 24600   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:52:51,847-Speed 3391.13 samples/sec   Loss 9.1626   LearningRate 0.0812   Epoch: 1   Global Step: 24610   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:52:54,882-Speed 3374.74 samples/sec   Loss 9.2894   LearningRate 0.0812   Epoch: 1   Global Step: 24620   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:52:57,885-Speed 3410.94 samples/sec   Loss 9.1703   LearningRate 0.0812   Epoch: 1   Global Step: 24630   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:53:00,931-Speed 3363.44 samples/sec   Loss 9.1349   LearningRate 0.0811   Epoch: 1   Global Step: 24640   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:53:04,025-Speed 3310.05 samples/sec   Loss 9.1326   LearningRate 0.0811   Epoch: 1   Global Step: 24650   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:53:07,159-Speed 3268.90 samples/sec   Loss 9.2363   LearningRate 0.0811   Epoch: 1   Global Step: 24660   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:53:10,167-Speed 3404.91 samples/sec   Loss 9.0441   LearningRate 0.0811   Epoch: 1   Global Step: 24670   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:53:13,219-Speed 3357.21 samples/sec   Loss 9.2049   LearningRate 0.0811   Epoch: 1   Global Step: 24680   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:53:16,251-Speed 3377.73 samples/sec   Loss 9.2717   LearningRate 0.0811   Epoch: 1   Global Step: 24690   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:53:19,330-Speed 3326.95 samples/sec   Loss 9.1284   LearningRate 0.0811   Epoch: 1   Global Step: 24700   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:53:22,322-Speed 3423.46 samples/sec   Loss 9.0873   LearningRate 0.0811   Epoch: 1   Global Step: 24710   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:53:25,352-Speed 3380.68 samples/sec   Loss 9.0851   LearningRate 0.0811   Epoch: 1   Global Step: 24720   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:53:28,490-Speed 3264.03 samples/sec   Loss 9.2270   LearningRate 0.0811   Epoch: 1   Global Step: 24730   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:53:31,588-Speed 3305.84 samples/sec   Loss 9.1926   LearningRate 0.0811   Epoch: 1   Global Step: 24740   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:53:34,639-Speed 3357.57 samples/sec   Loss 9.2136   LearningRate 0.0811   Epoch: 1   Global Step: 24750   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:53:37,697-Speed 3349.59 samples/sec   Loss 9.1516   LearningRate 0.0811   Epoch: 1   Global Step: 24760   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:53:40,742-Speed 3364.60 samples/sec   Loss 9.1569   LearningRate 0.0811   Epoch: 1   Global Step: 24770   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:53:43,790-Speed 3360.01 samples/sec   Loss 9.0104   LearningRate 0.0810   Epoch: 1   Global Step: 24780   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:53:46,810-Speed 3392.71 samples/sec   Loss 9.0199   LearningRate 0.0810   Epoch: 1   Global Step: 24790   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:53:49,881-Speed 3335.52 samples/sec   Loss 9.1359   LearningRate 0.0810   Epoch: 1   Global Step: 24800   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:53:52,983-Speed 3301.57 samples/sec   Loss 9.1670   LearningRate 0.0810   Epoch: 1   Global Step: 24810   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:53:56,084-Speed 3303.13 samples/sec   Loss 9.2920   LearningRate 0.0810   Epoch: 1   Global Step: 24820   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:53:59,120-Speed 3373.56 samples/sec   Loss 9.0979   LearningRate 0.0810   Epoch: 1   Global Step: 24830   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:54:02,386-Speed 3135.95 samples/sec   Loss 9.0534   LearningRate 0.0810   Epoch: 1   Global Step: 24840   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:54:33,692-Speed 327.12 samples/sec   Loss 7.8858   LearningRate 0.0810   Epoch: 2   Global Step: 24850   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:54:36,876-Speed 3216.84 samples/sec   Loss 7.5455   LearningRate 0.0810   Epoch: 2   Global Step: 24860   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:54:39,938-Speed 3345.50 samples/sec   Loss 7.4219   LearningRate 0.0810   Epoch: 2   Global Step: 24870   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:54:42,992-Speed 3353.90 samples/sec   Loss 7.3821   LearningRate 0.0810   Epoch: 2   Global Step: 24880   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:54:45,984-Speed 3424.04 samples/sec   Loss 7.4470   LearningRate 0.0810   Epoch: 2   Global Step: 24890   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:54:49,028-Speed 3365.43 samples/sec   Loss 7.3334   LearningRate 0.0810   Epoch: 2   Global Step: 24900   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:54:52,136-Speed 3295.64 samples/sec   Loss 7.4329   LearningRate 0.0810   Epoch: 2   Global Step: 24910   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:54:55,178-Speed 3367.43 samples/sec   Loss 7.3375   LearningRate 0.0809   Epoch: 2   Global Step: 24920   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:54:58,169-Speed 3424.82 samples/sec   Loss 7.2450   LearningRate 0.0809   Epoch: 2   Global Step: 24930   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:55:01,199-Speed 3380.74 samples/sec   Loss 7.3903   LearningRate 0.0809   Epoch: 2   Global Step: 24940   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:55:04,235-Speed 3373.82 samples/sec   Loss 7.4915   LearningRate 0.0809   Epoch: 2   Global Step: 24950   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:55:07,309-Speed 3331.73 samples/sec   Loss 7.4613   LearningRate 0.0809   Epoch: 2   Global Step: 24960   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:55:10,363-Speed 3355.12 samples/sec   Loss 7.2986   LearningRate 0.0809   Epoch: 2   Global Step: 24970   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:55:13,411-Speed 3360.20 samples/sec   Loss 7.3834   LearningRate 0.0809   Epoch: 2   Global Step: 24980   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:55:16,564-Speed 3249.63 samples/sec   Loss 7.4475   LearningRate 0.0809   Epoch: 2   Global Step: 24990   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:55:19,579-Speed 3397.08 samples/sec   Loss 7.4533   LearningRate 0.0809   Epoch: 2   Global Step: 25000   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:55:22,640-Speed 3346.53 samples/sec   Loss 7.3661   LearningRate 0.0809   Epoch: 2   Global Step: 25010   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:55:25,694-Speed 3354.42 samples/sec   Loss 7.4789   LearningRate 0.0809   Epoch: 2   Global Step: 25020   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:55:28,767-Speed 3332.65 samples/sec   Loss 7.4635   LearningRate 0.0809   Epoch: 2   Global Step: 25030   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:55:31,805-Speed 3372.19 samples/sec   Loss 7.4563   LearningRate 0.0809   Epoch: 2   Global Step: 25040   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:55:34,837-Speed 3377.22 samples/sec   Loss 7.3875   LearningRate 0.0808   Epoch: 2   Global Step: 25050   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:55:37,901-Speed 3343.98 samples/sec   Loss 7.4821   LearningRate 0.0808   Epoch: 2   Global Step: 25060   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:55:40,955-Speed 3353.91 samples/sec   Loss 7.6099   LearningRate 0.0808   Epoch: 2   Global Step: 25070   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:55:44,023-Speed 3338.38 samples/sec   Loss 7.4913   LearningRate 0.0808   Epoch: 2   Global Step: 25080   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:55:47,111-Speed 3318.02 samples/sec   Loss 7.5069   LearningRate 0.0808   Epoch: 2   Global Step: 25090   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:55:50,203-Speed 3312.05 samples/sec   Loss 7.4948   LearningRate 0.0808   Epoch: 2   Global Step: 25100   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:55:53,748-Speed 2889.55 samples/sec   Loss 7.5983   LearningRate 0.0808   Epoch: 2   Global Step: 25110   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:55:56,788-Speed 3370.04 samples/sec   Loss 7.5333   LearningRate 0.0808   Epoch: 2   Global Step: 25120   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:55:59,805-Speed 3394.22 samples/sec   Loss 7.4896   LearningRate 0.0808   Epoch: 2   Global Step: 25130   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:56:02,890-Speed 3321.01 samples/sec   Loss 7.5659   LearningRate 0.0808   Epoch: 2   Global Step: 25140   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:56:05,951-Speed 3346.29 samples/sec   Loss 7.5062   LearningRate 0.0808   Epoch: 2   Global Step: 25150   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:56:08,991-Speed 3368.67 samples/sec   Loss 7.4968   LearningRate 0.0808   Epoch: 2   Global Step: 25160   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:56:12,052-Speed 3346.48 samples/sec   Loss 7.4960   LearningRate 0.0808   Epoch: 2   Global Step: 25170   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:56:15,169-Speed 3286.43 samples/sec   Loss 7.4201   LearningRate 0.0808   Epoch: 2   Global Step: 25180   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:56:18,240-Speed 3335.91 samples/sec   Loss 7.5363   LearningRate 0.0807   Epoch: 2   Global Step: 25190   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:56:21,308-Speed 3338.11 samples/sec   Loss 7.6023   LearningRate 0.0807   Epoch: 2   Global Step: 25200   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:56:24,347-Speed 3371.12 samples/sec   Loss 7.5693   LearningRate 0.0807   Epoch: 2   Global Step: 25210   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:56:27,377-Speed 3380.72 samples/sec   Loss 7.6538   LearningRate 0.0807   Epoch: 2   Global Step: 25220   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:56:30,437-Speed 3347.40 samples/sec   Loss 7.4741   LearningRate 0.0807   Epoch: 2   Global Step: 25230   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:56:33,468-Speed 3379.66 samples/sec   Loss 7.6306   LearningRate 0.0807   Epoch: 2   Global Step: 25240   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:56:36,544-Speed 3330.74 samples/sec   Loss 7.6313   LearningRate 0.0807   Epoch: 2   Global Step: 25250   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:56:39,591-Speed 3361.54 samples/sec   Loss 7.5582   LearningRate 0.0807   Epoch: 2   Global Step: 25260   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:56:42,669-Speed 3327.55 samples/sec   Loss 7.5693   LearningRate 0.0807   Epoch: 2   Global Step: 25270   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:56:45,693-Speed 3388.02 samples/sec   Loss 7.6949   LearningRate 0.0807   Epoch: 2   Global Step: 25280   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:56:48,761-Speed 3338.69 samples/sec   Loss 7.6293   LearningRate 0.0807   Epoch: 2   Global Step: 25290   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:56:51,822-Speed 3346.76 samples/sec   Loss 7.5464   LearningRate 0.0807   Epoch: 2   Global Step: 25300   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:56:54,827-Speed 3408.36 samples/sec   Loss 7.6084   LearningRate 0.0807   Epoch: 2   Global Step: 25310   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:56:57,835-Speed 3404.98 samples/sec   Loss 7.6786   LearningRate 0.0807   Epoch: 2   Global Step: 25320   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:57:00,871-Speed 3373.80 samples/sec   Loss 7.6619   LearningRate 0.0806   Epoch: 2   Global Step: 25330   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:57:03,964-Speed 3311.83 samples/sec   Loss 7.6721   LearningRate 0.0806   Epoch: 2   Global Step: 25340   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:57:06,996-Speed 3378.54 samples/sec   Loss 7.5843   LearningRate 0.0806   Epoch: 2   Global Step: 25350   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:57:09,985-Speed 3427.95 samples/sec   Loss 7.7252   LearningRate 0.0806   Epoch: 2   Global Step: 25360   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:57:13,057-Speed 3334.62 samples/sec   Loss 7.6902   LearningRate 0.0806   Epoch: 2   Global Step: 25370   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:57:16,117-Speed 3347.13 samples/sec   Loss 7.6424   LearningRate 0.0806   Epoch: 2   Global Step: 25380   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:57:19,131-Speed 3398.17 samples/sec   Loss 7.6745   LearningRate 0.0806   Epoch: 2   Global Step: 25390   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:57:22,160-Speed 3381.50 samples/sec   Loss 7.6560   LearningRate 0.0806   Epoch: 2   Global Step: 25400   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:57:25,171-Speed 3401.70 samples/sec   Loss 7.6236   LearningRate 0.0806   Epoch: 2   Global Step: 25410   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:57:28,224-Speed 3355.30 samples/sec   Loss 7.6308   LearningRate 0.0806   Epoch: 2   Global Step: 25420   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:57:31,282-Speed 3350.33 samples/sec   Loss 7.6953   LearningRate 0.0806   Epoch: 2   Global Step: 25430   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:57:34,326-Speed 3364.47 samples/sec   Loss 7.7428   LearningRate 0.0806   Epoch: 2   Global Step: 25440   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:57:37,345-Speed 3393.58 samples/sec   Loss 7.6879   LearningRate 0.0806   Epoch: 2   Global Step: 25450   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:57:40,386-Speed 3368.56 samples/sec   Loss 7.6968   LearningRate 0.0806   Epoch: 2   Global Step: 25460   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:57:43,460-Speed 3332.09 samples/sec   Loss 7.6530   LearningRate 0.0805   Epoch: 2   Global Step: 25470   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:57:46,510-Speed 3357.69 samples/sec   Loss 7.8024   LearningRate 0.0805   Epoch: 2   Global Step: 25480   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:57:49,524-Speed 3399.05 samples/sec   Loss 7.7699   LearningRate 0.0805   Epoch: 2   Global Step: 25490   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:57:52,598-Speed 3331.54 samples/sec   Loss 7.6578   LearningRate 0.0805   Epoch: 2   Global Step: 25500   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:57:55,652-Speed 3354.17 samples/sec   Loss 7.6932   LearningRate 0.0805   Epoch: 2   Global Step: 25510   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:57:58,670-Speed 3394.16 samples/sec   Loss 7.6370   LearningRate 0.0805   Epoch: 2   Global Step: 25520   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:58:01,706-Speed 3374.11 samples/sec   Loss 7.7572   LearningRate 0.0805   Epoch: 2   Global Step: 25530   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:58:04,738-Speed 3378.27 samples/sec   Loss 7.6642   LearningRate 0.0805   Epoch: 2   Global Step: 25540   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:58:07,770-Speed 3378.38 samples/sec   Loss 7.7811   LearningRate 0.0805   Epoch: 2   Global Step: 25550   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:58:10,773-Speed 3410.53 samples/sec   Loss 7.7584   LearningRate 0.0805   Epoch: 2   Global Step: 25560   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:58:13,843-Speed 3337.47 samples/sec   Loss 7.7077   LearningRate 0.0805   Epoch: 2   Global Step: 25570   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:58:16,899-Speed 3350.87 samples/sec   Loss 7.8650   LearningRate 0.0805   Epoch: 2   Global Step: 25580   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:58:19,925-Speed 3385.57 samples/sec   Loss 7.8096   LearningRate 0.0805   Epoch: 2   Global Step: 25590   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:58:22,985-Speed 3346.98 samples/sec   Loss 7.7782   LearningRate 0.0805   Epoch: 2   Global Step: 25600   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:58:26,049-Speed 3343.46 samples/sec   Loss 7.7828   LearningRate 0.0804   Epoch: 2   Global Step: 25610   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:58:29,086-Speed 3372.19 samples/sec   Loss 7.7731   LearningRate 0.0804   Epoch: 2   Global Step: 25620   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:58:32,135-Speed 3360.21 samples/sec   Loss 7.7534   LearningRate 0.0804   Epoch: 2   Global Step: 25630   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:58:35,179-Speed 3364.40 samples/sec   Loss 7.8282   LearningRate 0.0804   Epoch: 2   Global Step: 25640   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:58:38,244-Speed 3342.09 samples/sec   Loss 7.9117   LearningRate 0.0804   Epoch: 2   Global Step: 25650   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:58:41,283-Speed 3370.25 samples/sec   Loss 7.9005   LearningRate 0.0804   Epoch: 2   Global Step: 25660   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:58:44,302-Speed 3393.82 samples/sec   Loss 7.8311   LearningRate 0.0804   Epoch: 2   Global Step: 25670   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 03:58:47,317-Speed 3397.00 samples/sec   Loss 7.7683   LearningRate 0.0804   Epoch: 2   Global Step: 25680   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:58:50,337-Speed 3392.25 samples/sec   Loss 7.8302   LearningRate 0.0804   Epoch: 2   Global Step: 25690   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:58:53,379-Speed 3367.04 samples/sec   Loss 7.8164   LearningRate 0.0804   Epoch: 2   Global Step: 25700   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:58:56,430-Speed 3357.21 samples/sec   Loss 7.8560   LearningRate 0.0804   Epoch: 2   Global Step: 25710   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:58:59,469-Speed 3370.54 samples/sec   Loss 7.7331   LearningRate 0.0804   Epoch: 2   Global Step: 25720   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:59:02,545-Speed 3330.09 samples/sec   Loss 7.9048   LearningRate 0.0804   Epoch: 2   Global Step: 25730   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:59:05,640-Speed 3309.71 samples/sec   Loss 7.8234   LearningRate 0.0804   Epoch: 2   Global Step: 25740   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:59:08,676-Speed 3373.37 samples/sec   Loss 7.9149   LearningRate 0.0803   Epoch: 2   Global Step: 25750   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:59:11,708-Speed 3377.93 samples/sec   Loss 7.8427   LearningRate 0.0803   Epoch: 2   Global Step: 25760   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:59:14,758-Speed 3358.72 samples/sec   Loss 7.8162   LearningRate 0.0803   Epoch: 2   Global Step: 25770   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:59:17,774-Speed 3396.03 samples/sec   Loss 7.7708   LearningRate 0.0803   Epoch: 2   Global Step: 25780   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:59:20,826-Speed 3356.33 samples/sec   Loss 7.7321   LearningRate 0.0803   Epoch: 2   Global Step: 25790   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:59:23,916-Speed 3315.46 samples/sec   Loss 7.8335   LearningRate 0.0803   Epoch: 2   Global Step: 25800   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:59:26,940-Speed 3387.13 samples/sec   Loss 7.8575   LearningRate 0.0803   Epoch: 2   Global Step: 25810   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:59:30,037-Speed 3307.31 samples/sec   Loss 7.9437   LearningRate 0.0803   Epoch: 2   Global Step: 25820   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:59:33,063-Speed 3384.91 samples/sec   Loss 7.9189   LearningRate 0.0803   Epoch: 2   Global Step: 25830   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:59:36,118-Speed 3353.57 samples/sec   Loss 7.8904   LearningRate 0.0803   Epoch: 2   Global Step: 25840   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:59:39,174-Speed 3351.88 samples/sec   Loss 7.8688   LearningRate 0.0803   Epoch: 2   Global Step: 25850   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:59:42,225-Speed 3356.68 samples/sec   Loss 7.7770   LearningRate 0.0803   Epoch: 2   Global Step: 25860   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:59:45,255-Speed 3381.62 samples/sec   Loss 7.8688   LearningRate 0.0803   Epoch: 2   Global Step: 25870   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 03:59:48,279-Speed 3387.28 samples/sec   Loss 7.8841   LearningRate 0.0802   Epoch: 2   Global Step: 25880   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:59:51,342-Speed 3343.47 samples/sec   Loss 7.8660   LearningRate 0.0802   Epoch: 2   Global Step: 25890   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:59:54,422-Speed 3326.75 samples/sec   Loss 7.8390   LearningRate 0.0802   Epoch: 2   Global Step: 25900   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 03:59:57,468-Speed 3362.56 samples/sec   Loss 7.9365   LearningRate 0.0802   Epoch: 2   Global Step: 25910   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:00:00,475-Speed 3405.86 samples/sec   Loss 7.8123   LearningRate 0.0802   Epoch: 2   Global Step: 25920   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:00:03,512-Speed 3372.95 samples/sec   Loss 7.8851   LearningRate 0.0802   Epoch: 2   Global Step: 25930   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:00:06,581-Speed 3337.47 samples/sec   Loss 7.8642   LearningRate 0.0802   Epoch: 2   Global Step: 25940   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:00:09,623-Speed 3367.50 samples/sec   Loss 7.8007   LearningRate 0.0802   Epoch: 2   Global Step: 25950   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:00:12,699-Speed 3330.23 samples/sec   Loss 7.9640   LearningRate 0.0802   Epoch: 2   Global Step: 25960   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:00:15,708-Speed 3403.87 samples/sec   Loss 7.8573   LearningRate 0.0802   Epoch: 2   Global Step: 25970   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:00:18,755-Speed 3362.48 samples/sec   Loss 7.9011   LearningRate 0.0802   Epoch: 2   Global Step: 25980   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:00:21,749-Speed 3420.58 samples/sec   Loss 7.9118   LearningRate 0.0802   Epoch: 2   Global Step: 25990   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:00:24,834-Speed 3320.74 samples/sec   Loss 7.9706   LearningRate 0.0802   Epoch: 2   Global Step: 26000   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:00:27,871-Speed 3372.61 samples/sec   Loss 7.9549   LearningRate 0.0802   Epoch: 2   Global Step: 26010   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:00:30,929-Speed 3349.87 samples/sec   Loss 7.9549   LearningRate 0.0801   Epoch: 2   Global Step: 26020   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:00:33,939-Speed 3402.73 samples/sec   Loss 8.0522   LearningRate 0.0801   Epoch: 2   Global Step: 26030   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:00:36,990-Speed 3358.39 samples/sec   Loss 8.0563   LearningRate 0.0801   Epoch: 2   Global Step: 26040   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:00:39,998-Speed 3404.92 samples/sec   Loss 7.8904   LearningRate 0.0801   Epoch: 2   Global Step: 26050   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:00:42,995-Speed 3417.71 samples/sec   Loss 7.9201   LearningRate 0.0801   Epoch: 2   Global Step: 26060   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:00:46,004-Speed 3405.02 samples/sec   Loss 7.9738   LearningRate 0.0801   Epoch: 2   Global Step: 26070   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:00:49,094-Speed 3314.68 samples/sec   Loss 7.7941   LearningRate 0.0801   Epoch: 2   Global Step: 26080   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:00:52,109-Speed 3396.49 samples/sec   Loss 7.9640   LearningRate 0.0801   Epoch: 2   Global Step: 26090   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:00:55,126-Speed 3396.31 samples/sec   Loss 7.8583   LearningRate 0.0801   Epoch: 2   Global Step: 26100   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:00:58,125-Speed 3414.76 samples/sec   Loss 7.8843   LearningRate 0.0801   Epoch: 2   Global Step: 26110   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:01:01,133-Speed 3406.18 samples/sec   Loss 7.9636   LearningRate 0.0801   Epoch: 2   Global Step: 26120   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:01:04,171-Speed 3371.22 samples/sec   Loss 7.8737   LearningRate 0.0801   Epoch: 2   Global Step: 26130   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:01:07,273-Speed 3302.10 samples/sec   Loss 7.9214   LearningRate 0.0801   Epoch: 2   Global Step: 26140   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:01:10,291-Speed 3394.83 samples/sec   Loss 7.9181   LearningRate 0.0801   Epoch: 2   Global Step: 26150   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:01:13,300-Speed 3403.99 samples/sec   Loss 7.9621   LearningRate 0.0800   Epoch: 2   Global Step: 26160   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:01:16,347-Speed 3361.83 samples/sec   Loss 7.8728   LearningRate 0.0800   Epoch: 2   Global Step: 26170   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:01:19,379-Speed 3378.16 samples/sec   Loss 7.9715   LearningRate 0.0800   Epoch: 2   Global Step: 26180   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:01:22,377-Speed 3415.91 samples/sec   Loss 7.9673   LearningRate 0.0800   Epoch: 2   Global Step: 26190   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:01:25,417-Speed 3370.87 samples/sec   Loss 7.9634   LearningRate 0.0800   Epoch: 2   Global Step: 26200   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:01:28,544-Speed 3275.65 samples/sec   Loss 8.0665   LearningRate 0.0800   Epoch: 2   Global Step: 26210   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:01:31,574-Speed 3380.28 samples/sec   Loss 8.0161   LearningRate 0.0800   Epoch: 2   Global Step: 26220   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:01:34,608-Speed 3376.29 samples/sec   Loss 8.0576   LearningRate 0.0800   Epoch: 2   Global Step: 26230   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:01:37,699-Speed 3313.96 samples/sec   Loss 8.0426   LearningRate 0.0800   Epoch: 2   Global Step: 26240   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:01:40,808-Speed 3294.46 samples/sec   Loss 8.0973   LearningRate 0.0800   Epoch: 2   Global Step: 26250   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:01:43,850-Speed 3367.64 samples/sec   Loss 8.0238   LearningRate 0.0800   Epoch: 2   Global Step: 26260   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:01:46,898-Speed 3359.91 samples/sec   Loss 8.1084   LearningRate 0.0800   Epoch: 2   Global Step: 26270   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:01:49,959-Speed 3346.66 samples/sec   Loss 8.0490   LearningRate 0.0800   Epoch: 2   Global Step: 26280   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:01:53,018-Speed 3348.46 samples/sec   Loss 7.8953   LearningRate 0.0800   Epoch: 2   Global Step: 26290   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:01:56,051-Speed 3377.72 samples/sec   Loss 7.9324   LearningRate 0.0799   Epoch: 2   Global Step: 26300   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:01:59,065-Speed 3398.12 samples/sec   Loss 7.9033   LearningRate 0.0799   Epoch: 2   Global Step: 26310   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:02:02,122-Speed 3351.30 samples/sec   Loss 7.9574   LearningRate 0.0799   Epoch: 2   Global Step: 26320   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:02:05,186-Speed 3343.42 samples/sec   Loss 8.0549   LearningRate 0.0799   Epoch: 2   Global Step: 26330   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:02:08,242-Speed 3351.67 samples/sec   Loss 7.9715   LearningRate 0.0799   Epoch: 2   Global Step: 26340   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:02:11,293-Speed 3357.24 samples/sec   Loss 8.0870   LearningRate 0.0799   Epoch: 2   Global Step: 26350   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:02:14,333-Speed 3370.16 samples/sec   Loss 8.1927   LearningRate 0.0799   Epoch: 2   Global Step: 26360   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:02:17,353-Speed 3391.01 samples/sec   Loss 8.2050   LearningRate 0.0799   Epoch: 2   Global Step: 26370   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:02:20,374-Speed 3391.39 samples/sec   Loss 8.0219   LearningRate 0.0799   Epoch: 2   Global Step: 26380   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:02:23,437-Speed 3344.23 samples/sec   Loss 8.0157   LearningRate 0.0799   Epoch: 2   Global Step: 26390   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:02:26,453-Speed 3395.19 samples/sec   Loss 8.0795   LearningRate 0.0799   Epoch: 2   Global Step: 26400   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:02:29,471-Speed 3394.91 samples/sec   Loss 7.9810   LearningRate 0.0799   Epoch: 2   Global Step: 26410   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:02:32,488-Speed 3395.03 samples/sec   Loss 8.0806   LearningRate 0.0799   Epoch: 2   Global Step: 26420   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:02:35,498-Speed 3403.00 samples/sec   Loss 8.0635   LearningRate 0.0799   Epoch: 2   Global Step: 26430   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:02:38,554-Speed 3351.61 samples/sec   Loss 8.0958   LearningRate 0.0798   Epoch: 2   Global Step: 26440   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:02:41,642-Speed 3317.64 samples/sec   Loss 8.1554   LearningRate 0.0798   Epoch: 2   Global Step: 26450   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:02:44,676-Speed 3375.75 samples/sec   Loss 8.1784   LearningRate 0.0798   Epoch: 2   Global Step: 26460   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:02:47,688-Speed 3401.27 samples/sec   Loss 8.0284   LearningRate 0.0798   Epoch: 2   Global Step: 26470   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:02:50,710-Speed 3389.98 samples/sec   Loss 8.1288   LearningRate 0.0798   Epoch: 2   Global Step: 26480   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:02:53,728-Speed 3393.07 samples/sec   Loss 7.9987   LearningRate 0.0798   Epoch: 2   Global Step: 26490   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:02:56,773-Speed 3363.64 samples/sec   Loss 8.0799   LearningRate 0.0798   Epoch: 2   Global Step: 26500   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:02:59,838-Speed 3342.16 samples/sec   Loss 8.0263   LearningRate 0.0798   Epoch: 2   Global Step: 26510   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:03:02,867-Speed 3381.65 samples/sec   Loss 8.1630   LearningRate 0.0798   Epoch: 2   Global Step: 26520   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:03:05,906-Speed 3371.02 samples/sec   Loss 8.1752   LearningRate 0.0798   Epoch: 2   Global Step: 26530   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:03:08,911-Speed 3408.37 samples/sec   Loss 8.0703   LearningRate 0.0798   Epoch: 2   Global Step: 26540   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:03:11,969-Speed 3350.86 samples/sec   Loss 8.0604   LearningRate 0.0798   Epoch: 2   Global Step: 26550   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:03:15,029-Speed 3346.43 samples/sec   Loss 8.0499   LearningRate 0.0798   Epoch: 2   Global Step: 26560   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:03:18,085-Speed 3352.87 samples/sec   Loss 8.0735   LearningRate 0.0798   Epoch: 2   Global Step: 26570   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:03:21,072-Speed 3429.51 samples/sec   Loss 8.1454   LearningRate 0.0797   Epoch: 2   Global Step: 26580   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:03:24,123-Speed 3356.68 samples/sec   Loss 8.0803   LearningRate 0.0797   Epoch: 2   Global Step: 26590   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:03:27,166-Speed 3366.38 samples/sec   Loss 8.1976   LearningRate 0.0797   Epoch: 2   Global Step: 26600   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:03:30,303-Speed 3265.45 samples/sec   Loss 8.0523   LearningRate 0.0797   Epoch: 2   Global Step: 26610   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:03:33,330-Speed 3384.32 samples/sec   Loss 8.0245   LearningRate 0.0797   Epoch: 2   Global Step: 26620   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:03:36,350-Speed 3391.89 samples/sec   Loss 8.1664   LearningRate 0.0797   Epoch: 2   Global Step: 26630   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:03:39,409-Speed 3348.21 samples/sec   Loss 8.0094   LearningRate 0.0797   Epoch: 2   Global Step: 26640   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:03:42,440-Speed 3380.66 samples/sec   Loss 8.1076   LearningRate 0.0797   Epoch: 2   Global Step: 26650   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:03:45,452-Speed 3400.09 samples/sec   Loss 7.9597   LearningRate 0.0797   Epoch: 2   Global Step: 26660   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:03:48,500-Speed 3361.46 samples/sec   Loss 8.0711   LearningRate 0.0797   Epoch: 2   Global Step: 26670   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:03:51,511-Speed 3401.28 samples/sec   Loss 8.2587   LearningRate 0.0797   Epoch: 2   Global Step: 26680   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:03:54,555-Speed 3365.66 samples/sec   Loss 8.2832   LearningRate 0.0797   Epoch: 2   Global Step: 26690   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:03:57,564-Speed 3403.91 samples/sec   Loss 8.0628   LearningRate 0.0797   Epoch: 2   Global Step: 26700   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:04:00,594-Speed 3380.34 samples/sec   Loss 8.1793   LearningRate 0.0797   Epoch: 2   Global Step: 26710   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:04:03,649-Speed 3353.66 samples/sec   Loss 8.1481   LearningRate 0.0796   Epoch: 2   Global Step: 26720   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:04:06,648-Speed 3415.02 samples/sec   Loss 8.1685   LearningRate 0.0796   Epoch: 2   Global Step: 26730   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:04:09,639-Speed 3424.65 samples/sec   Loss 8.1050   LearningRate 0.0796   Epoch: 2   Global Step: 26740   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:04:12,667-Speed 3383.03 samples/sec   Loss 8.0822   LearningRate 0.0796   Epoch: 2   Global Step: 26750   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:04:15,739-Speed 3334.58 samples/sec   Loss 8.1434   LearningRate 0.0796   Epoch: 2   Global Step: 26760   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:04:18,781-Speed 3367.42 samples/sec   Loss 8.0992   LearningRate 0.0796   Epoch: 2   Global Step: 26770   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:04:21,792-Speed 3402.37 samples/sec   Loss 8.1079   LearningRate 0.0796   Epoch: 2   Global Step: 26780   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:04:24,802-Speed 3402.81 samples/sec   Loss 8.1020   LearningRate 0.0796   Epoch: 2   Global Step: 26790   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:04:27,850-Speed 3360.70 samples/sec   Loss 8.0622   LearningRate 0.0796   Epoch: 2   Global Step: 26800   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:04:30,845-Speed 3420.27 samples/sec   Loss 8.1468   LearningRate 0.0796   Epoch: 2   Global Step: 26810   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:04:33,840-Speed 3420.00 samples/sec   Loss 8.2214   LearningRate 0.0796   Epoch: 2   Global Step: 26820   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:04:36,926-Speed 3318.79 samples/sec   Loss 8.2519   LearningRate 0.0796   Epoch: 2   Global Step: 26830   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:04:39,928-Speed 3412.64 samples/sec   Loss 8.0556   LearningRate 0.0796   Epoch: 2   Global Step: 26840   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:04:42,984-Speed 3351.22 samples/sec   Loss 8.1679   LearningRate 0.0796   Epoch: 2   Global Step: 26850   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:04:46,017-Speed 3378.47 samples/sec   Loss 8.1597   LearningRate 0.0795   Epoch: 2   Global Step: 26860   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:04:49,045-Speed 3382.30 samples/sec   Loss 8.1889   LearningRate 0.0795   Epoch: 2   Global Step: 26870   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:04:52,104-Speed 3348.87 samples/sec   Loss 8.2128   LearningRate 0.0795   Epoch: 2   Global Step: 26880   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:04:55,146-Speed 3367.34 samples/sec   Loss 8.2461   LearningRate 0.0795   Epoch: 2   Global Step: 26890   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:04:58,146-Speed 3413.83 samples/sec   Loss 8.2398   LearningRate 0.0795   Epoch: 2   Global Step: 26900   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:05:01,264-Speed 3285.50 samples/sec   Loss 8.1190   LearningRate 0.0795   Epoch: 2   Global Step: 26910   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:05:04,323-Speed 3348.69 samples/sec   Loss 8.2606   LearningRate 0.0795   Epoch: 2   Global Step: 26920   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:05:07,347-Speed 3387.26 samples/sec   Loss 8.2171   LearningRate 0.0795   Epoch: 2   Global Step: 26930   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:05:10,408-Speed 3346.12 samples/sec   Loss 8.1779   LearningRate 0.0795   Epoch: 2   Global Step: 26940   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:05:13,450-Speed 3367.77 samples/sec   Loss 8.2080   LearningRate 0.0795   Epoch: 2   Global Step: 26950   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:05:16,549-Speed 3305.09 samples/sec   Loss 8.0870   LearningRate 0.0795   Epoch: 2   Global Step: 26960   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:05:19,577-Speed 3383.71 samples/sec   Loss 8.2325   LearningRate 0.0795   Epoch: 2   Global Step: 26970   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:05:22,602-Speed 3385.86 samples/sec   Loss 8.1480   LearningRate 0.0795   Epoch: 2   Global Step: 26980   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:05:25,597-Speed 3420.07 samples/sec   Loss 8.1564   LearningRate 0.0795   Epoch: 2   Global Step: 26990   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:05:28,647-Speed 3358.12 samples/sec   Loss 8.1314   LearningRate 0.0794   Epoch: 2   Global Step: 27000   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:05:31,640-Speed 3422.84 samples/sec   Loss 8.2442   LearningRate 0.0794   Epoch: 2   Global Step: 27010   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:05:34,637-Speed 3418.30 samples/sec   Loss 8.2845   LearningRate 0.0794   Epoch: 2   Global Step: 27020   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:05:37,679-Speed 3366.49 samples/sec   Loss 8.1898   LearningRate 0.0794   Epoch: 2   Global Step: 27030   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:05:40,764-Speed 3320.21 samples/sec   Loss 8.2589   LearningRate 0.0794   Epoch: 2   Global Step: 27040   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:05:43,781-Speed 3395.35 samples/sec   Loss 8.1703   LearningRate 0.0794   Epoch: 2   Global Step: 27050   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:05:46,792-Speed 3401.68 samples/sec   Loss 8.1776   LearningRate 0.0794   Epoch: 2   Global Step: 27060   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:05:49,807-Speed 3397.27 samples/sec   Loss 8.2874   LearningRate 0.0794   Epoch: 2   Global Step: 27070   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:05:52,820-Speed 3400.55 samples/sec   Loss 8.1265   LearningRate 0.0794   Epoch: 2   Global Step: 27080   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:05:55,864-Speed 3364.42 samples/sec   Loss 8.1522   LearningRate 0.0794   Epoch: 2   Global Step: 27090   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:05:58,880-Speed 3396.58 samples/sec   Loss 8.1123   LearningRate 0.0794   Epoch: 2   Global Step: 27100   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:06:01,914-Speed 3376.34 samples/sec   Loss 8.2080   LearningRate 0.0794   Epoch: 2   Global Step: 27110   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:06:04,938-Speed 3387.67 samples/sec   Loss 8.1044   LearningRate 0.0794   Epoch: 2   Global Step: 27120   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:06:07,956-Speed 3393.88 samples/sec   Loss 8.1835   LearningRate 0.0794   Epoch: 2   Global Step: 27130   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:06:10,956-Speed 3413.53 samples/sec   Loss 8.1804   LearningRate 0.0793   Epoch: 2   Global Step: 27140   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:06:14,027-Speed 3336.02 samples/sec   Loss 8.2510   LearningRate 0.0793   Epoch: 2   Global Step: 27150   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:06:17,102-Speed 3330.92 samples/sec   Loss 8.2093   LearningRate 0.0793   Epoch: 2   Global Step: 27160   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:06:20,141-Speed 3370.75 samples/sec   Loss 8.2036   LearningRate 0.0793   Epoch: 2   Global Step: 27170   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:06:23,204-Speed 3343.41 samples/sec   Loss 8.2845   LearningRate 0.0793   Epoch: 2   Global Step: 27180   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:06:26,263-Speed 3349.02 samples/sec   Loss 8.3393   LearningRate 0.0793   Epoch: 2   Global Step: 27190   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:06:29,277-Speed 3398.29 samples/sec   Loss 8.1905   LearningRate 0.0793   Epoch: 2   Global Step: 27200   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:06:32,325-Speed 3360.41 samples/sec   Loss 8.2490   LearningRate 0.0793   Epoch: 2   Global Step: 27210   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:06:35,380-Speed 3353.72 samples/sec   Loss 8.2459   LearningRate 0.0793   Epoch: 2   Global Step: 27220   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:06:38,429-Speed 3359.51 samples/sec   Loss 8.2937   LearningRate 0.0793   Epoch: 2   Global Step: 27230   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:06:41,453-Speed 3387.24 samples/sec   Loss 8.3321   LearningRate 0.0793   Epoch: 2   Global Step: 27240   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:06:44,476-Speed 3388.00 samples/sec   Loss 8.2785   LearningRate 0.0793   Epoch: 2   Global Step: 27250   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:06:47,537-Speed 3346.94 samples/sec   Loss 8.1873   LearningRate 0.0793   Epoch: 2   Global Step: 27260   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:06:50,638-Speed 3303.05 samples/sec   Loss 8.2590   LearningRate 0.0793   Epoch: 2   Global Step: 27270   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:06:53,725-Speed 3318.46 samples/sec   Loss 8.3179   LearningRate 0.0792   Epoch: 2   Global Step: 27280   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:06:56,793-Speed 3338.81 samples/sec   Loss 8.1970   LearningRate 0.0792   Epoch: 2   Global Step: 27290   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:06:59,833-Speed 3369.37 samples/sec   Loss 8.3047   LearningRate 0.0792   Epoch: 2   Global Step: 27300   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:07:02,869-Speed 3373.65 samples/sec   Loss 8.1018   LearningRate 0.0792   Epoch: 2   Global Step: 27310   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:07:05,920-Speed 3358.19 samples/sec   Loss 8.1852   LearningRate 0.0792   Epoch: 2   Global Step: 27320   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:07:08,956-Speed 3372.88 samples/sec   Loss 8.1982   LearningRate 0.0792   Epoch: 2   Global Step: 27330   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:07:11,973-Speed 3395.15 samples/sec   Loss 8.3876   LearningRate 0.0792   Epoch: 2   Global Step: 27340   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:07:15,061-Speed 3317.08 samples/sec   Loss 8.2514   LearningRate 0.0792   Epoch: 2   Global Step: 27350   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:07:18,093-Speed 3379.14 samples/sec   Loss 8.3245   LearningRate 0.0792   Epoch: 2   Global Step: 27360   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:07:21,120-Speed 3383.49 samples/sec   Loss 8.2691   LearningRate 0.0792   Epoch: 2   Global Step: 27370   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:07:24,164-Speed 3365.81 samples/sec   Loss 8.3551   LearningRate 0.0792   Epoch: 2   Global Step: 27380   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:07:27,194-Speed 3381.02 samples/sec   Loss 8.2694   LearningRate 0.0792   Epoch: 2   Global Step: 27390   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:07:30,286-Speed 3312.09 samples/sec   Loss 8.2805   LearningRate 0.0792   Epoch: 2   Global Step: 27400   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:07:33,329-Speed 3366.75 samples/sec   Loss 8.2402   LearningRate 0.0791   Epoch: 2   Global Step: 27410   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:07:36,355-Speed 3384.68 samples/sec   Loss 8.4055   LearningRate 0.0791   Epoch: 2   Global Step: 27420   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:07:39,441-Speed 3318.78 samples/sec   Loss 8.2180   LearningRate 0.0791   Epoch: 2   Global Step: 27430   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:07:42,510-Speed 3338.02 samples/sec   Loss 8.3026   LearningRate 0.0791   Epoch: 2   Global Step: 27440   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:07:45,527-Speed 3395.82 samples/sec   Loss 8.3944   LearningRate 0.0791   Epoch: 2   Global Step: 27450   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:07:48,634-Speed 3295.77 samples/sec   Loss 8.2301   LearningRate 0.0791   Epoch: 2   Global Step: 27460   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:07:51,718-Speed 3322.49 samples/sec   Loss 8.2024   LearningRate 0.0791   Epoch: 2   Global Step: 27470   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:07:54,826-Speed 3295.51 samples/sec   Loss 8.3236   LearningRate 0.0791   Epoch: 2   Global Step: 27480   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:07:57,890-Speed 3343.38 samples/sec   Loss 8.3555   LearningRate 0.0791   Epoch: 2   Global Step: 27490   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:08:00,917-Speed 3384.56 samples/sec   Loss 8.2507   LearningRate 0.0791   Epoch: 2   Global Step: 27500   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:08:03,948-Speed 3378.59 samples/sec   Loss 8.3349   LearningRate 0.0791   Epoch: 2   Global Step: 27510   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:08:06,964-Speed 3396.31 samples/sec   Loss 8.3303   LearningRate 0.0791   Epoch: 2   Global Step: 27520   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:08:09,961-Speed 3418.06 samples/sec   Loss 8.3404   LearningRate 0.0791   Epoch: 2   Global Step: 27530   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:08:13,038-Speed 3329.54 samples/sec   Loss 8.3292   LearningRate 0.0791   Epoch: 2   Global Step: 27540   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:08:16,143-Speed 3299.15 samples/sec   Loss 8.3577   LearningRate 0.0790   Epoch: 2   Global Step: 27550   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:08:19,218-Speed 3330.38 samples/sec   Loss 8.3923   LearningRate 0.0790   Epoch: 2   Global Step: 27560   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:08:22,238-Speed 3392.14 samples/sec   Loss 8.3936   LearningRate 0.0790   Epoch: 2   Global Step: 27570   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:08:25,266-Speed 3383.05 samples/sec   Loss 8.1948   LearningRate 0.0790   Epoch: 2   Global Step: 27580   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:08:28,285-Speed 3392.53 samples/sec   Loss 8.2478   LearningRate 0.0790   Epoch: 2   Global Step: 27590   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:08:31,370-Speed 3321.23 samples/sec   Loss 8.2998   LearningRate 0.0790   Epoch: 2   Global Step: 27600   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:08:34,391-Speed 3389.89 samples/sec   Loss 8.3160   LearningRate 0.0790   Epoch: 2   Global Step: 27610   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:08:37,486-Speed 3309.44 samples/sec   Loss 8.2315   LearningRate 0.0790   Epoch: 2   Global Step: 27620   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:08:40,584-Speed 3306.44 samples/sec   Loss 8.3535   LearningRate 0.0790   Epoch: 2   Global Step: 27630   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:08:43,618-Speed 3377.05 samples/sec   Loss 8.3849   LearningRate 0.0790   Epoch: 2   Global Step: 27640   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:08:46,644-Speed 3384.90 samples/sec   Loss 8.2537   LearningRate 0.0790   Epoch: 2   Global Step: 27650   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:08:49,656-Speed 3400.74 samples/sec   Loss 8.3132   LearningRate 0.0790   Epoch: 2   Global Step: 27660   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:08:52,686-Speed 3380.32 samples/sec   Loss 8.2873   LearningRate 0.0790   Epoch: 2   Global Step: 27670   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:08:55,753-Speed 3340.58 samples/sec   Loss 8.2645   LearningRate 0.0790   Epoch: 2   Global Step: 27680   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:08:58,769-Speed 3396.60 samples/sec   Loss 8.4680   LearningRate 0.0789   Epoch: 2   Global Step: 27690   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:09:01,806-Speed 3372.04 samples/sec   Loss 8.3028   LearningRate 0.0789   Epoch: 2   Global Step: 27700   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:09:04,796-Speed 3426.98 samples/sec   Loss 8.3551   LearningRate 0.0789   Epoch: 2   Global Step: 27710   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:09:07,821-Speed 3385.40 samples/sec   Loss 8.3462   LearningRate 0.0789   Epoch: 2   Global Step: 27720   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:09:10,892-Speed 3335.57 samples/sec   Loss 8.3496   LearningRate 0.0789   Epoch: 2   Global Step: 27730   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:09:13,984-Speed 3313.31 samples/sec   Loss 8.2407   LearningRate 0.0789   Epoch: 2   Global Step: 27740   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:09:17,034-Speed 3358.34 samples/sec   Loss 8.3069   LearningRate 0.0789   Epoch: 2   Global Step: 27750   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:09:20,042-Speed 3405.37 samples/sec   Loss 8.1602   LearningRate 0.0789   Epoch: 2   Global Step: 27760   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:09:23,078-Speed 3373.67 samples/sec   Loss 8.1835   LearningRate 0.0789   Epoch: 2   Global Step: 27770   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:09:26,090-Speed 3400.05 samples/sec   Loss 8.4769   LearningRate 0.0789   Epoch: 2   Global Step: 27780   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:09:29,094-Speed 3410.86 samples/sec   Loss 8.2016   LearningRate 0.0789   Epoch: 2   Global Step: 27790   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:09:32,111-Speed 3395.00 samples/sec   Loss 8.2474   LearningRate 0.0789   Epoch: 2   Global Step: 27800   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:09:35,110-Speed 3414.83 samples/sec   Loss 8.4018   LearningRate 0.0789   Epoch: 2   Global Step: 27810   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:09:38,149-Speed 3370.81 samples/sec   Loss 8.2479   LearningRate 0.0789   Epoch: 2   Global Step: 27820   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:09:41,180-Speed 3380.38 samples/sec   Loss 8.2774   LearningRate 0.0788   Epoch: 2   Global Step: 27830   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:09:44,208-Speed 3381.78 samples/sec   Loss 8.3148   LearningRate 0.0788   Epoch: 2   Global Step: 27840   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:09:47,264-Speed 3352.18 samples/sec   Loss 8.2189   LearningRate 0.0788   Epoch: 2   Global Step: 27850   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:09:50,321-Speed 3350.15 samples/sec   Loss 8.4045   LearningRate 0.0788   Epoch: 2   Global Step: 27860   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:09:53,401-Speed 3326.05 samples/sec   Loss 8.3548   LearningRate 0.0788   Epoch: 2   Global Step: 27870   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:09:56,451-Speed 3358.21 samples/sec   Loss 8.3282   LearningRate 0.0788   Epoch: 2   Global Step: 27880   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:09:59,444-Speed 3423.15 samples/sec   Loss 8.2039   LearningRate 0.0788   Epoch: 2   Global Step: 27890   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:10:02,478-Speed 3375.60 samples/sec   Loss 8.3409   LearningRate 0.0788   Epoch: 2   Global Step: 27900   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:10:05,558-Speed 3325.87 samples/sec   Loss 8.3429   LearningRate 0.0788   Epoch: 2   Global Step: 27910   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:10:08,565-Speed 3405.92 samples/sec   Loss 8.3428   LearningRate 0.0788   Epoch: 2   Global Step: 27920   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:10:11,592-Speed 3384.15 samples/sec   Loss 8.3640   LearningRate 0.0788   Epoch: 2   Global Step: 27930   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:10:14,655-Speed 3344.29 samples/sec   Loss 8.3462   LearningRate 0.0788   Epoch: 2   Global Step: 27940   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:10:17,720-Speed 3341.68 samples/sec   Loss 8.2155   LearningRate 0.0788   Epoch: 2   Global Step: 27950   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:10:20,797-Speed 3329.33 samples/sec   Loss 8.2817   LearningRate 0.0788   Epoch: 2   Global Step: 27960   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:10:23,853-Speed 3352.05 samples/sec   Loss 8.2512   LearningRate 0.0787   Epoch: 2   Global Step: 27970   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:10:26,899-Speed 3362.18 samples/sec   Loss 8.3786   LearningRate 0.0787   Epoch: 2   Global Step: 27980   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:10:30,022-Speed 3280.55 samples/sec   Loss 8.3030   LearningRate 0.0787   Epoch: 2   Global Step: 27990   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:10:33,019-Speed 3417.05 samples/sec   Loss 8.3687   LearningRate 0.0787   Epoch: 2   Global Step: 28000   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:10:36,042-Speed 3388.69 samples/sec   Loss 8.2956   LearningRate 0.0787   Epoch: 2   Global Step: 28010   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:10:39,093-Speed 3357.01 samples/sec   Loss 8.2535   LearningRate 0.0787   Epoch: 2   Global Step: 28020   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:10:42,190-Speed 3307.50 samples/sec   Loss 8.4433   LearningRate 0.0787   Epoch: 2   Global Step: 28030   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:10:45,216-Speed 3384.93 samples/sec   Loss 8.3820   LearningRate 0.0787   Epoch: 2   Global Step: 28040   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:10:48,214-Speed 3417.00 samples/sec   Loss 8.3187   LearningRate 0.0787   Epoch: 2   Global Step: 28050   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:10:51,221-Speed 3405.92 samples/sec   Loss 8.3980   LearningRate 0.0787   Epoch: 2   Global Step: 28060   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:10:54,322-Speed 3303.55 samples/sec   Loss 8.2986   LearningRate 0.0787   Epoch: 2   Global Step: 28070   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:10:57,352-Speed 3380.88 samples/sec   Loss 8.2715   LearningRate 0.0787   Epoch: 2   Global Step: 28080   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:11:00,433-Speed 3324.38 samples/sec   Loss 8.3173   LearningRate 0.0787   Epoch: 2   Global Step: 28090   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:11:03,486-Speed 3355.15 samples/sec   Loss 8.3577   LearningRate 0.0787   Epoch: 2   Global Step: 28100   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:11:06,614-Speed 3274.72 samples/sec   Loss 8.3179   LearningRate 0.0786   Epoch: 2   Global Step: 28110   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:11:09,605-Speed 3424.11 samples/sec   Loss 8.3943   LearningRate 0.0786   Epoch: 2   Global Step: 28120   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:11:12,686-Speed 3325.21 samples/sec   Loss 8.3479   LearningRate 0.0786   Epoch: 2   Global Step: 28130   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:11:15,725-Speed 3371.02 samples/sec   Loss 8.2338   LearningRate 0.0786   Epoch: 2   Global Step: 28140   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:11:18,740-Speed 3397.30 samples/sec   Loss 8.2789   LearningRate 0.0786   Epoch: 2   Global Step: 28150   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:11:21,759-Speed 3392.86 samples/sec   Loss 8.3044   LearningRate 0.0786   Epoch: 2   Global Step: 28160   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:11:24,828-Speed 3336.64 samples/sec   Loss 8.3168   LearningRate 0.0786   Epoch: 2   Global Step: 28170   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:11:27,909-Speed 3325.06 samples/sec   Loss 8.3394   LearningRate 0.0786   Epoch: 2   Global Step: 28180   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:11:30,949-Speed 3369.77 samples/sec   Loss 8.2304   LearningRate 0.0786   Epoch: 2   Global Step: 28190   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:11:33,956-Speed 3405.93 samples/sec   Loss 8.4824   LearningRate 0.0786   Epoch: 2   Global Step: 28200   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:11:36,956-Speed 3414.71 samples/sec   Loss 8.4109   LearningRate 0.0786   Epoch: 2   Global Step: 28210   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:11:39,986-Speed 3380.85 samples/sec   Loss 8.4441   LearningRate 0.0786   Epoch: 2   Global Step: 28220   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:11:43,024-Speed 3371.65 samples/sec   Loss 8.3600   LearningRate 0.0786   Epoch: 2   Global Step: 28230   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:11:46,052-Speed 3382.16 samples/sec   Loss 8.2822   LearningRate 0.0786   Epoch: 2   Global Step: 28240   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:11:49,103-Speed 3357.31 samples/sec   Loss 8.2439   LearningRate 0.0785   Epoch: 2   Global Step: 28250   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:11:52,159-Speed 3352.70 samples/sec   Loss 8.3291   LearningRate 0.0785   Epoch: 2   Global Step: 28260   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:11:55,213-Speed 3353.39 samples/sec   Loss 8.4451   LearningRate 0.0785   Epoch: 2   Global Step: 28270   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:11:58,239-Speed 3384.91 samples/sec   Loss 8.4497   LearningRate 0.0785   Epoch: 2   Global Step: 28280   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:12:01,287-Speed 3361.10 samples/sec   Loss 8.3246   LearningRate 0.0785   Epoch: 2   Global Step: 28290   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:12:04,312-Speed 3385.33 samples/sec   Loss 8.3870   LearningRate 0.0785   Epoch: 2   Global Step: 28300   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:12:07,341-Speed 3382.16 samples/sec   Loss 8.3629   LearningRate 0.0785   Epoch: 2   Global Step: 28310   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:12:10,351-Speed 3403.14 samples/sec   Loss 8.5255   LearningRate 0.0785   Epoch: 2   Global Step: 28320   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:12:13,379-Speed 3382.34 samples/sec   Loss 8.3513   LearningRate 0.0785   Epoch: 2   Global Step: 28330   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:12:16,445-Speed 3341.77 samples/sec   Loss 8.4725   LearningRate 0.0785   Epoch: 2   Global Step: 28340   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:12:19,512-Speed 3338.91 samples/sec   Loss 8.3987   LearningRate 0.0785   Epoch: 2   Global Step: 28350   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:12:22,541-Speed 3381.93 samples/sec   Loss 8.3899   LearningRate 0.0785   Epoch: 2   Global Step: 28360   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:12:25,556-Speed 3398.06 samples/sec   Loss 8.3603   LearningRate 0.0785   Epoch: 2   Global Step: 28370   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:12:28,612-Speed 3351.80 samples/sec   Loss 8.2480   LearningRate 0.0785   Epoch: 2   Global Step: 28380   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:12:31,669-Speed 3350.78 samples/sec   Loss 8.3036   LearningRate 0.0784   Epoch: 2   Global Step: 28390   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:12:34,689-Speed 3391.93 samples/sec   Loss 8.3768   LearningRate 0.0784   Epoch: 2   Global Step: 28400   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:12:37,703-Speed 3398.66 samples/sec   Loss 8.3337   LearningRate 0.0784   Epoch: 2   Global Step: 28410   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:12:40,745-Speed 3366.54 samples/sec   Loss 8.2542   LearningRate 0.0784   Epoch: 2   Global Step: 28420   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:12:43,762-Speed 3395.21 samples/sec   Loss 8.3832   LearningRate 0.0784   Epoch: 2   Global Step: 28430   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:12:46,809-Speed 3361.80 samples/sec   Loss 8.5299   LearningRate 0.0784   Epoch: 2   Global Step: 28440   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:12:49,890-Speed 3324.40 samples/sec   Loss 8.3150   LearningRate 0.0784   Epoch: 2   Global Step: 28450   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:12:52,943-Speed 3355.75 samples/sec   Loss 8.3183   LearningRate 0.0784   Epoch: 2   Global Step: 28460   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:12:55,958-Speed 3397.73 samples/sec   Loss 8.3402   LearningRate 0.0784   Epoch: 2   Global Step: 28470   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:12:58,977-Speed 3392.14 samples/sec   Loss 8.3324   LearningRate 0.0784   Epoch: 2   Global Step: 28480   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:13:02,057-Speed 3326.09 samples/sec   Loss 8.2944   LearningRate 0.0784   Epoch: 2   Global Step: 28490   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:13:05,095-Speed 3371.97 samples/sec   Loss 8.4748   LearningRate 0.0784   Epoch: 2   Global Step: 28500   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:13:08,114-Speed 3392.51 samples/sec   Loss 8.3480   LearningRate 0.0784   Epoch: 2   Global Step: 28510   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:13:11,158-Speed 3365.80 samples/sec   Loss 8.4724   LearningRate 0.0784   Epoch: 2   Global Step: 28520   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:13:14,234-Speed 3329.75 samples/sec   Loss 8.4937   LearningRate 0.0783   Epoch: 2   Global Step: 28530   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:13:17,314-Speed 3325.82 samples/sec   Loss 8.3653   LearningRate 0.0783   Epoch: 2   Global Step: 28540   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:13:20,356-Speed 3366.70 samples/sec   Loss 8.2732   LearningRate 0.0783   Epoch: 2   Global Step: 28550   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:13:23,375-Speed 3392.95 samples/sec   Loss 8.3427   LearningRate 0.0783   Epoch: 2   Global Step: 28560   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:13:26,383-Speed 3406.31 samples/sec   Loss 8.4411   LearningRate 0.0783   Epoch: 2   Global Step: 28570   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:13:29,389-Speed 3406.67 samples/sec   Loss 8.3953   LearningRate 0.0783   Epoch: 2   Global Step: 28580   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:13:32,386-Speed 3418.29 samples/sec   Loss 8.3418   LearningRate 0.0783   Epoch: 2   Global Step: 28590   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:13:35,412-Speed 3385.09 samples/sec   Loss 8.4312   LearningRate 0.0783   Epoch: 2   Global Step: 28600   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:13:38,520-Speed 3295.31 samples/sec   Loss 8.4879   LearningRate 0.0783   Epoch: 2   Global Step: 28610   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:13:41,583-Speed 3344.52 samples/sec   Loss 8.2991   LearningRate 0.0783   Epoch: 2   Global Step: 28620   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:13:44,594-Speed 3401.56 samples/sec   Loss 8.3812   LearningRate 0.0783   Epoch: 2   Global Step: 28630   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:13:47,643-Speed 3359.62 samples/sec   Loss 8.3685   LearningRate 0.0783   Epoch: 2   Global Step: 28640   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:13:50,694-Speed 3357.40 samples/sec   Loss 8.3439   LearningRate 0.0783   Epoch: 2   Global Step: 28650   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:13:53,766-Speed 3334.52 samples/sec   Loss 8.4163   LearningRate 0.0783   Epoch: 2   Global Step: 28660   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:13:56,763-Speed 3418.26 samples/sec   Loss 8.3767   LearningRate 0.0783   Epoch: 2   Global Step: 28670   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:13:59,762-Speed 3415.16 samples/sec   Loss 8.3698   LearningRate 0.0782   Epoch: 2   Global Step: 28680   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:14:02,765-Speed 3410.71 samples/sec   Loss 8.2274   LearningRate 0.0782   Epoch: 2   Global Step: 28690   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:14:05,807-Speed 3367.77 samples/sec   Loss 8.4007   LearningRate 0.0782   Epoch: 2   Global Step: 28700   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:14:08,799-Speed 3423.99 samples/sec   Loss 8.3577   LearningRate 0.0782   Epoch: 2   Global Step: 28710   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:14:11,878-Speed 3326.22 samples/sec   Loss 8.2448   LearningRate 0.0782   Epoch: 2   Global Step: 28720   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:14:14,959-Speed 3324.85 samples/sec   Loss 8.4621   LearningRate 0.0782   Epoch: 2   Global Step: 28730   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:14:17,976-Speed 3394.47 samples/sec   Loss 8.4714   LearningRate 0.0782   Epoch: 2   Global Step: 28740   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:14:20,990-Speed 3398.91 samples/sec   Loss 8.3998   LearningRate 0.0782   Epoch: 2   Global Step: 28750   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:14:24,006-Speed 3396.31 samples/sec   Loss 8.3298   LearningRate 0.0782   Epoch: 2   Global Step: 28760   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:14:27,002-Speed 3418.82 samples/sec   Loss 8.4942   LearningRate 0.0782   Epoch: 2   Global Step: 28770   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:14:30,093-Speed 3313.68 samples/sec   Loss 8.4688   LearningRate 0.0782   Epoch: 2   Global Step: 28780   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:14:33,133-Speed 3369.66 samples/sec   Loss 8.5069   LearningRate 0.0782   Epoch: 2   Global Step: 28790   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:14:36,170-Speed 3373.77 samples/sec   Loss 8.3842   LearningRate 0.0782   Epoch: 2   Global Step: 28800   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:14:39,209-Speed 3369.73 samples/sec   Loss 8.3533   LearningRate 0.0782   Epoch: 2   Global Step: 28810   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:14:42,321-Speed 3292.16 samples/sec   Loss 8.3841   LearningRate 0.0781   Epoch: 2   Global Step: 28820   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:14:45,356-Speed 3374.68 samples/sec   Loss 8.3939   LearningRate 0.0781   Epoch: 2   Global Step: 28830   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:14:48,461-Speed 3299.53 samples/sec   Loss 8.4377   LearningRate 0.0781   Epoch: 2   Global Step: 28840   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:14:51,502-Speed 3367.91 samples/sec   Loss 8.4262   LearningRate 0.0781   Epoch: 2   Global Step: 28850   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:14:54,513-Speed 3402.90 samples/sec   Loss 8.3815   LearningRate 0.0781   Epoch: 2   Global Step: 28860   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:14:57,549-Speed 3373.39 samples/sec   Loss 8.4595   LearningRate 0.0781   Epoch: 2   Global Step: 28870   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:15:00,562-Speed 3399.26 samples/sec   Loss 8.3679   LearningRate 0.0781   Epoch: 2   Global Step: 28880   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:15:03,598-Speed 3374.95 samples/sec   Loss 8.3173   LearningRate 0.0781   Epoch: 2   Global Step: 28890   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:15:06,623-Speed 3385.65 samples/sec   Loss 8.3582   LearningRate 0.0781   Epoch: 2   Global Step: 28900   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:15:09,633-Speed 3403.20 samples/sec   Loss 8.3072   LearningRate 0.0781   Epoch: 2   Global Step: 28910   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:15:12,718-Speed 3320.32 samples/sec   Loss 8.4800   LearningRate 0.0781   Epoch: 2   Global Step: 28920   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:15:15,783-Speed 3342.74 samples/sec   Loss 8.4364   LearningRate 0.0781   Epoch: 2   Global Step: 28930   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:15:18,815-Speed 3378.12 samples/sec   Loss 8.4733   LearningRate 0.0781   Epoch: 2   Global Step: 28940   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:15:21,803-Speed 3428.07 samples/sec   Loss 8.2661   LearningRate 0.0781   Epoch: 2   Global Step: 28950   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:15:24,846-Speed 3366.57 samples/sec   Loss 8.4546   LearningRate 0.0780   Epoch: 2   Global Step: 28960   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:15:27,861-Speed 3397.51 samples/sec   Loss 8.3380   LearningRate 0.0780   Epoch: 2   Global Step: 28970   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:15:30,954-Speed 3310.85 samples/sec   Loss 8.4865   LearningRate 0.0780   Epoch: 2   Global Step: 28980   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:15:34,009-Speed 3353.78 samples/sec   Loss 8.4682   LearningRate 0.0780   Epoch: 2   Global Step: 28990   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:15:37,130-Speed 3282.08 samples/sec   Loss 8.4195   LearningRate 0.0780   Epoch: 2   Global Step: 29000   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:15:40,160-Speed 3380.49 samples/sec   Loss 8.4303   LearningRate 0.0780   Epoch: 2   Global Step: 29010   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:15:43,209-Speed 3359.50 samples/sec   Loss 8.4834   LearningRate 0.0780   Epoch: 2   Global Step: 29020   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:15:46,236-Speed 3383.32 samples/sec   Loss 8.4649   LearningRate 0.0780   Epoch: 2   Global Step: 29030   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:15:49,267-Speed 3379.80 samples/sec   Loss 8.4813   LearningRate 0.0780   Epoch: 2   Global Step: 29040   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:15:52,269-Speed 3412.55 samples/sec   Loss 8.4114   LearningRate 0.0780   Epoch: 2   Global Step: 29050   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:15:55,303-Speed 3375.79 samples/sec   Loss 8.4263   LearningRate 0.0780   Epoch: 2   Global Step: 29060   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:15:58,325-Speed 3389.54 samples/sec   Loss 8.2878   LearningRate 0.0780   Epoch: 2   Global Step: 29070   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:16:01,380-Speed 3352.47 samples/sec   Loss 8.3969   LearningRate 0.0780   Epoch: 2   Global Step: 29080   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:16:04,444-Speed 3343.56 samples/sec   Loss 8.3365   LearningRate 0.0780   Epoch: 2   Global Step: 29090   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:16:07,540-Speed 3308.38 samples/sec   Loss 8.4850   LearningRate 0.0779   Epoch: 2   Global Step: 29100   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:16:10,552-Speed 3400.70 samples/sec   Loss 8.4048   LearningRate 0.0779   Epoch: 2   Global Step: 29110   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:16:13,587-Speed 3374.64 samples/sec   Loss 8.3541   LearningRate 0.0779   Epoch: 2   Global Step: 29120   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:16:16,607-Speed 3392.52 samples/sec   Loss 8.3597   LearningRate 0.0779   Epoch: 2   Global Step: 29130   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:16:19,685-Speed 3327.64 samples/sec   Loss 8.4108   LearningRate 0.0779   Epoch: 2   Global Step: 29140   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:16:22,700-Speed 3397.32 samples/sec   Loss 8.4733   LearningRate 0.0779   Epoch: 2   Global Step: 29150   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:16:25,746-Speed 3363.22 samples/sec   Loss 8.3276   LearningRate 0.0779   Epoch: 2   Global Step: 29160   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:16:28,754-Speed 3405.58 samples/sec   Loss 8.5211   LearningRate 0.0779   Epoch: 2   Global Step: 29170   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:16:31,817-Speed 3344.32 samples/sec   Loss 8.3719   LearningRate 0.0779   Epoch: 2   Global Step: 29180   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:16:34,833-Speed 3395.55 samples/sec   Loss 8.5845   LearningRate 0.0779   Epoch: 2   Global Step: 29190   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:16:37,926-Speed 3312.48 samples/sec   Loss 8.4057   LearningRate 0.0779   Epoch: 2   Global Step: 29200   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:16:40,944-Speed 3393.33 samples/sec   Loss 8.5418   LearningRate 0.0779   Epoch: 2   Global Step: 29210   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:16:43,952-Speed 3405.36 samples/sec   Loss 8.4220   LearningRate 0.0779   Epoch: 2   Global Step: 29220   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:16:46,972-Speed 3392.76 samples/sec   Loss 8.4693   LearningRate 0.0779   Epoch: 2   Global Step: 29230   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:16:50,097-Speed 3277.27 samples/sec   Loss 8.3287   LearningRate 0.0778   Epoch: 2   Global Step: 29240   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:16:53,129-Speed 3378.88 samples/sec   Loss 8.4219   LearningRate 0.0778   Epoch: 2   Global Step: 29250   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:16:56,158-Speed 3381.95 samples/sec   Loss 8.4972   LearningRate 0.0778   Epoch: 2   Global Step: 29260   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:16:59,162-Speed 3409.20 samples/sec   Loss 8.4668   LearningRate 0.0778   Epoch: 2   Global Step: 29270   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:17:02,216-Speed 3353.99 samples/sec   Loss 8.3807   LearningRate 0.0778   Epoch: 2   Global Step: 29280   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:17:05,300-Speed 3321.25 samples/sec   Loss 8.3931   LearningRate 0.0778   Epoch: 2   Global Step: 29290   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:17:08,333-Speed 3377.62 samples/sec   Loss 8.3932   LearningRate 0.0778   Epoch: 2   Global Step: 29300   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:17:11,361-Speed 3382.61 samples/sec   Loss 8.5489   LearningRate 0.0778   Epoch: 2   Global Step: 29310   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:17:14,416-Speed 3353.21 samples/sec   Loss 8.3958   LearningRate 0.0778   Epoch: 2   Global Step: 29320   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:17:17,483-Speed 3339.08 samples/sec   Loss 8.4088   LearningRate 0.0778   Epoch: 2   Global Step: 29330   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:17:20,533-Speed 3359.33 samples/sec   Loss 8.3734   LearningRate 0.0778   Epoch: 2   Global Step: 29340   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:17:23,576-Speed 3366.02 samples/sec   Loss 8.3970   LearningRate 0.0778   Epoch: 2   Global Step: 29350   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:17:26,617-Speed 3367.44 samples/sec   Loss 8.3871   LearningRate 0.0778   Epoch: 2   Global Step: 29360   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:17:29,635-Speed 3394.55 samples/sec   Loss 8.4792   LearningRate 0.0778   Epoch: 2   Global Step: 29370   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:17:32,638-Speed 3410.34 samples/sec   Loss 8.3580   LearningRate 0.0777   Epoch: 2   Global Step: 29380   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:17:35,655-Speed 3395.52 samples/sec   Loss 8.4120   LearningRate 0.0777   Epoch: 2   Global Step: 29390   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:17:38,770-Speed 3288.89 samples/sec   Loss 8.4355   LearningRate 0.0777   Epoch: 2   Global Step: 29400   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:17:41,814-Speed 3364.84 samples/sec   Loss 8.4549   LearningRate 0.0777   Epoch: 2   Global Step: 29410   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:17:44,809-Speed 3420.30 samples/sec   Loss 8.4332   LearningRate 0.0777   Epoch: 2   Global Step: 29420   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:17:47,861-Speed 3355.69 samples/sec   Loss 8.3765   LearningRate 0.0777   Epoch: 2   Global Step: 29430   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:17:50,922-Speed 3347.01 samples/sec   Loss 8.5056   LearningRate 0.0777   Epoch: 2   Global Step: 29440   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:17:54,018-Speed 3309.04 samples/sec   Loss 8.3388   LearningRate 0.0777   Epoch: 2   Global Step: 29450   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:17:57,083-Speed 3341.76 samples/sec   Loss 8.5000   LearningRate 0.0777   Epoch: 2   Global Step: 29460   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:18:00,092-Speed 3403.13 samples/sec   Loss 8.5296   LearningRate 0.0777   Epoch: 2   Global Step: 29470   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:18:03,189-Speed 3307.35 samples/sec   Loss 8.3458   LearningRate 0.0777   Epoch: 2   Global Step: 29480   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:18:06,243-Speed 3355.20 samples/sec   Loss 8.3382   LearningRate 0.0777   Epoch: 2   Global Step: 29490   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:18:09,238-Speed 3420.00 samples/sec   Loss 8.3297   LearningRate 0.0777   Epoch: 2   Global Step: 29500   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:18:12,266-Speed 3382.41 samples/sec   Loss 8.4092   LearningRate 0.0777   Epoch: 2   Global Step: 29510   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:18:15,295-Speed 3381.35 samples/sec   Loss 8.4009   LearningRate 0.0776   Epoch: 2   Global Step: 29520   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:18:18,423-Speed 3274.82 samples/sec   Loss 8.3303   LearningRate 0.0776   Epoch: 2   Global Step: 29530   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:18:21,459-Speed 3374.22 samples/sec   Loss 8.4609   LearningRate 0.0776   Epoch: 2   Global Step: 29540   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:18:24,529-Speed 3336.62 samples/sec   Loss 8.3352   LearningRate 0.0776   Epoch: 2   Global Step: 29550   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:18:27,562-Speed 3376.70 samples/sec   Loss 8.4794   LearningRate 0.0776   Epoch: 2   Global Step: 29560   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:18:30,644-Speed 3323.88 samples/sec   Loss 8.4715   LearningRate 0.0776   Epoch: 2   Global Step: 29570   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:18:33,682-Speed 3372.06 samples/sec   Loss 8.5037   LearningRate 0.0776   Epoch: 2   Global Step: 29580   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:18:36,738-Speed 3351.77 samples/sec   Loss 8.3840   LearningRate 0.0776   Epoch: 2   Global Step: 29590   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:18:39,746-Speed 3404.42 samples/sec   Loss 8.4168   LearningRate 0.0776   Epoch: 2   Global Step: 29600   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:18:42,777-Speed 3380.05 samples/sec   Loss 8.4139   LearningRate 0.0776   Epoch: 2   Global Step: 29610   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:18:45,793-Speed 3395.85 samples/sec   Loss 8.4705   LearningRate 0.0776   Epoch: 2   Global Step: 29620   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:18:48,798-Speed 3409.16 samples/sec   Loss 8.4865   LearningRate 0.0776   Epoch: 2   Global Step: 29630   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:18:51,843-Speed 3364.32 samples/sec   Loss 8.3468   LearningRate 0.0776   Epoch: 2   Global Step: 29640   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:18:54,876-Speed 3376.55 samples/sec   Loss 8.3764   LearningRate 0.0776   Epoch: 2   Global Step: 29650   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:18:57,869-Speed 3422.78 samples/sec   Loss 8.4889   LearningRate 0.0775   Epoch: 2   Global Step: 29660   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:19:00,919-Speed 3358.71 samples/sec   Loss 8.5087   LearningRate 0.0775   Epoch: 2   Global Step: 29670   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:19:03,954-Speed 3374.55 samples/sec   Loss 8.4288   LearningRate 0.0775   Epoch: 2   Global Step: 29680   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:19:06,956-Speed 3413.04 samples/sec   Loss 8.3953   LearningRate 0.0775   Epoch: 2   Global Step: 29690   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:19:09,957-Speed 3412.26 samples/sec   Loss 8.4378   LearningRate 0.0775   Epoch: 2   Global Step: 29700   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:19:13,002-Speed 3364.32 samples/sec   Loss 8.3958   LearningRate 0.0775   Epoch: 2   Global Step: 29710   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:19:16,090-Speed 3317.17 samples/sec   Loss 8.4687   LearningRate 0.0775   Epoch: 2   Global Step: 29720   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:19:19,143-Speed 3355.73 samples/sec   Loss 8.4147   LearningRate 0.0775   Epoch: 2   Global Step: 29730   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:19:22,157-Speed 3398.87 samples/sec   Loss 8.3758   LearningRate 0.0775   Epoch: 2   Global Step: 29740   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:19:25,211-Speed 3353.52 samples/sec   Loss 8.4006   LearningRate 0.0775   Epoch: 2   Global Step: 29750   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:19:28,229-Speed 3394.17 samples/sec   Loss 8.2725   LearningRate 0.0775   Epoch: 2   Global Step: 29760   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:19:31,327-Speed 3306.54 samples/sec   Loss 8.4639   LearningRate 0.0775   Epoch: 2   Global Step: 29770   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:19:34,400-Speed 3332.98 samples/sec   Loss 8.3900   LearningRate 0.0775   Epoch: 2   Global Step: 29780   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:19:37,414-Speed 3399.10 samples/sec   Loss 8.4298   LearningRate 0.0775   Epoch: 2   Global Step: 29790   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:19:40,431-Speed 3395.22 samples/sec   Loss 8.4025   LearningRate 0.0774   Epoch: 2   Global Step: 29800   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:19:43,448-Speed 3395.08 samples/sec   Loss 8.3904   LearningRate 0.0774   Epoch: 2   Global Step: 29810   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:19:46,451-Speed 3411.32 samples/sec   Loss 8.5249   LearningRate 0.0774   Epoch: 2   Global Step: 29820   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:19:49,502-Speed 3356.64 samples/sec   Loss 8.3461   LearningRate 0.0774   Epoch: 2   Global Step: 29830   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:19:52,526-Speed 3387.21 samples/sec   Loss 8.4746   LearningRate 0.0774   Epoch: 2   Global Step: 29840   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:19:55,585-Speed 3349.61 samples/sec   Loss 8.4294   LearningRate 0.0774   Epoch: 2   Global Step: 29850   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:19:58,582-Speed 3417.13 samples/sec   Loss 8.2852   LearningRate 0.0774   Epoch: 2   Global Step: 29860   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:20:01,605-Speed 3388.26 samples/sec   Loss 8.4171   LearningRate 0.0774   Epoch: 2   Global Step: 29870   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:20:04,675-Speed 3336.81 samples/sec   Loss 8.3928   LearningRate 0.0774   Epoch: 2   Global Step: 29880   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:20:07,727-Speed 3355.63 samples/sec   Loss 8.4798   LearningRate 0.0774   Epoch: 2   Global Step: 29890   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:20:10,769-Speed 3367.31 samples/sec   Loss 8.3747   LearningRate 0.0774   Epoch: 2   Global Step: 29900   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:20:13,797-Speed 3383.76 samples/sec   Loss 8.4823   LearningRate 0.0774   Epoch: 2   Global Step: 29910   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:20:16,844-Speed 3361.52 samples/sec   Loss 8.3366   LearningRate 0.0774   Epoch: 2   Global Step: 29920   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:20:19,868-Speed 3387.50 samples/sec   Loss 8.4397   LearningRate 0.0774   Epoch: 2   Global Step: 29930   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:20:22,917-Speed 3360.00 samples/sec   Loss 8.2529   LearningRate 0.0773   Epoch: 2   Global Step: 29940   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:20:25,931-Speed 3398.35 samples/sec   Loss 8.4465   LearningRate 0.0773   Epoch: 2   Global Step: 29950   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:20:28,963-Speed 3378.91 samples/sec   Loss 8.2251   LearningRate 0.0773   Epoch: 2   Global Step: 29960   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:20:32,004-Speed 3368.61 samples/sec   Loss 8.3968   LearningRate 0.0773   Epoch: 2   Global Step: 29970   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:20:35,049-Speed 3364.09 samples/sec   Loss 8.4022   LearningRate 0.0773   Epoch: 2   Global Step: 29980   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:20:38,119-Speed 3336.32 samples/sec   Loss 8.5302   LearningRate 0.0773   Epoch: 2   Global Step: 29990   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:20:41,147-Speed 3382.42 samples/sec   Loss 8.4842   LearningRate 0.0773   Epoch: 2   Global Step: 30000   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:20:44,189-Speed 3367.57 samples/sec   Loss 8.5327   LearningRate 0.0773   Epoch: 2   Global Step: 30010   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:20:47,197-Speed 3405.58 samples/sec   Loss 8.4237   LearningRate 0.0773   Epoch: 2   Global Step: 30020   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:20:50,237-Speed 3369.54 samples/sec   Loss 8.3699   LearningRate 0.0773   Epoch: 2   Global Step: 30030   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:20:53,289-Speed 3356.36 samples/sec   Loss 8.4548   LearningRate 0.0773   Epoch: 2   Global Step: 30040   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:20:56,294-Speed 3408.32 samples/sec   Loss 8.4221   LearningRate 0.0773   Epoch: 2   Global Step: 30050   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:20:59,316-Speed 3390.04 samples/sec   Loss 8.4190   LearningRate 0.0773   Epoch: 2   Global Step: 30060   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:21:02,345-Speed 3381.72 samples/sec   Loss 8.3799   LearningRate 0.0773   Epoch: 2   Global Step: 30070   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:21:05,430-Speed 3320.85 samples/sec   Loss 8.3197   LearningRate 0.0772   Epoch: 2   Global Step: 30080   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:21:08,496-Speed 3340.36 samples/sec   Loss 8.4114   LearningRate 0.0772   Epoch: 2   Global Step: 30090   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:21:11,585-Speed 3315.63 samples/sec   Loss 8.5202   LearningRate 0.0772   Epoch: 2   Global Step: 30100   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:21:14,638-Speed 3355.94 samples/sec   Loss 8.4798   LearningRate 0.0772   Epoch: 2   Global Step: 30110   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:21:17,740-Speed 3301.69 samples/sec   Loss 8.4160   LearningRate 0.0772   Epoch: 2   Global Step: 30120   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:21:20,774-Speed 3375.55 samples/sec   Loss 8.5384   LearningRate 0.0772   Epoch: 2   Global Step: 30130   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:21:23,795-Speed 3391.48 samples/sec   Loss 8.5427   LearningRate 0.0772   Epoch: 2   Global Step: 30140   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:21:26,822-Speed 3384.27 samples/sec   Loss 8.3561   LearningRate 0.0772   Epoch: 2   Global Step: 30150   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:21:29,841-Speed 3392.01 samples/sec   Loss 8.4888   LearningRate 0.0772   Epoch: 2   Global Step: 30160   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:21:32,855-Speed 3398.88 samples/sec   Loss 8.5245   LearningRate 0.0772   Epoch: 2   Global Step: 30170   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:21:35,946-Speed 3314.12 samples/sec   Loss 8.5348   LearningRate 0.0772   Epoch: 2   Global Step: 30180   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:21:39,018-Speed 3334.58 samples/sec   Loss 8.5691   LearningRate 0.0772   Epoch: 2   Global Step: 30190   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:21:42,061-Speed 3366.42 samples/sec   Loss 8.4088   LearningRate 0.0772   Epoch: 2   Global Step: 30200   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:21:45,097-Speed 3373.48 samples/sec   Loss 8.3307   LearningRate 0.0772   Epoch: 2   Global Step: 30210   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:21:48,154-Speed 3350.97 samples/sec   Loss 8.4270   LearningRate 0.0772   Epoch: 2   Global Step: 30220   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:21:51,182-Speed 3383.18 samples/sec   Loss 8.3823   LearningRate 0.0771   Epoch: 2   Global Step: 30230   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:21:54,233-Speed 3357.48 samples/sec   Loss 8.3391   LearningRate 0.0771   Epoch: 2   Global Step: 30240   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:21:57,246-Speed 3399.10 samples/sec   Loss 8.5387   LearningRate 0.0771   Epoch: 2   Global Step: 30250   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:22:00,308-Speed 3345.81 samples/sec   Loss 8.4194   LearningRate 0.0771   Epoch: 2   Global Step: 30260   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:22:03,335-Speed 3384.29 samples/sec   Loss 8.3780   LearningRate 0.0771   Epoch: 2   Global Step: 30270   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:22:06,405-Speed 3336.60 samples/sec   Loss 8.4798   LearningRate 0.0771   Epoch: 2   Global Step: 30280   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:22:09,411-Speed 3407.54 samples/sec   Loss 8.5563   LearningRate 0.0771   Epoch: 2   Global Step: 30290   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:22:12,427-Speed 3396.40 samples/sec   Loss 8.3332   LearningRate 0.0771   Epoch: 2   Global Step: 30300   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:22:15,466-Speed 3370.37 samples/sec   Loss 8.5129   LearningRate 0.0771   Epoch: 2   Global Step: 30310   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:22:18,484-Speed 3394.06 samples/sec   Loss 8.3595   LearningRate 0.0771   Epoch: 2   Global Step: 30320   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:22:21,478-Speed 3421.44 samples/sec   Loss 8.3648   LearningRate 0.0771   Epoch: 2   Global Step: 30330   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:22:24,559-Speed 3324.77 samples/sec   Loss 8.4508   LearningRate 0.0771   Epoch: 2   Global Step: 30340   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:22:27,627-Speed 3338.91 samples/sec   Loss 8.4158   LearningRate 0.0771   Epoch: 2   Global Step: 30350   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:22:30,701-Speed 3332.31 samples/sec   Loss 8.3932   LearningRate 0.0771   Epoch: 2   Global Step: 30360   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:22:33,691-Speed 3425.04 samples/sec   Loss 8.5434   LearningRate 0.0770   Epoch: 2   Global Step: 30370   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:22:36,706-Speed 3396.96 samples/sec   Loss 8.4650   LearningRate 0.0770   Epoch: 2   Global Step: 30380   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:22:39,719-Speed 3400.33 samples/sec   Loss 8.4962   LearningRate 0.0770   Epoch: 2   Global Step: 30390   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:22:42,762-Speed 3366.37 samples/sec   Loss 8.3870   LearningRate 0.0770   Epoch: 2   Global Step: 30400   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:22:45,772-Speed 3402.34 samples/sec   Loss 8.3883   LearningRate 0.0770   Epoch: 2   Global Step: 30410   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:22:48,811-Speed 3370.78 samples/sec   Loss 8.4029   LearningRate 0.0770   Epoch: 2   Global Step: 30420   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:22:51,829-Speed 3394.30 samples/sec   Loss 8.3872   LearningRate 0.0770   Epoch: 2   Global Step: 30430   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:22:54,907-Speed 3328.04 samples/sec   Loss 8.3885   LearningRate 0.0770   Epoch: 2   Global Step: 30440   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:22:57,933-Speed 3384.19 samples/sec   Loss 8.4674   LearningRate 0.0770   Epoch: 2   Global Step: 30450   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:23:00,970-Speed 3373.14 samples/sec   Loss 8.5014   LearningRate 0.0770   Epoch: 2   Global Step: 30460   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:23:04,056-Speed 3318.98 samples/sec   Loss 8.5100   LearningRate 0.0770   Epoch: 2   Global Step: 30470   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:23:07,088-Speed 3379.20 samples/sec   Loss 8.4706   LearningRate 0.0770   Epoch: 2   Global Step: 30480   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:23:10,083-Speed 3419.51 samples/sec   Loss 8.6286   LearningRate 0.0770   Epoch: 2   Global Step: 30490   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:23:13,101-Speed 3394.82 samples/sec   Loss 8.5627   LearningRate 0.0770   Epoch: 2   Global Step: 30500   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:23:16,109-Speed 3404.86 samples/sec   Loss 8.4317   LearningRate 0.0769   Epoch: 2   Global Step: 30510   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:23:19,110-Speed 3413.36 samples/sec   Loss 8.5628   LearningRate 0.0769   Epoch: 2   Global Step: 30520   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:23:22,142-Speed 3378.27 samples/sec   Loss 8.3462   LearningRate 0.0769   Epoch: 2   Global Step: 30530   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:23:25,181-Speed 3371.08 samples/sec   Loss 8.4358   LearningRate 0.0769   Epoch: 2   Global Step: 30540   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:23:28,234-Speed 3354.36 samples/sec   Loss 8.4773   LearningRate 0.0769   Epoch: 2   Global Step: 30550   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:23:31,251-Speed 3395.78 samples/sec   Loss 8.4394   LearningRate 0.0769   Epoch: 2   Global Step: 30560   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:23:34,273-Speed 3389.87 samples/sec   Loss 8.4248   LearningRate 0.0769   Epoch: 2   Global Step: 30570   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:23:37,304-Speed 3379.63 samples/sec   Loss 8.4626   LearningRate 0.0769   Epoch: 2   Global Step: 30580   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:23:40,309-Speed 3409.15 samples/sec   Loss 8.3673   LearningRate 0.0769   Epoch: 2   Global Step: 30590   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:23:43,355-Speed 3363.29 samples/sec   Loss 8.5156   LearningRate 0.0769   Epoch: 2   Global Step: 30600   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:23:46,383-Speed 3382.53 samples/sec   Loss 8.4073   LearningRate 0.0769   Epoch: 2   Global Step: 30610   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:23:49,519-Speed 3265.99 samples/sec   Loss 8.4218   LearningRate 0.0769   Epoch: 2   Global Step: 30620   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:23:52,539-Speed 3392.69 samples/sec   Loss 8.3844   LearningRate 0.0769   Epoch: 2   Global Step: 30630   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:23:55,586-Speed 3361.16 samples/sec   Loss 8.6503   LearningRate 0.0769   Epoch: 2   Global Step: 30640   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:23:58,611-Speed 3386.13 samples/sec   Loss 8.5416   LearningRate 0.0768   Epoch: 2   Global Step: 30650   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:24:01,709-Speed 3306.73 samples/sec   Loss 8.5201   LearningRate 0.0768   Epoch: 2   Global Step: 30660   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:24:04,741-Speed 3378.54 samples/sec   Loss 8.3512   LearningRate 0.0768   Epoch: 2   Global Step: 30670   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:24:07,776-Speed 3375.22 samples/sec   Loss 8.4317   LearningRate 0.0768   Epoch: 2   Global Step: 30680   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:24:10,773-Speed 3417.71 samples/sec   Loss 8.3289   LearningRate 0.0768   Epoch: 2   Global Step: 30690   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:24:13,794-Speed 3391.14 samples/sec   Loss 8.5304   LearningRate 0.0768   Epoch: 2   Global Step: 30700   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:24:16,853-Speed 3349.01 samples/sec   Loss 8.4965   LearningRate 0.0768   Epoch: 2   Global Step: 30710   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:24:19,881-Speed 3382.71 samples/sec   Loss 8.4075   LearningRate 0.0768   Epoch: 2   Global Step: 30720   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:24:22,919-Speed 3372.22 samples/sec   Loss 8.4517   LearningRate 0.0768   Epoch: 2   Global Step: 30730   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:24:25,961-Speed 3366.86 samples/sec   Loss 8.3511   LearningRate 0.0768   Epoch: 2   Global Step: 30740   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:24:29,011-Speed 3357.84 samples/sec   Loss 8.4806   LearningRate 0.0768   Epoch: 2   Global Step: 30750   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:24:32,099-Speed 3317.80 samples/sec   Loss 8.3710   LearningRate 0.0768   Epoch: 2   Global Step: 30760   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:24:35,113-Speed 3398.24 samples/sec   Loss 8.4009   LearningRate 0.0768   Epoch: 2   Global Step: 30770   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:24:38,160-Speed 3362.00 samples/sec   Loss 8.4018   LearningRate 0.0768   Epoch: 2   Global Step: 30780   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:24:41,225-Speed 3342.44 samples/sec   Loss 8.3789   LearningRate 0.0767   Epoch: 2   Global Step: 30790   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:24:44,250-Speed 3386.29 samples/sec   Loss 8.4224   LearningRate 0.0767   Epoch: 2   Global Step: 30800   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:24:47,269-Speed 3392.55 samples/sec   Loss 8.5087   LearningRate 0.0767   Epoch: 2   Global Step: 30810   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:24:50,306-Speed 3372.22 samples/sec   Loss 8.5032   LearningRate 0.0767   Epoch: 2   Global Step: 30820   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:24:53,383-Speed 3329.79 samples/sec   Loss 8.3988   LearningRate 0.0767   Epoch: 2   Global Step: 30830   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:24:56,416-Speed 3377.19 samples/sec   Loss 8.4286   LearningRate 0.0767   Epoch: 2   Global Step: 30840   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:24:59,451-Speed 3374.65 samples/sec   Loss 8.4016   LearningRate 0.0767   Epoch: 2   Global Step: 30850   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:25:02,498-Speed 3361.93 samples/sec   Loss 8.5830   LearningRate 0.0767   Epoch: 2   Global Step: 30860   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:25:05,520-Speed 3389.65 samples/sec   Loss 8.3577   LearningRate 0.0767   Epoch: 2   Global Step: 30870   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:25:08,523-Speed 3411.39 samples/sec   Loss 8.4397   LearningRate 0.0767   Epoch: 2   Global Step: 30880   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:25:11,537-Speed 3398.34 samples/sec   Loss 8.3815   LearningRate 0.0767   Epoch: 2   Global Step: 30890   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:25:14,572-Speed 3374.74 samples/sec   Loss 8.5388   LearningRate 0.0767   Epoch: 2   Global Step: 30900   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:25:17,613-Speed 3369.10 samples/sec   Loss 8.4316   LearningRate 0.0767   Epoch: 2   Global Step: 30910   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:25:20,626-Speed 3399.49 samples/sec   Loss 8.4303   LearningRate 0.0767   Epoch: 2   Global Step: 30920   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:25:23,690-Speed 3342.78 samples/sec   Loss 8.4315   LearningRate 0.0766   Epoch: 2   Global Step: 30930   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:25:26,727-Speed 3372.95 samples/sec   Loss 8.6167   LearningRate 0.0766   Epoch: 2   Global Step: 30940   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:25:29,788-Speed 3346.54 samples/sec   Loss 8.4876   LearningRate 0.0766   Epoch: 2   Global Step: 30950   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:25:32,856-Speed 3337.57 samples/sec   Loss 8.3018   LearningRate 0.0766   Epoch: 2   Global Step: 30960   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:25:35,971-Speed 3288.93 samples/sec   Loss 8.5723   LearningRate 0.0766   Epoch: 2   Global Step: 30970   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:25:39,023-Speed 3356.24 samples/sec   Loss 8.3451   LearningRate 0.0766   Epoch: 2   Global Step: 30980   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:25:42,154-Speed 3271.96 samples/sec   Loss 8.5362   LearningRate 0.0766   Epoch: 2   Global Step: 30990   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:25:45,189-Speed 3375.28 samples/sec   Loss 8.3801   LearningRate 0.0766   Epoch: 2   Global Step: 31000   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:25:48,252-Speed 3343.76 samples/sec   Loss 8.4679   LearningRate 0.0766   Epoch: 2   Global Step: 31010   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:25:51,267-Speed 3397.78 samples/sec   Loss 8.4185   LearningRate 0.0766   Epoch: 2   Global Step: 31020   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:25:54,352-Speed 3320.62 samples/sec   Loss 8.4730   LearningRate 0.0766   Epoch: 2   Global Step: 31030   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:25:57,373-Speed 3390.05 samples/sec   Loss 8.3790   LearningRate 0.0766   Epoch: 2   Global Step: 31040   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:26:00,464-Speed 3314.26 samples/sec   Loss 8.3976   LearningRate 0.0766   Epoch: 2   Global Step: 31050   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:26:03,554-Speed 3314.90 samples/sec   Loss 8.4425   LearningRate 0.0766   Epoch: 2   Global Step: 31060   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:26:06,643-Speed 3315.92 samples/sec   Loss 8.4308   LearningRate 0.0766   Epoch: 2   Global Step: 31070   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:26:09,668-Speed 3385.94 samples/sec   Loss 8.3642   LearningRate 0.0765   Epoch: 2   Global Step: 31080   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:26:12,681-Speed 3400.47 samples/sec   Loss 8.5024   LearningRate 0.0765   Epoch: 2   Global Step: 31090   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:26:15,716-Speed 3374.74 samples/sec   Loss 8.4510   LearningRate 0.0765   Epoch: 2   Global Step: 31100   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:26:18,861-Speed 3256.80 samples/sec   Loss 8.4692   LearningRate 0.0765   Epoch: 2   Global Step: 31110   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:26:21,872-Speed 3401.85 samples/sec   Loss 8.4126   LearningRate 0.0765   Epoch: 2   Global Step: 31120   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:26:24,917-Speed 3364.26 samples/sec   Loss 8.5006   LearningRate 0.0765   Epoch: 2   Global Step: 31130   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:26:27,952-Speed 3375.00 samples/sec   Loss 8.4802   LearningRate 0.0765   Epoch: 2   Global Step: 31140   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:26:31,031-Speed 3326.59 samples/sec   Loss 8.3995   LearningRate 0.0765   Epoch: 2   Global Step: 31150   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:26:34,069-Speed 3372.67 samples/sec   Loss 8.3862   LearningRate 0.0765   Epoch: 2   Global Step: 31160   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:26:37,173-Speed 3299.53 samples/sec   Loss 8.3673   LearningRate 0.0765   Epoch: 2   Global Step: 31170   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:26:40,212-Speed 3369.94 samples/sec   Loss 8.3773   LearningRate 0.0765   Epoch: 2   Global Step: 31180   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:26:43,251-Speed 3370.70 samples/sec   Loss 8.3540   LearningRate 0.0765   Epoch: 2   Global Step: 31190   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:26:46,258-Speed 3406.59 samples/sec   Loss 8.5348   LearningRate 0.0765   Epoch: 2   Global Step: 31200   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:26:49,297-Speed 3370.36 samples/sec   Loss 8.5786   LearningRate 0.0765   Epoch: 2   Global Step: 31210   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:26:52,383-Speed 3320.49 samples/sec   Loss 8.4518   LearningRate 0.0764   Epoch: 2   Global Step: 31220   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:26:55,454-Speed 3334.62 samples/sec   Loss 8.4125   LearningRate 0.0764   Epoch: 2   Global Step: 31230   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:26:58,494-Speed 3369.71 samples/sec   Loss 8.4253   LearningRate 0.0764   Epoch: 2   Global Step: 31240   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:27:01,496-Speed 3412.52 samples/sec   Loss 8.4914   LearningRate 0.0764   Epoch: 2   Global Step: 31250   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:27:04,503-Speed 3406.34 samples/sec   Loss 8.3875   LearningRate 0.0764   Epoch: 2   Global Step: 31260   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:27:07,530-Speed 3383.68 samples/sec   Loss 8.4650   LearningRate 0.0764   Epoch: 2   Global Step: 31270   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:27:10,538-Speed 3405.93 samples/sec   Loss 8.3351   LearningRate 0.0764   Epoch: 2   Global Step: 31280   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:27:13,633-Speed 3309.76 samples/sec   Loss 8.2932   LearningRate 0.0764   Epoch: 2   Global Step: 31290   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:27:16,667-Speed 3376.05 samples/sec   Loss 8.5801   LearningRate 0.0764   Epoch: 2   Global Step: 31300   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:27:19,719-Speed 3356.06 samples/sec   Loss 8.4008   LearningRate 0.0764   Epoch: 2   Global Step: 31310   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:27:22,727-Speed 3405.46 samples/sec   Loss 8.5529   LearningRate 0.0764   Epoch: 2   Global Step: 31320   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:27:25,747-Speed 3391.26 samples/sec   Loss 8.3680   LearningRate 0.0764   Epoch: 2   Global Step: 31330   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:27:28,813-Speed 3341.28 samples/sec   Loss 8.5234   LearningRate 0.0764   Epoch: 2   Global Step: 31340   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:27:31,843-Speed 3379.88 samples/sec   Loss 8.4550   LearningRate 0.0764   Epoch: 2   Global Step: 31350   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-04-27 04:27:34,883-Speed 3370.65 samples/sec   Loss 8.4277   LearningRate 0.0763   Epoch: 2   Global Step: 31360   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:27:37,897-Speed 3399.00 samples/sec   Loss 8.4391   LearningRate 0.0763   Epoch: 2   Global Step: 31370   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:27:40,948-Speed 3356.56 samples/sec   Loss 8.4111   LearningRate 0.0763   Epoch: 2   Global Step: 31380   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:27:43,955-Speed 3406.91 samples/sec   Loss 8.4085   LearningRate 0.0763   Epoch: 2   Global Step: 31390   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:27:47,025-Speed 3336.57 samples/sec   Loss 8.4096   LearningRate 0.0763   Epoch: 2   Global Step: 31400   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:27:50,083-Speed 3350.14 samples/sec   Loss 8.4423   LearningRate 0.0763   Epoch: 2   Global Step: 31410   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:27:53,112-Speed 3381.37 samples/sec   Loss 8.3178   LearningRate 0.0763   Epoch: 2   Global Step: 31420   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:27:56,154-Speed 3367.02 samples/sec   Loss 8.5621   LearningRate 0.0763   Epoch: 2   Global Step: 31430   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:27:59,178-Speed 3387.99 samples/sec   Loss 8.3926   LearningRate 0.0763   Epoch: 2   Global Step: 31440   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:28:02,220-Speed 3366.85 samples/sec   Loss 8.3410   LearningRate 0.0763   Epoch: 2   Global Step: 31450   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:28:05,242-Speed 3390.57 samples/sec   Loss 8.5166   LearningRate 0.0763   Epoch: 2   Global Step: 31460   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:28:08,286-Speed 3364.33 samples/sec   Loss 8.2950   LearningRate 0.0763   Epoch: 2   Global Step: 31470   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:28:11,316-Speed 3380.70 samples/sec   Loss 8.4255   LearningRate 0.0763   Epoch: 2   Global Step: 31480   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:28:14,406-Speed 3315.32 samples/sec   Loss 8.3570   LearningRate 0.0763   Epoch: 2   Global Step: 31490   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:28:17,428-Speed 3389.57 samples/sec   Loss 8.4557   LearningRate 0.0762   Epoch: 2   Global Step: 31500   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:28:20,441-Speed 3400.05 samples/sec   Loss 8.4001   LearningRate 0.0762   Epoch: 2   Global Step: 31510   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:28:23,465-Speed 3387.07 samples/sec   Loss 8.3110   LearningRate 0.0762   Epoch: 2   Global Step: 31520   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:28:26,547-Speed 3323.24 samples/sec   Loss 8.4131   LearningRate 0.0762   Epoch: 2   Global Step: 31530   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:28:29,561-Speed 3398.74 samples/sec   Loss 8.4159   LearningRate 0.0762   Epoch: 2   Global Step: 31540   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:28:32,585-Speed 3387.04 samples/sec   Loss 8.3922   LearningRate 0.0762   Epoch: 2   Global Step: 31550   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:28:35,631-Speed 3363.91 samples/sec   Loss 8.3491   LearningRate 0.0762   Epoch: 2   Global Step: 31560   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 04:28:38,693-Speed 3344.77 samples/sec   Loss 8.3513   LearningRate 0.0762   Epoch: 2   Global Step: 31570   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 04:28:41,791-Speed 3306.90 samples/sec   Loss 8.3260   LearningRate 0.0762   Epoch: 2   Global Step: 31580   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 04:28:44,789-Speed 3416.37 samples/sec   Loss 8.4301   LearningRate 0.0762   Epoch: 2   Global Step: 31590   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-04-27 04:28:47,829-Speed 3369.35 samples/sec   Loss 8.5058   LearningRate 0.0762   Epoch: 2   Global Step: 31600   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:28:50,838-Speed 3403.95 samples/sec   Loss 8.3731   LearningRate 0.0762   Epoch: 2   Global Step: 31610   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:28:53,920-Speed 3324.20 samples/sec   Loss 8.4870   LearningRate 0.0762   Epoch: 2   Global Step: 31620   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:28:56,953-Speed 3377.49 samples/sec   Loss 8.4422   LearningRate 0.0762   Epoch: 2   Global Step: 31630   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:28:59,960-Speed 3405.96 samples/sec   Loss 8.4677   LearningRate 0.0761   Epoch: 2   Global Step: 31640   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:29:02,985-Speed 3385.66 samples/sec   Loss 8.2311   LearningRate 0.0761   Epoch: 2   Global Step: 31650   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:29:06,058-Speed 3333.82 samples/sec   Loss 8.3713   LearningRate 0.0761   Epoch: 2   Global Step: 31660   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:29:09,115-Speed 3351.30 samples/sec   Loss 8.4942   LearningRate 0.0761   Epoch: 2   Global Step: 31670   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:29:12,156-Speed 3380.96 samples/sec   Loss 8.3758   LearningRate 0.0761   Epoch: 2   Global Step: 31680   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:29:15,247-Speed 3314.47 samples/sec   Loss 8.3863   LearningRate 0.0761   Epoch: 2   Global Step: 31690   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:29:18,277-Speed 3380.81 samples/sec   Loss 8.4714   LearningRate 0.0761   Epoch: 2   Global Step: 31700   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:29:21,289-Speed 3400.21 samples/sec   Loss 8.5507   LearningRate 0.0761   Epoch: 2   Global Step: 31710   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-04-27 04:29:24,306-Speed 3395.28 samples/sec   Loss 8.4808   LearningRate 0.0761   Epoch: 2   Global Step: 31720   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:29:27,314-Speed 3405.80 samples/sec   Loss 8.3818   LearningRate 0.0761   Epoch: 2   Global Step: 31730   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:29:30,326-Speed 3400.54 samples/sec   Loss 8.4260   LearningRate 0.0761   Epoch: 2   Global Step: 31740   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-04-27 04:29:33,319-Speed 3422.34 samples/sec   Loss 8.3820   LearningRate 0.0761   Epoch: 2   Global Step: 31750   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:29:36,334-Speed 3397.02 samples/sec   Loss 8.4330   LearningRate 0.0761   Epoch: 2   Global Step: 31760   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:29:39,377-Speed 3367.31 samples/sec   Loss 8.5144   LearningRate 0.0761   Epoch: 2   Global Step: 31770   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:29:42,431-Speed 3354.04 samples/sec   Loss 8.4003   LearningRate 0.0761   Epoch: 2   Global Step: 31780   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:29:45,486-Speed 3351.99 samples/sec   Loss 8.4758   LearningRate 0.0760   Epoch: 2   Global Step: 31790   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:29:48,552-Speed 3341.71 samples/sec   Loss 8.3984   LearningRate 0.0760   Epoch: 2   Global Step: 31800   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:29:51,582-Speed 3380.85 samples/sec   Loss 8.3848   LearningRate 0.0760   Epoch: 2   Global Step: 31810   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:29:54,621-Speed 3370.28 samples/sec   Loss 8.3836   LearningRate 0.0760   Epoch: 2   Global Step: 31820   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:29:57,619-Speed 3416.79 samples/sec   Loss 8.2474   LearningRate 0.0760   Epoch: 2   Global Step: 31830   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:30:00,624-Speed 3408.65 samples/sec   Loss 8.4815   LearningRate 0.0760   Epoch: 2   Global Step: 31840   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:30:03,650-Speed 3384.80 samples/sec   Loss 8.3424   LearningRate 0.0760   Epoch: 2   Global Step: 31850   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:30:06,684-Speed 3376.57 samples/sec   Loss 8.4596   LearningRate 0.0760   Epoch: 2   Global Step: 31860   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:30:09,681-Speed 3417.14 samples/sec   Loss 8.4879   LearningRate 0.0760   Epoch: 2   Global Step: 31870   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:30:12,716-Speed 3376.35 samples/sec   Loss 8.3504   LearningRate 0.0760   Epoch: 2   Global Step: 31880   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:30:15,770-Speed 3354.10 samples/sec   Loss 8.3946   LearningRate 0.0760   Epoch: 2   Global Step: 31890   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:30:18,808-Speed 3370.52 samples/sec   Loss 8.3099   LearningRate 0.0760   Epoch: 2   Global Step: 31900   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:30:21,823-Speed 3397.80 samples/sec   Loss 8.4533   LearningRate 0.0760   Epoch: 2   Global Step: 31910   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:30:24,874-Speed 3357.73 samples/sec   Loss 8.4882   LearningRate 0.0760   Epoch: 2   Global Step: 31920   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:30:27,915-Speed 3367.78 samples/sec   Loss 8.4101   LearningRate 0.0759   Epoch: 2   Global Step: 31930   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:30:30,938-Speed 3388.74 samples/sec   Loss 8.2884   LearningRate 0.0759   Epoch: 2   Global Step: 31940   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:30:33,931-Speed 3423.05 samples/sec   Loss 8.3986   LearningRate 0.0759   Epoch: 2   Global Step: 31950   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:30:36,953-Speed 3389.24 samples/sec   Loss 8.3682   LearningRate 0.0759   Epoch: 2   Global Step: 31960   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:30:39,960-Speed 3407.05 samples/sec   Loss 8.4243   LearningRate 0.0759   Epoch: 2   Global Step: 31970   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:30:42,973-Speed 3398.73 samples/sec   Loss 8.4976   LearningRate 0.0759   Epoch: 2   Global Step: 31980   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:30:45,976-Speed 3411.05 samples/sec   Loss 8.4165   LearningRate 0.0759   Epoch: 2   Global Step: 31990   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:30:49,101-Speed 3278.31 samples/sec   Loss 8.3531   LearningRate 0.0759   Epoch: 2   Global Step: 32000   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:30:52,171-Speed 3336.34 samples/sec   Loss 8.4648   LearningRate 0.0759   Epoch: 2   Global Step: 32010   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:30:55,226-Speed 3353.02 samples/sec   Loss 8.4059   LearningRate 0.0759   Epoch: 2   Global Step: 32020   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:30:58,278-Speed 3355.34 samples/sec   Loss 8.4149   LearningRate 0.0759   Epoch: 2   Global Step: 32030   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:31:01,311-Speed 3377.48 samples/sec   Loss 8.4567   LearningRate 0.0759   Epoch: 2   Global Step: 32040   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:31:04,334-Speed 3389.23 samples/sec   Loss 8.4007   LearningRate 0.0759   Epoch: 2   Global Step: 32050   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:31:07,424-Speed 3314.54 samples/sec   Loss 8.5078   LearningRate 0.0759   Epoch: 2   Global Step: 32060   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:31:10,428-Speed 3409.74 samples/sec   Loss 8.3613   LearningRate 0.0758   Epoch: 2   Global Step: 32070   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:31:13,468-Speed 3370.03 samples/sec   Loss 8.4544   LearningRate 0.0758   Epoch: 2   Global Step: 32080   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:31:16,476-Speed 3404.70 samples/sec   Loss 8.2556   LearningRate 0.0758   Epoch: 2   Global Step: 32090   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:31:19,486-Speed 3403.23 samples/sec   Loss 8.4567   LearningRate 0.0758   Epoch: 2   Global Step: 32100   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:31:22,498-Speed 3401.45 samples/sec   Loss 8.4233   LearningRate 0.0758   Epoch: 2   Global Step: 32110   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:31:25,611-Speed 3290.37 samples/sec   Loss 8.2420   LearningRate 0.0758   Epoch: 2   Global Step: 32120   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:31:28,684-Speed 3332.82 samples/sec   Loss 8.3782   LearningRate 0.0758   Epoch: 2   Global Step: 32130   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:31:31,695-Speed 3401.39 samples/sec   Loss 8.3640   LearningRate 0.0758   Epoch: 2   Global Step: 32140   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:31:34,724-Speed 3382.22 samples/sec   Loss 8.3952   LearningRate 0.0758   Epoch: 2   Global Step: 32150   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:31:37,785-Speed 3345.87 samples/sec   Loss 8.3124   LearningRate 0.0758   Epoch: 2   Global Step: 32160   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:31:40,840-Speed 3352.88 samples/sec   Loss 8.4483   LearningRate 0.0758   Epoch: 2   Global Step: 32170   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:31:43,857-Speed 3396.22 samples/sec   Loss 8.4173   LearningRate 0.0758   Epoch: 2   Global Step: 32180   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:31:46,857-Speed 3414.40 samples/sec   Loss 8.4763   LearningRate 0.0758   Epoch: 2   Global Step: 32190   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:31:49,865-Speed 3405.09 samples/sec   Loss 8.3316   LearningRate 0.0758   Epoch: 2   Global Step: 32200   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:31:52,901-Speed 3372.99 samples/sec   Loss 8.3335   LearningRate 0.0757   Epoch: 2   Global Step: 32210   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:31:55,984-Speed 3322.96 samples/sec   Loss 8.2881   LearningRate 0.0757   Epoch: 2   Global Step: 32220   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:31:59,005-Speed 3390.89 samples/sec   Loss 8.4159   LearningRate 0.0757   Epoch: 2   Global Step: 32230   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:32:02,081-Speed 3330.31 samples/sec   Loss 8.3534   LearningRate 0.0757   Epoch: 2   Global Step: 32240   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:32:05,130-Speed 3358.65 samples/sec   Loss 8.4089   LearningRate 0.0757   Epoch: 2   Global Step: 32250   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:32:08,177-Speed 3362.22 samples/sec   Loss 8.4424   LearningRate 0.0757   Epoch: 2   Global Step: 32260   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:32:11,177-Speed 3414.40 samples/sec   Loss 8.4640   LearningRate 0.0757   Epoch: 2   Global Step: 32270   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:32:14,222-Speed 3364.27 samples/sec   Loss 8.4467   LearningRate 0.0757   Epoch: 2   Global Step: 32280   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:32:17,274-Speed 3355.66 samples/sec   Loss 8.3282   LearningRate 0.0757   Epoch: 2   Global Step: 32290   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:32:20,315-Speed 3369.32 samples/sec   Loss 8.5133   LearningRate 0.0757   Epoch: 2   Global Step: 32300   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:32:23,345-Speed 3380.07 samples/sec   Loss 8.4642   LearningRate 0.0757   Epoch: 2   Global Step: 32310   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:32:26,394-Speed 3359.02 samples/sec   Loss 8.3915   LearningRate 0.0757   Epoch: 2   Global Step: 32320   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:32:29,509-Speed 3288.38 samples/sec   Loss 8.3748   LearningRate 0.0757   Epoch: 2   Global Step: 32330   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:32:32,560-Speed 3357.61 samples/sec   Loss 8.2964   LearningRate 0.0757   Epoch: 2   Global Step: 32340   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:32:35,557-Speed 3418.49 samples/sec   Loss 8.3898   LearningRate 0.0757   Epoch: 2   Global Step: 32350   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:32:38,573-Speed 3395.83 samples/sec   Loss 8.3884   LearningRate 0.0756   Epoch: 2   Global Step: 32360   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:32:41,623-Speed 3359.18 samples/sec   Loss 8.2773   LearningRate 0.0756   Epoch: 2   Global Step: 32370   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:32:44,653-Speed 3379.64 samples/sec   Loss 8.4285   LearningRate 0.0756   Epoch: 2   Global Step: 32380   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:32:47,686-Speed 3377.20 samples/sec   Loss 8.2905   LearningRate 0.0756   Epoch: 2   Global Step: 32390   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:32:50,718-Speed 3378.33 samples/sec   Loss 8.2840   LearningRate 0.0756   Epoch: 2   Global Step: 32400   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:32:53,744-Speed 3385.82 samples/sec   Loss 8.4020   LearningRate 0.0756   Epoch: 2   Global Step: 32410   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:32:56,728-Speed 3432.81 samples/sec   Loss 8.2645   LearningRate 0.0756   Epoch: 2   Global Step: 32420   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:32:59,740-Speed 3399.72 samples/sec   Loss 8.3855   LearningRate 0.0756   Epoch: 2   Global Step: 32430   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:33:02,811-Speed 3335.46 samples/sec   Loss 8.4320   LearningRate 0.0756   Epoch: 2   Global Step: 32440   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:33:05,876-Speed 3342.94 samples/sec   Loss 8.4406   LearningRate 0.0756   Epoch: 2   Global Step: 32450   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:33:08,899-Speed 3387.74 samples/sec   Loss 8.3288   LearningRate 0.0756   Epoch: 2   Global Step: 32460   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:33:11,909-Speed 3402.97 samples/sec   Loss 8.4930   LearningRate 0.0756   Epoch: 2   Global Step: 32470   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:33:14,917-Speed 3406.13 samples/sec   Loss 8.4401   LearningRate 0.0756   Epoch: 2   Global Step: 32480   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:33:17,939-Speed 3389.51 samples/sec   Loss 8.3545   LearningRate 0.0756   Epoch: 2   Global Step: 32490   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:33:20,932-Speed 3421.43 samples/sec   Loss 8.3400   LearningRate 0.0755   Epoch: 2   Global Step: 32500   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:33:23,962-Speed 3380.59 samples/sec   Loss 8.3355   LearningRate 0.0755   Epoch: 2   Global Step: 32510   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:33:27,017-Speed 3353.27 samples/sec   Loss 8.4701   LearningRate 0.0755   Epoch: 2   Global Step: 32520   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:33:30,076-Speed 3349.06 samples/sec   Loss 8.4054   LearningRate 0.0755   Epoch: 2   Global Step: 32530   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:33:33,129-Speed 3355.18 samples/sec   Loss 8.4010   LearningRate 0.0755   Epoch: 2   Global Step: 32540   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:33:36,235-Speed 3297.87 samples/sec   Loss 8.4294   LearningRate 0.0755   Epoch: 2   Global Step: 32550   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:33:39,278-Speed 3365.34 samples/sec   Loss 8.4366   LearningRate 0.0755   Epoch: 2   Global Step: 32560   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:33:42,295-Speed 3395.69 samples/sec   Loss 8.3867   LearningRate 0.0755   Epoch: 2   Global Step: 32570   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:33:45,303-Speed 3405.65 samples/sec   Loss 8.2730   LearningRate 0.0755   Epoch: 2   Global Step: 32580   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:33:48,331-Speed 3382.86 samples/sec   Loss 8.4036   LearningRate 0.0755   Epoch: 2   Global Step: 32590   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:33:51,368-Speed 3372.98 samples/sec   Loss 8.4149   LearningRate 0.0755   Epoch: 2   Global Step: 32600   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:33:54,423-Speed 3352.01 samples/sec   Loss 8.4806   LearningRate 0.0755   Epoch: 2   Global Step: 32610   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:33:57,421-Speed 3416.95 samples/sec   Loss 8.4338   LearningRate 0.0755   Epoch: 2   Global Step: 32620   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:34:00,447-Speed 3385.02 samples/sec   Loss 8.4692   LearningRate 0.0755   Epoch: 2   Global Step: 32630   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:34:03,521-Speed 3332.08 samples/sec   Loss 8.4436   LearningRate 0.0754   Epoch: 2   Global Step: 32640   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:34:06,596-Speed 3331.94 samples/sec   Loss 8.3128   LearningRate 0.0754   Epoch: 2   Global Step: 32650   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:34:09,604-Speed 3405.11 samples/sec   Loss 8.2924   LearningRate 0.0754   Epoch: 2   Global Step: 32660   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:34:12,599-Speed 3419.63 samples/sec   Loss 8.2848   LearningRate 0.0754   Epoch: 2   Global Step: 32670   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:34:15,631-Speed 3378.69 samples/sec   Loss 8.2660   LearningRate 0.0754   Epoch: 2   Global Step: 32680   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:34:18,742-Speed 3292.67 samples/sec   Loss 8.2731   LearningRate 0.0754   Epoch: 2   Global Step: 32690   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:34:21,759-Speed 3395.26 samples/sec   Loss 8.4698   LearningRate 0.0754   Epoch: 2   Global Step: 32700   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:34:24,764-Speed 3408.32 samples/sec   Loss 8.4650   LearningRate 0.0754   Epoch: 2   Global Step: 32710   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:34:27,793-Speed 3382.46 samples/sec   Loss 8.4809   LearningRate 0.0754   Epoch: 2   Global Step: 32720   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:34:30,808-Speed 3396.58 samples/sec   Loss 8.4045   LearningRate 0.0754   Epoch: 2   Global Step: 32730   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:34:33,872-Speed 3343.40 samples/sec   Loss 8.3931   LearningRate 0.0754   Epoch: 2   Global Step: 32740   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:34:36,936-Speed 3343.09 samples/sec   Loss 8.3640   LearningRate 0.0754   Epoch: 2   Global Step: 32750   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:34:39,960-Speed 3388.04 samples/sec   Loss 8.3911   LearningRate 0.0754   Epoch: 2   Global Step: 32760   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:34:43,053-Speed 3311.50 samples/sec   Loss 8.3698   LearningRate 0.0754   Epoch: 2   Global Step: 32770   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:34:46,046-Speed 3422.93 samples/sec   Loss 8.3842   LearningRate 0.0754   Epoch: 2   Global Step: 32780   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:34:49,087-Speed 3368.79 samples/sec   Loss 8.3286   LearningRate 0.0753   Epoch: 2   Global Step: 32790   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:34:52,175-Speed 3316.69 samples/sec   Loss 8.4621   LearningRate 0.0753   Epoch: 2   Global Step: 32800   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:34:55,215-Speed 3369.27 samples/sec   Loss 8.4643   LearningRate 0.0753   Epoch: 2   Global Step: 32810   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:34:58,211-Speed 3419.28 samples/sec   Loss 8.4096   LearningRate 0.0753   Epoch: 2   Global Step: 32820   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:35:01,330-Speed 3283.63 samples/sec   Loss 8.3816   LearningRate 0.0753   Epoch: 2   Global Step: 32830   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:35:04,424-Speed 3310.87 samples/sec   Loss 8.2275   LearningRate 0.0753   Epoch: 2   Global Step: 32840   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:35:07,510-Speed 3319.90 samples/sec   Loss 8.2865   LearningRate 0.0753   Epoch: 2   Global Step: 32850   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:35:10,568-Speed 3348.94 samples/sec   Loss 8.3696   LearningRate 0.0753   Epoch: 2   Global Step: 32860   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:35:13,638-Speed 3336.64 samples/sec   Loss 8.3276   LearningRate 0.0753   Epoch: 2   Global Step: 32870   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:35:16,767-Speed 3274.58 samples/sec   Loss 8.2343   LearningRate 0.0753   Epoch: 2   Global Step: 32880   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:35:19,799-Speed 3377.45 samples/sec   Loss 8.4008   LearningRate 0.0753   Epoch: 2   Global Step: 32890   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:35:22,809-Speed 3403.07 samples/sec   Loss 8.3895   LearningRate 0.0753   Epoch: 2   Global Step: 32900   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:35:25,883-Speed 3332.21 samples/sec   Loss 8.2681   LearningRate 0.0753   Epoch: 2   Global Step: 32910   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:35:28,939-Speed 3352.27 samples/sec   Loss 8.2396   LearningRate 0.0753   Epoch: 2   Global Step: 32920   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:35:32,036-Speed 3308.04 samples/sec   Loss 8.4455   LearningRate 0.0752   Epoch: 2   Global Step: 32930   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:35:35,097-Speed 3345.96 samples/sec   Loss 8.2866   LearningRate 0.0752   Epoch: 2   Global Step: 32940   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:35:38,202-Speed 3298.84 samples/sec   Loss 8.2671   LearningRate 0.0752   Epoch: 2   Global Step: 32950   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:35:41,245-Speed 3366.36 samples/sec   Loss 8.3434   LearningRate 0.0752   Epoch: 2   Global Step: 32960   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:35:44,291-Speed 3363.00 samples/sec   Loss 8.2148   LearningRate 0.0752   Epoch: 2   Global Step: 32970   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:35:47,304-Speed 3399.65 samples/sec   Loss 8.4432   LearningRate 0.0752   Epoch: 2   Global Step: 32980   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:35:50,393-Speed 3315.61 samples/sec   Loss 8.4348   LearningRate 0.0752   Epoch: 2   Global Step: 32990   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:35:53,485-Speed 3312.70 samples/sec   Loss 8.3606   LearningRate 0.0752   Epoch: 2   Global Step: 33000   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:35:56,540-Speed 3352.44 samples/sec   Loss 8.3327   LearningRate 0.0752   Epoch: 2   Global Step: 33010   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:35:59,576-Speed 3374.75 samples/sec   Loss 8.3121   LearningRate 0.0752   Epoch: 2   Global Step: 33020   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:36:02,696-Speed 3283.07 samples/sec   Loss 8.4433   LearningRate 0.0752   Epoch: 2   Global Step: 33030   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:36:05,750-Speed 3353.60 samples/sec   Loss 8.4417   LearningRate 0.0752   Epoch: 2   Global Step: 33040   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:36:08,775-Speed 3386.01 samples/sec   Loss 8.3809   LearningRate 0.0752   Epoch: 2   Global Step: 33050   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:36:11,863-Speed 3317.80 samples/sec   Loss 8.3693   LearningRate 0.0752   Epoch: 2   Global Step: 33060   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:36:14,979-Speed 3286.56 samples/sec   Loss 8.3394   LearningRate 0.0751   Epoch: 2   Global Step: 33070   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:36:18,068-Speed 3316.30 samples/sec   Loss 8.3368   LearningRate 0.0751   Epoch: 2   Global Step: 33080   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:36:21,079-Speed 3402.09 samples/sec   Loss 8.2877   LearningRate 0.0751   Epoch: 2   Global Step: 33090   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:36:24,149-Speed 3335.82 samples/sec   Loss 8.3553   LearningRate 0.0751   Epoch: 2   Global Step: 33100   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:36:27,189-Speed 3370.44 samples/sec   Loss 8.2306   LearningRate 0.0751   Epoch: 2   Global Step: 33110   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:36:30,272-Speed 3322.00 samples/sec   Loss 8.3146   LearningRate 0.0751   Epoch: 2   Global Step: 33120   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:36:34,016-Speed 2735.53 samples/sec   Loss 8.2960   LearningRate 0.0751   Epoch: 2   Global Step: 33130   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:36:37,072-Speed 3352.50 samples/sec   Loss 8.2943   LearningRate 0.0751   Epoch: 2   Global Step: 33140   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:36:40,127-Speed 3352.93 samples/sec   Loss 8.3191   LearningRate 0.0751   Epoch: 2   Global Step: 33150   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:36:43,183-Speed 3351.31 samples/sec   Loss 8.3146   LearningRate 0.0751   Epoch: 2   Global Step: 33160   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:36:46,198-Speed 3398.18 samples/sec   Loss 8.3150   LearningRate 0.0751   Epoch: 2   Global Step: 33170   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:36:49,206-Speed 3405.31 samples/sec   Loss 8.4054   LearningRate 0.0751   Epoch: 2   Global Step: 33180   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:36:52,306-Speed 3304.59 samples/sec   Loss 8.3804   LearningRate 0.0751   Epoch: 2   Global Step: 33190   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:36:55,341-Speed 3374.60 samples/sec   Loss 8.4929   LearningRate 0.0751   Epoch: 2   Global Step: 33200   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:36:58,358-Speed 3395.31 samples/sec   Loss 8.4128   LearningRate 0.0751   Epoch: 2   Global Step: 33210   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:37:01,383-Speed 3386.09 samples/sec   Loss 8.2612   LearningRate 0.0750   Epoch: 2   Global Step: 33220   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:37:04,402-Speed 3393.87 samples/sec   Loss 8.3818   LearningRate 0.0750   Epoch: 2   Global Step: 33230   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:37:07,414-Speed 3400.45 samples/sec   Loss 8.3683   LearningRate 0.0750   Epoch: 2   Global Step: 33240   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:37:10,480-Speed 3340.39 samples/sec   Loss 8.3241   LearningRate 0.0750   Epoch: 2   Global Step: 33250   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:37:13,524-Speed 3365.79 samples/sec   Loss 8.4128   LearningRate 0.0750   Epoch: 2   Global Step: 33260   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:37:16,591-Speed 3339.67 samples/sec   Loss 8.2665   LearningRate 0.0750   Epoch: 2   Global Step: 33270   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:37:19,604-Speed 3399.28 samples/sec   Loss 8.3106   LearningRate 0.0750   Epoch: 2   Global Step: 33280   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:37:22,602-Speed 3417.35 samples/sec   Loss 8.2662   LearningRate 0.0750   Epoch: 2   Global Step: 33290   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:37:25,628-Speed 3385.58 samples/sec   Loss 8.3027   LearningRate 0.0750   Epoch: 2   Global Step: 33300   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:37:28,673-Speed 3363.29 samples/sec   Loss 8.2861   LearningRate 0.0750   Epoch: 2   Global Step: 33310   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:37:31,702-Speed 3381.95 samples/sec   Loss 8.3403   LearningRate 0.0750   Epoch: 2   Global Step: 33320   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:37:34,767-Speed 3341.56 samples/sec   Loss 8.2706   LearningRate 0.0750   Epoch: 2   Global Step: 33330   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:37:37,792-Speed 3386.12 samples/sec   Loss 8.3034   LearningRate 0.0750   Epoch: 2   Global Step: 33340   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:37:40,903-Speed 3292.70 samples/sec   Loss 8.4082   LearningRate 0.0750   Epoch: 2   Global Step: 33350   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:37:43,906-Speed 3410.81 samples/sec   Loss 8.3240   LearningRate 0.0749   Epoch: 2   Global Step: 33360   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:37:46,967-Speed 3346.73 samples/sec   Loss 8.3454   LearningRate 0.0749   Epoch: 2   Global Step: 33370   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:37:49,979-Speed 3400.67 samples/sec   Loss 8.2349   LearningRate 0.0749   Epoch: 2   Global Step: 33380   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:37:53,010-Speed 3379.76 samples/sec   Loss 8.3445   LearningRate 0.0749   Epoch: 2   Global Step: 33390   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:37:56,076-Speed 3341.30 samples/sec   Loss 8.3314   LearningRate 0.0749   Epoch: 2   Global Step: 33400   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:37:59,109-Speed 3377.13 samples/sec   Loss 8.3528   LearningRate 0.0749   Epoch: 2   Global Step: 33410   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:38:02,103-Speed 3421.52 samples/sec   Loss 8.3601   LearningRate 0.0749   Epoch: 2   Global Step: 33420   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:38:05,189-Speed 3318.44 samples/sec   Loss 8.2241   LearningRate 0.0749   Epoch: 2   Global Step: 33430   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:38:08,207-Speed 3394.60 samples/sec   Loss 8.4419   LearningRate 0.0749   Epoch: 2   Global Step: 33440   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:38:11,238-Speed 3379.18 samples/sec   Loss 8.3862   LearningRate 0.0749   Epoch: 2   Global Step: 33450   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:38:14,246-Speed 3404.93 samples/sec   Loss 8.3542   LearningRate 0.0749   Epoch: 2   Global Step: 33460   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:38:17,361-Speed 3289.11 samples/sec   Loss 8.3997   LearningRate 0.0749   Epoch: 2   Global Step: 33470   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:38:20,370-Speed 3404.61 samples/sec   Loss 8.3206   LearningRate 0.0749   Epoch: 2   Global Step: 33480   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:38:23,400-Speed 3380.10 samples/sec   Loss 8.2396   LearningRate 0.0749   Epoch: 2   Global Step: 33490   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:38:26,449-Speed 3359.31 samples/sec   Loss 8.2506   LearningRate 0.0748   Epoch: 2   Global Step: 33500   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:38:29,499-Speed 3358.79 samples/sec   Loss 8.2869   LearningRate 0.0748   Epoch: 2   Global Step: 33510   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:38:32,510-Speed 3402.34 samples/sec   Loss 8.3356   LearningRate 0.0748   Epoch: 2   Global Step: 33520   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:38:35,570-Speed 3347.07 samples/sec   Loss 8.3941   LearningRate 0.0748   Epoch: 2   Global Step: 33530   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:38:38,612-Speed 3367.49 samples/sec   Loss 8.3329   LearningRate 0.0748   Epoch: 2   Global Step: 33540   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:38:41,625-Speed 3399.95 samples/sec   Loss 8.3344   LearningRate 0.0748   Epoch: 2   Global Step: 33550   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:38:44,657-Speed 3377.71 samples/sec   Loss 8.3694   LearningRate 0.0748   Epoch: 2   Global Step: 33560   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:38:47,679-Speed 3390.22 samples/sec   Loss 8.1880   LearningRate 0.0748   Epoch: 2   Global Step: 33570   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:38:50,741-Speed 3344.89 samples/sec   Loss 8.3020   LearningRate 0.0748   Epoch: 2   Global Step: 33580   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:38:53,749-Speed 3405.71 samples/sec   Loss 8.4029   LearningRate 0.0748   Epoch: 2   Global Step: 33590   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:38:56,747-Speed 3417.26 samples/sec   Loss 8.2708   LearningRate 0.0748   Epoch: 2   Global Step: 33600   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:38:59,753-Speed 3407.89 samples/sec   Loss 8.4038   LearningRate 0.0748   Epoch: 2   Global Step: 33610   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:39:02,749-Speed 3418.58 samples/sec   Loss 8.2259   LearningRate 0.0748   Epoch: 2   Global Step: 33620   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:39:05,755-Speed 3408.25 samples/sec   Loss 8.2564   LearningRate 0.0748   Epoch: 2   Global Step: 33630   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:39:08,754-Speed 3415.55 samples/sec   Loss 8.2933   LearningRate 0.0748   Epoch: 2   Global Step: 33640   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:39:11,765-Speed 3401.88 samples/sec   Loss 8.2674   LearningRate 0.0747   Epoch: 2   Global Step: 33650   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:39:14,811-Speed 3362.61 samples/sec   Loss 8.2715   LearningRate 0.0747   Epoch: 2   Global Step: 33660   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:39:17,843-Speed 3378.01 samples/sec   Loss 8.3213   LearningRate 0.0747   Epoch: 2   Global Step: 33670   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:39:20,852-Speed 3405.05 samples/sec   Loss 8.2671   LearningRate 0.0747   Epoch: 2   Global Step: 33680   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:39:23,894-Speed 3367.37 samples/sec   Loss 8.1984   LearningRate 0.0747   Epoch: 2   Global Step: 33690   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:39:26,919-Speed 3385.48 samples/sec   Loss 8.3400   LearningRate 0.0747   Epoch: 2   Global Step: 33700   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:39:29,973-Speed 3353.92 samples/sec   Loss 8.3748   LearningRate 0.0747   Epoch: 2   Global Step: 33710   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:39:32,975-Speed 3412.69 samples/sec   Loss 8.2773   LearningRate 0.0747   Epoch: 2   Global Step: 33720   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:39:36,065-Speed 3315.05 samples/sec   Loss 8.3669   LearningRate 0.0747   Epoch: 2   Global Step: 33730   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:39:39,150-Speed 3320.12 samples/sec   Loss 8.3024   LearningRate 0.0747   Epoch: 2   Global Step: 33740   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:39:42,190-Speed 3369.47 samples/sec   Loss 8.2354   LearningRate 0.0747   Epoch: 2   Global Step: 33750   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:39:45,196-Speed 3407.25 samples/sec   Loss 8.2660   LearningRate 0.0747   Epoch: 2   Global Step: 33760   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:39:48,204-Speed 3405.85 samples/sec   Loss 8.2951   LearningRate 0.0747   Epoch: 2   Global Step: 33770   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:39:51,251-Speed 3361.55 samples/sec   Loss 8.2148   LearningRate 0.0747   Epoch: 2   Global Step: 33780   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:39:54,283-Speed 3378.72 samples/sec   Loss 8.4605   LearningRate 0.0746   Epoch: 2   Global Step: 33790   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:39:57,299-Speed 3396.48 samples/sec   Loss 8.3985   LearningRate 0.0746   Epoch: 2   Global Step: 33800   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:40:00,347-Speed 3360.27 samples/sec   Loss 8.3327   LearningRate 0.0746   Epoch: 2   Global Step: 33810   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:40:03,395-Speed 3360.64 samples/sec   Loss 8.3490   LearningRate 0.0746   Epoch: 2   Global Step: 33820   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:40:06,426-Speed 3379.69 samples/sec   Loss 8.2953   LearningRate 0.0746   Epoch: 2   Global Step: 33830   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:40:09,418-Speed 3423.62 samples/sec   Loss 8.2687   LearningRate 0.0746   Epoch: 2   Global Step: 33840   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:40:12,525-Speed 3296.92 samples/sec   Loss 8.3644   LearningRate 0.0746   Epoch: 2   Global Step: 33850   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:40:15,627-Speed 3302.12 samples/sec   Loss 8.4236   LearningRate 0.0746   Epoch: 2   Global Step: 33860   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:40:18,712-Speed 3320.33 samples/sec   Loss 8.2354   LearningRate 0.0746   Epoch: 2   Global Step: 33870   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:40:21,714-Speed 3411.54 samples/sec   Loss 8.3655   LearningRate 0.0746   Epoch: 2   Global Step: 33880   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:40:24,771-Speed 3351.53 samples/sec   Loss 8.2746   LearningRate 0.0746   Epoch: 2   Global Step: 33890   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:40:27,842-Speed 3335.11 samples/sec   Loss 8.4476   LearningRate 0.0746   Epoch: 2   Global Step: 33900   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:40:30,869-Speed 3384.02 samples/sec   Loss 8.2958   LearningRate 0.0746   Epoch: 2   Global Step: 33910   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:40:33,864-Speed 3420.17 samples/sec   Loss 8.3428   LearningRate 0.0746   Epoch: 2   Global Step: 33920   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:40:36,918-Speed 3354.34 samples/sec   Loss 8.4460   LearningRate 0.0745   Epoch: 2   Global Step: 33930   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:40:39,939-Speed 3391.07 samples/sec   Loss 8.3681   LearningRate 0.0745   Epoch: 2   Global Step: 33940   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:40:43,022-Speed 3321.46 samples/sec   Loss 8.3424   LearningRate 0.0745   Epoch: 2   Global Step: 33950   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:40:46,028-Speed 3408.27 samples/sec   Loss 8.1729   LearningRate 0.0745   Epoch: 2   Global Step: 33960   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:40:49,047-Speed 3392.37 samples/sec   Loss 8.2617   LearningRate 0.0745   Epoch: 2   Global Step: 33970   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:40:52,055-Speed 3406.29 samples/sec   Loss 8.2088   LearningRate 0.0745   Epoch: 2   Global Step: 33980   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:40:55,051-Speed 3418.93 samples/sec   Loss 8.4235   LearningRate 0.0745   Epoch: 2   Global Step: 33990   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:40:58,042-Speed 3425.45 samples/sec   Loss 8.3140   LearningRate 0.0745   Epoch: 2   Global Step: 34000   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:41:01,074-Speed 3377.73 samples/sec   Loss 8.2558   LearningRate 0.0745   Epoch: 2   Global Step: 34010   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:41:04,103-Speed 3382.21 samples/sec   Loss 8.2574   LearningRate 0.0745   Epoch: 2   Global Step: 34020   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:41:07,159-Speed 3351.69 samples/sec   Loss 8.3538   LearningRate 0.0745   Epoch: 2   Global Step: 34030   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:41:10,145-Speed 3430.52 samples/sec   Loss 8.3682   LearningRate 0.0745   Epoch: 2   Global Step: 34040   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:41:13,155-Speed 3402.92 samples/sec   Loss 8.3444   LearningRate 0.0745   Epoch: 2   Global Step: 34050   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:41:16,173-Speed 3394.52 samples/sec   Loss 8.3541   LearningRate 0.0745   Epoch: 2   Global Step: 34060   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:41:19,177-Speed 3409.75 samples/sec   Loss 8.2582   LearningRate 0.0745   Epoch: 2   Global Step: 34070   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:41:22,183-Speed 3407.80 samples/sec   Loss 8.2262   LearningRate 0.0744   Epoch: 2   Global Step: 34080   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:41:25,220-Speed 3372.21 samples/sec   Loss 8.4114   LearningRate 0.0744   Epoch: 2   Global Step: 34090   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:41:28,251-Speed 3379.97 samples/sec   Loss 8.1528   LearningRate 0.0744   Epoch: 2   Global Step: 34100   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:41:31,314-Speed 3343.75 samples/sec   Loss 8.3734   LearningRate 0.0744   Epoch: 2   Global Step: 34110   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:41:34,393-Speed 3327.59 samples/sec   Loss 8.2058   LearningRate 0.0744   Epoch: 2   Global Step: 34120   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:41:37,437-Speed 3364.37 samples/sec   Loss 8.5269   LearningRate 0.0744   Epoch: 2   Global Step: 34130   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:41:40,462-Speed 3386.14 samples/sec   Loss 8.2470   LearningRate 0.0744   Epoch: 2   Global Step: 34140   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:41:43,588-Speed 3276.58 samples/sec   Loss 8.3832   LearningRate 0.0744   Epoch: 2   Global Step: 34150   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:41:46,631-Speed 3366.75 samples/sec   Loss 8.2668   LearningRate 0.0744   Epoch: 2   Global Step: 34160   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:41:49,666-Speed 3374.81 samples/sec   Loss 8.2990   LearningRate 0.0744   Epoch: 2   Global Step: 34170   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:41:52,721-Speed 3352.99 samples/sec   Loss 8.2399   LearningRate 0.0744   Epoch: 2   Global Step: 34180   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:41:55,757-Speed 3373.65 samples/sec   Loss 8.2469   LearningRate 0.0744   Epoch: 2   Global Step: 34190   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 04:41:58,775-Speed 3394.25 samples/sec   Loss 8.2772   LearningRate 0.0744   Epoch: 2   Global Step: 34200   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:42:01,818-Speed 3366.78 samples/sec   Loss 8.2007   LearningRate 0.0744   Epoch: 2   Global Step: 34210   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:42:04,890-Speed 3334.35 samples/sec   Loss 8.2468   LearningRate 0.0743   Epoch: 2   Global Step: 34220   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:42:07,935-Speed 3363.96 samples/sec   Loss 8.2625   LearningRate 0.0743   Epoch: 2   Global Step: 34230   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:42:10,972-Speed 3372.68 samples/sec   Loss 8.2383   LearningRate 0.0743   Epoch: 2   Global Step: 34240   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:42:14,029-Speed 3351.20 samples/sec   Loss 8.1914   LearningRate 0.0743   Epoch: 2   Global Step: 34250   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:42:17,059-Speed 3381.26 samples/sec   Loss 8.1325   LearningRate 0.0743   Epoch: 2   Global Step: 34260   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:42:20,063-Speed 3409.65 samples/sec   Loss 8.2265   LearningRate 0.0743   Epoch: 2   Global Step: 34270   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:42:23,111-Speed 3360.31 samples/sec   Loss 8.4165   LearningRate 0.0743   Epoch: 2   Global Step: 34280   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:42:26,146-Speed 3375.33 samples/sec   Loss 8.2860   LearningRate 0.0743   Epoch: 2   Global Step: 34290   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:42:29,203-Speed 3351.65 samples/sec   Loss 8.3537   LearningRate 0.0743   Epoch: 2   Global Step: 34300   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:42:32,239-Speed 3373.18 samples/sec   Loss 8.3032   LearningRate 0.0743   Epoch: 2   Global Step: 34310   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:42:35,295-Speed 3351.89 samples/sec   Loss 8.2970   LearningRate 0.0743   Epoch: 2   Global Step: 34320   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:42:38,328-Speed 3377.38 samples/sec   Loss 8.3953   LearningRate 0.0743   Epoch: 2   Global Step: 34330   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:42:41,366-Speed 3372.47 samples/sec   Loss 8.1877   LearningRate 0.0743   Epoch: 2   Global Step: 34340   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:42:44,375-Speed 3403.27 samples/sec   Loss 8.3181   LearningRate 0.0743   Epoch: 2   Global Step: 34350   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:42:47,423-Speed 3361.39 samples/sec   Loss 8.3582   LearningRate 0.0743   Epoch: 2   Global Step: 34360   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:42:50,493-Speed 3335.46 samples/sec   Loss 8.2583   LearningRate 0.0742   Epoch: 2   Global Step: 34370   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:42:53,553-Speed 3348.09 samples/sec   Loss 8.3155   LearningRate 0.0742   Epoch: 2   Global Step: 34380   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:42:56,599-Speed 3362.25 samples/sec   Loss 8.2399   LearningRate 0.0742   Epoch: 2   Global Step: 34390   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:42:59,623-Speed 3387.55 samples/sec   Loss 8.1093   LearningRate 0.0742   Epoch: 2   Global Step: 34400   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:43:02,667-Speed 3365.52 samples/sec   Loss 8.2429   LearningRate 0.0742   Epoch: 2   Global Step: 34410   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:43:05,718-Speed 3356.96 samples/sec   Loss 8.2910   LearningRate 0.0742   Epoch: 2   Global Step: 34420   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:43:08,779-Speed 3346.71 samples/sec   Loss 8.3560   LearningRate 0.0742   Epoch: 2   Global Step: 34430   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:43:11,853-Speed 3332.11 samples/sec   Loss 8.3457   LearningRate 0.0742   Epoch: 2   Global Step: 34440   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:43:14,918-Speed 3341.40 samples/sec   Loss 8.2556   LearningRate 0.0742   Epoch: 2   Global Step: 34450   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:43:17,942-Speed 3388.10 samples/sec   Loss 8.2818   LearningRate 0.0742   Epoch: 2   Global Step: 34460   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:43:20,986-Speed 3364.00 samples/sec   Loss 8.4600   LearningRate 0.0742   Epoch: 2   Global Step: 34470   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:43:24,005-Speed 3393.30 samples/sec   Loss 8.2519   LearningRate 0.0742   Epoch: 2   Global Step: 34480   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:43:27,080-Speed 3330.81 samples/sec   Loss 8.2033   LearningRate 0.0742   Epoch: 2   Global Step: 34490   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:43:30,107-Speed 3384.67 samples/sec   Loss 8.1610   LearningRate 0.0742   Epoch: 2   Global Step: 34500   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:43:33,126-Speed 3393.16 samples/sec   Loss 8.2090   LearningRate 0.0741   Epoch: 2   Global Step: 34510   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:43:36,184-Speed 3349.49 samples/sec   Loss 8.2292   LearningRate 0.0741   Epoch: 2   Global Step: 34520   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:43:39,237-Speed 3355.07 samples/sec   Loss 8.3362   LearningRate 0.0741   Epoch: 2   Global Step: 34530   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:43:42,302-Speed 3341.39 samples/sec   Loss 8.3138   LearningRate 0.0741   Epoch: 2   Global Step: 34540   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:43:45,317-Speed 3398.51 samples/sec   Loss 8.1224   LearningRate 0.0741   Epoch: 2   Global Step: 34550   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:43:48,343-Speed 3384.73 samples/sec   Loss 8.3033   LearningRate 0.0741   Epoch: 2   Global Step: 34560   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:43:51,428-Speed 3320.08 samples/sec   Loss 8.2240   LearningRate 0.0741   Epoch: 2   Global Step: 34570   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:43:54,502-Speed 3331.74 samples/sec   Loss 8.2858   LearningRate 0.0741   Epoch: 2   Global Step: 34580   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:43:57,512-Speed 3403.06 samples/sec   Loss 8.4127   LearningRate 0.0741   Epoch: 2   Global Step: 34590   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:44:00,533-Speed 3390.58 samples/sec   Loss 8.2108   LearningRate 0.0741   Epoch: 2   Global Step: 34600   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:44:03,569-Speed 3374.17 samples/sec   Loss 8.2604   LearningRate 0.0741   Epoch: 2   Global Step: 34610   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:44:06,593-Speed 3387.17 samples/sec   Loss 8.2213   LearningRate 0.0741   Epoch: 2   Global Step: 34620   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:44:09,615-Speed 3389.44 samples/sec   Loss 8.2434   LearningRate 0.0741   Epoch: 2   Global Step: 34630   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:44:12,634-Speed 3392.72 samples/sec   Loss 8.2263   LearningRate 0.0741   Epoch: 2   Global Step: 34640   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:44:15,653-Speed 3393.85 samples/sec   Loss 8.2862   LearningRate 0.0740   Epoch: 2   Global Step: 34650   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:44:18,710-Speed 3350.72 samples/sec   Loss 8.1698   LearningRate 0.0740   Epoch: 2   Global Step: 34660   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:44:21,751-Speed 3368.39 samples/sec   Loss 8.2497   LearningRate 0.0740   Epoch: 2   Global Step: 34670   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:44:24,818-Speed 3339.17 samples/sec   Loss 8.4007   LearningRate 0.0740   Epoch: 2   Global Step: 34680   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:44:27,856-Speed 3372.00 samples/sec   Loss 8.2585   LearningRate 0.0740   Epoch: 2   Global Step: 34690   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:44:30,877-Speed 3391.19 samples/sec   Loss 8.3248   LearningRate 0.0740   Epoch: 2   Global Step: 34700   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:44:33,901-Speed 3387.70 samples/sec   Loss 8.2938   LearningRate 0.0740   Epoch: 2   Global Step: 34710   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:44:36,971-Speed 3335.73 samples/sec   Loss 8.2615   LearningRate 0.0740   Epoch: 2   Global Step: 34720   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:44:40,065-Speed 3310.73 samples/sec   Loss 8.2660   LearningRate 0.0740   Epoch: 2   Global Step: 34730   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:44:43,096-Speed 3378.91 samples/sec   Loss 8.3541   LearningRate 0.0740   Epoch: 2   Global Step: 34740   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:44:46,130-Speed 3376.46 samples/sec   Loss 8.2802   LearningRate 0.0740   Epoch: 2   Global Step: 34750   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:44:49,232-Speed 3302.88 samples/sec   Loss 8.3199   LearningRate 0.0740   Epoch: 2   Global Step: 34760   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:44:52,287-Speed 3351.90 samples/sec   Loss 8.2078   LearningRate 0.0740   Epoch: 2   Global Step: 34770   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:44:55,302-Speed 3397.25 samples/sec   Loss 8.4577   LearningRate 0.0740   Epoch: 2   Global Step: 34780   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:44:58,336-Speed 3377.29 samples/sec   Loss 8.2960   LearningRate 0.0740   Epoch: 2   Global Step: 34790   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:45:01,392-Speed 3351.52 samples/sec   Loss 8.3752   LearningRate 0.0739   Epoch: 2   Global Step: 34800   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:45:04,438-Speed 3363.27 samples/sec   Loss 8.2282   LearningRate 0.0739   Epoch: 2   Global Step: 34810   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:45:07,501-Speed 3343.97 samples/sec   Loss 8.3956   LearningRate 0.0739   Epoch: 2   Global Step: 34820   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:45:10,528-Speed 3384.17 samples/sec   Loss 8.3949   LearningRate 0.0739   Epoch: 2   Global Step: 34830   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:45:13,594-Speed 3340.54 samples/sec   Loss 8.2631   LearningRate 0.0739   Epoch: 2   Global Step: 34840   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:45:16,644-Speed 3359.31 samples/sec   Loss 8.0910   LearningRate 0.0739   Epoch: 2   Global Step: 34850   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:45:19,657-Speed 3398.82 samples/sec   Loss 8.1904   LearningRate 0.0739   Epoch: 2   Global Step: 34860   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:45:22,742-Speed 3320.07 samples/sec   Loss 8.2680   LearningRate 0.0739   Epoch: 2   Global Step: 34870   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:45:25,792-Speed 3358.82 samples/sec   Loss 8.3699   LearningRate 0.0739   Epoch: 2   Global Step: 34880   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:45:28,916-Speed 3278.87 samples/sec   Loss 8.3032   LearningRate 0.0739   Epoch: 2   Global Step: 34890   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:45:31,985-Speed 3337.81 samples/sec   Loss 8.3319   LearningRate 0.0739   Epoch: 2   Global Step: 34900   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:45:35,036-Speed 3357.20 samples/sec   Loss 8.2568   LearningRate 0.0739   Epoch: 2   Global Step: 34910   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:45:38,131-Speed 3310.00 samples/sec   Loss 8.1565   LearningRate 0.0739   Epoch: 2   Global Step: 34920   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:45:41,225-Speed 3310.75 samples/sec   Loss 8.3137   LearningRate 0.0739   Epoch: 2   Global Step: 34930   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:45:44,240-Speed 3397.46 samples/sec   Loss 8.1882   LearningRate 0.0738   Epoch: 2   Global Step: 34940   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:45:47,286-Speed 3363.31 samples/sec   Loss 8.3353   LearningRate 0.0738   Epoch: 2   Global Step: 34950   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:45:50,349-Speed 3344.29 samples/sec   Loss 8.2605   LearningRate 0.0738   Epoch: 2   Global Step: 34960   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:45:53,363-Speed 3397.35 samples/sec   Loss 8.3413   LearningRate 0.0738   Epoch: 2   Global Step: 34970   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:45:56,359-Speed 3420.34 samples/sec   Loss 8.2242   LearningRate 0.0738   Epoch: 2   Global Step: 34980   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:45:59,416-Speed 3350.37 samples/sec   Loss 8.2201   LearningRate 0.0738   Epoch: 2   Global Step: 34990   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:46:02,494-Speed 3328.28 samples/sec   Loss 8.3550   LearningRate 0.0738   Epoch: 2   Global Step: 35000   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:46:05,546-Speed 3355.92 samples/sec   Loss 8.2981   LearningRate 0.0738   Epoch: 2   Global Step: 35010   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:46:08,590-Speed 3364.97 samples/sec   Loss 8.2304   LearningRate 0.0738   Epoch: 2   Global Step: 35020   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:46:11,652-Speed 3345.47 samples/sec   Loss 8.1427   LearningRate 0.0738   Epoch: 2   Global Step: 35030   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:46:14,711-Speed 3348.42 samples/sec   Loss 8.2083   LearningRate 0.0738   Epoch: 2   Global Step: 35040   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:46:17,733-Speed 3389.87 samples/sec   Loss 8.1648   LearningRate 0.0738   Epoch: 2   Global Step: 35050   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:46:20,789-Speed 3351.17 samples/sec   Loss 8.1774   LearningRate 0.0738   Epoch: 2   Global Step: 35060   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:46:23,841-Speed 3356.43 samples/sec   Loss 8.0920   LearningRate 0.0738   Epoch: 2   Global Step: 35070   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:46:26,866-Speed 3386.39 samples/sec   Loss 8.2477   LearningRate 0.0738   Epoch: 2   Global Step: 35080   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:46:29,946-Speed 3324.92 samples/sec   Loss 8.1168   LearningRate 0.0737   Epoch: 2   Global Step: 35090   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:46:32,974-Speed 3384.02 samples/sec   Loss 8.3325   LearningRate 0.0737   Epoch: 2   Global Step: 35100   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:46:36,044-Speed 3335.80 samples/sec   Loss 8.2795   LearningRate 0.0737   Epoch: 2   Global Step: 35110   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:46:39,640-Speed 2848.24 samples/sec   Loss 8.1375   LearningRate 0.0737   Epoch: 2   Global Step: 35120   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:46:42,773-Speed 3269.69 samples/sec   Loss 8.2689   LearningRate 0.0737   Epoch: 2   Global Step: 35130   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:46:45,797-Speed 3387.24 samples/sec   Loss 8.1736   LearningRate 0.0737   Epoch: 2   Global Step: 35140   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:46:48,875-Speed 3328.59 samples/sec   Loss 8.2944   LearningRate 0.0737   Epoch: 2   Global Step: 35150   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:46:51,933-Speed 3349.11 samples/sec   Loss 8.1424   LearningRate 0.0737   Epoch: 2   Global Step: 35160   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:46:54,985-Speed 3356.40 samples/sec   Loss 8.2853   LearningRate 0.0737   Epoch: 2   Global Step: 35170   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:46:57,994-Speed 3403.73 samples/sec   Loss 8.2548   LearningRate 0.0737   Epoch: 2   Global Step: 35180   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:47:01,123-Speed 3274.32 samples/sec   Loss 8.3565   LearningRate 0.0737   Epoch: 2   Global Step: 35190   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:47:04,245-Speed 3280.33 samples/sec   Loss 8.1812   LearningRate 0.0737   Epoch: 2   Global Step: 35200   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:47:07,284-Speed 3371.49 samples/sec   Loss 8.1825   LearningRate 0.0737   Epoch: 2   Global Step: 35210   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:47:10,290-Speed 3406.83 samples/sec   Loss 8.1467   LearningRate 0.0737   Epoch: 2   Global Step: 35220   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:47:13,389-Speed 3305.77 samples/sec   Loss 8.2839   LearningRate 0.0736   Epoch: 2   Global Step: 35230   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:47:16,464-Speed 3330.57 samples/sec   Loss 8.2068   LearningRate 0.0736   Epoch: 2   Global Step: 35240   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:47:19,491-Speed 3384.11 samples/sec   Loss 8.2495   LearningRate 0.0736   Epoch: 2   Global Step: 35250   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:47:22,520-Speed 3381.67 samples/sec   Loss 8.2069   LearningRate 0.0736   Epoch: 2   Global Step: 35260   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:47:25,563-Speed 3366.93 samples/sec   Loss 8.2513   LearningRate 0.0736   Epoch: 2   Global Step: 35270   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:47:28,680-Speed 3286.17 samples/sec   Loss 8.1840   LearningRate 0.0736   Epoch: 2   Global Step: 35280   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:47:31,739-Speed 3347.83 samples/sec   Loss 8.2535   LearningRate 0.0736   Epoch: 2   Global Step: 35290   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:47:34,778-Speed 3371.58 samples/sec   Loss 8.2275   LearningRate 0.0736   Epoch: 2   Global Step: 35300   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:47:37,889-Speed 3292.11 samples/sec   Loss 8.1874   LearningRate 0.0736   Epoch: 2   Global Step: 35310   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:47:40,930-Speed 3368.27 samples/sec   Loss 8.2072   LearningRate 0.0736   Epoch: 2   Global Step: 35320   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:47:43,935-Speed 3409.74 samples/sec   Loss 8.1928   LearningRate 0.0736   Epoch: 2   Global Step: 35330   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:47:46,992-Speed 3350.88 samples/sec   Loss 8.1788   LearningRate 0.0736   Epoch: 2   Global Step: 35340   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:47:50,087-Speed 3309.47 samples/sec   Loss 8.1735   LearningRate 0.0736   Epoch: 2   Global Step: 35350   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:47:53,106-Speed 3392.85 samples/sec   Loss 8.3015   LearningRate 0.0736   Epoch: 2   Global Step: 35360   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:47:56,133-Speed 3383.39 samples/sec   Loss 8.1782   LearningRate 0.0736   Epoch: 2   Global Step: 35370   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:47:59,255-Speed 3281.52 samples/sec   Loss 8.2776   LearningRate 0.0735   Epoch: 2   Global Step: 35380   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:48:02,368-Speed 3290.66 samples/sec   Loss 8.2215   LearningRate 0.0735   Epoch: 2   Global Step: 35390   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:48:05,418-Speed 3358.52 samples/sec   Loss 8.2923   LearningRate 0.0735   Epoch: 2   Global Step: 35400   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:48:08,426-Speed 3405.21 samples/sec   Loss 8.1955   LearningRate 0.0735   Epoch: 2   Global Step: 35410   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:48:11,451-Speed 3386.08 samples/sec   Loss 8.1580   LearningRate 0.0735   Epoch: 2   Global Step: 35420   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:48:16,400-Speed 2069.87 samples/sec   Loss 8.2162   LearningRate 0.0735   Epoch: 2   Global Step: 35430   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:48:19,434-Speed 3375.37 samples/sec   Loss 8.1766   LearningRate 0.0735   Epoch: 2   Global Step: 35440   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:48:22,498-Speed 3343.09 samples/sec   Loss 8.1660   LearningRate 0.0735   Epoch: 2   Global Step: 35450   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:48:25,609-Speed 3293.02 samples/sec   Loss 8.3043   LearningRate 0.0735   Epoch: 2   Global Step: 35460   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:48:28,631-Speed 3389.23 samples/sec   Loss 8.2330   LearningRate 0.0735   Epoch: 2   Global Step: 35470   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:48:31,713-Speed 3323.60 samples/sec   Loss 8.1545   LearningRate 0.0735   Epoch: 2   Global Step: 35480   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:48:34,757-Speed 3365.99 samples/sec   Loss 8.2446   LearningRate 0.0735   Epoch: 2   Global Step: 35490   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:48:37,776-Speed 3392.97 samples/sec   Loss 8.1876   LearningRate 0.0735   Epoch: 2   Global Step: 35500   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:48:40,837-Speed 3345.71 samples/sec   Loss 8.2810   LearningRate 0.0735   Epoch: 2   Global Step: 35510   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:48:43,892-Speed 3352.91 samples/sec   Loss 8.2393   LearningRate 0.0734   Epoch: 2   Global Step: 35520   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:48:46,957-Speed 3341.97 samples/sec   Loss 8.4077   LearningRate 0.0734   Epoch: 2   Global Step: 35530   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:48:50,018-Speed 3346.54 samples/sec   Loss 8.2572   LearningRate 0.0734   Epoch: 2   Global Step: 35540   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:48:53,150-Speed 3270.22 samples/sec   Loss 8.1660   LearningRate 0.0734   Epoch: 2   Global Step: 35550   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:48:56,193-Speed 3366.49 samples/sec   Loss 8.1280   LearningRate 0.0734   Epoch: 2   Global Step: 35560   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:48:59,201-Speed 3405.55 samples/sec   Loss 8.1137   LearningRate 0.0734   Epoch: 2   Global Step: 35570   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:49:02,239-Speed 3371.32 samples/sec   Loss 8.0892   LearningRate 0.0734   Epoch: 2   Global Step: 35580   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:49:05,295-Speed 3351.64 samples/sec   Loss 8.2026   LearningRate 0.0734   Epoch: 2   Global Step: 35590   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:49:08,306-Speed 3401.86 samples/sec   Loss 8.1216   LearningRate 0.0734   Epoch: 2   Global Step: 35600   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:49:11,326-Speed 3391.38 samples/sec   Loss 8.2234   LearningRate 0.0734   Epoch: 2   Global Step: 35610   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:49:14,423-Speed 3308.48 samples/sec   Loss 8.2962   LearningRate 0.0734   Epoch: 2   Global Step: 35620   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:49:17,514-Speed 3313.71 samples/sec   Loss 8.2567   LearningRate 0.0734   Epoch: 2   Global Step: 35630   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:49:20,550-Speed 3373.66 samples/sec   Loss 8.1400   LearningRate 0.0734   Epoch: 2   Global Step: 35640   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:49:23,572-Speed 3389.07 samples/sec   Loss 8.0452   LearningRate 0.0734   Epoch: 2   Global Step: 35650   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:49:26,660-Speed 3317.64 samples/sec   Loss 8.2012   LearningRate 0.0734   Epoch: 2   Global Step: 35660   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:49:29,691-Speed 3379.64 samples/sec   Loss 8.0920   LearningRate 0.0733   Epoch: 2   Global Step: 35670   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:49:32,705-Speed 3398.73 samples/sec   Loss 8.0894   LearningRate 0.0733   Epoch: 2   Global Step: 35680   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:49:35,727-Speed 3389.19 samples/sec   Loss 8.2822   LearningRate 0.0733   Epoch: 2   Global Step: 35690   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:49:38,748-Speed 3390.57 samples/sec   Loss 8.1913   LearningRate 0.0733   Epoch: 2   Global Step: 35700   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:49:41,801-Speed 3355.47 samples/sec   Loss 8.2147   LearningRate 0.0733   Epoch: 2   Global Step: 35710   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:49:44,849-Speed 3360.64 samples/sec   Loss 8.2333   LearningRate 0.0733   Epoch: 2   Global Step: 35720   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:49:47,967-Speed 3285.88 samples/sec   Loss 8.2121   LearningRate 0.0733   Epoch: 2   Global Step: 35730   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:49:51,048-Speed 3324.65 samples/sec   Loss 8.1845   LearningRate 0.0733   Epoch: 2   Global Step: 35740   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:49:54,113-Speed 3341.22 samples/sec   Loss 8.1100   LearningRate 0.0733   Epoch: 2   Global Step: 35750   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:49:57,136-Speed 3388.93 samples/sec   Loss 8.3004   LearningRate 0.0733   Epoch: 2   Global Step: 35760   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:50:00,225-Speed 3315.97 samples/sec   Loss 8.3301   LearningRate 0.0733   Epoch: 2   Global Step: 35770   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:50:03,310-Speed 3320.49 samples/sec   Loss 8.2476   LearningRate 0.0733   Epoch: 2   Global Step: 35780   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:50:06,364-Speed 3353.66 samples/sec   Loss 8.2583   LearningRate 0.0733   Epoch: 2   Global Step: 35790   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:50:09,417-Speed 3356.23 samples/sec   Loss 8.0862   LearningRate 0.0733   Epoch: 2   Global Step: 35800   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:50:12,488-Speed 3334.97 samples/sec   Loss 8.1399   LearningRate 0.0732   Epoch: 2   Global Step: 35810   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:50:15,513-Speed 3386.29 samples/sec   Loss 8.2299   LearningRate 0.0732   Epoch: 2   Global Step: 35820   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:50:18,535-Speed 3389.03 samples/sec   Loss 8.2712   LearningRate 0.0732   Epoch: 2   Global Step: 35830   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:50:21,551-Speed 3397.19 samples/sec   Loss 8.3024   LearningRate 0.0732   Epoch: 2   Global Step: 35840   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:50:24,621-Speed 3336.79 samples/sec   Loss 8.2037   LearningRate 0.0732   Epoch: 2   Global Step: 35850   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:50:27,688-Speed 3340.46 samples/sec   Loss 8.1652   LearningRate 0.0732   Epoch: 2   Global Step: 35860   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:50:30,732-Speed 3365.01 samples/sec   Loss 8.2979   LearningRate 0.0732   Epoch: 2   Global Step: 35870   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:50:33,767-Speed 3374.32 samples/sec   Loss 8.0774   LearningRate 0.0732   Epoch: 2   Global Step: 35880   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:50:36,820-Speed 3356.50 samples/sec   Loss 8.2712   LearningRate 0.0732   Epoch: 2   Global Step: 35890   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:50:39,879-Speed 3348.69 samples/sec   Loss 8.2077   LearningRate 0.0732   Epoch: 2   Global Step: 35900   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:50:42,952-Speed 3333.28 samples/sec   Loss 8.2570   LearningRate 0.0732   Epoch: 2   Global Step: 35910   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:50:45,966-Speed 3398.18 samples/sec   Loss 8.1519   LearningRate 0.0732   Epoch: 2   Global Step: 35920   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:50:48,981-Speed 3397.46 samples/sec   Loss 8.1826   LearningRate 0.0732   Epoch: 2   Global Step: 35930   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:50:52,100-Speed 3283.45 samples/sec   Loss 8.0990   LearningRate 0.0732   Epoch: 2   Global Step: 35940   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:50:55,183-Speed 3323.69 samples/sec   Loss 8.1283   LearningRate 0.0732   Epoch: 2   Global Step: 35950   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:50:58,200-Speed 3394.95 samples/sec   Loss 8.2236   LearningRate 0.0731   Epoch: 2   Global Step: 35960   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:51:01,233-Speed 3377.03 samples/sec   Loss 8.2368   LearningRate 0.0731   Epoch: 2   Global Step: 35970   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:51:04,300-Speed 3340.38 samples/sec   Loss 8.1904   LearningRate 0.0731   Epoch: 2   Global Step: 35980   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:51:07,338-Speed 3371.51 samples/sec   Loss 8.2688   LearningRate 0.0731   Epoch: 2   Global Step: 35990   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:51:10,340-Speed 3411.78 samples/sec   Loss 8.1452   LearningRate 0.0731   Epoch: 2   Global Step: 36000   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:51:13,409-Speed 3338.03 samples/sec   Loss 8.1914   LearningRate 0.0731   Epoch: 2   Global Step: 36010   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:51:16,417-Speed 3405.14 samples/sec   Loss 8.3227   LearningRate 0.0731   Epoch: 2   Global Step: 36020   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:51:19,438-Speed 3391.63 samples/sec   Loss 8.1554   LearningRate 0.0731   Epoch: 2   Global Step: 36030   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:51:22,471-Speed 3376.41 samples/sec   Loss 8.2160   LearningRate 0.0731   Epoch: 2   Global Step: 36040   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:51:25,551-Speed 3326.77 samples/sec   Loss 8.2507   LearningRate 0.0731   Epoch: 2   Global Step: 36050   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:51:28,565-Speed 3398.48 samples/sec   Loss 8.1929   LearningRate 0.0731   Epoch: 2   Global Step: 36060   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:51:31,662-Speed 3306.90 samples/sec   Loss 8.2300   LearningRate 0.0731   Epoch: 2   Global Step: 36070   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:51:34,686-Speed 3387.99 samples/sec   Loss 8.1084   LearningRate 0.0731   Epoch: 2   Global Step: 36080   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:51:37,753-Speed 3339.01 samples/sec   Loss 8.0509   LearningRate 0.0731   Epoch: 2   Global Step: 36090   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:51:40,810-Speed 3351.23 samples/sec   Loss 8.1253   LearningRate 0.0730   Epoch: 2   Global Step: 36100   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:51:43,880-Speed 3336.20 samples/sec   Loss 8.1094   LearningRate 0.0730   Epoch: 2   Global Step: 36110   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:51:46,918-Speed 3372.53 samples/sec   Loss 8.1796   LearningRate 0.0730   Epoch: 2   Global Step: 36120   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:51:49,942-Speed 3386.68 samples/sec   Loss 8.1634   LearningRate 0.0730   Epoch: 2   Global Step: 36130   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:51:53,047-Speed 3299.65 samples/sec   Loss 8.1585   LearningRate 0.0730   Epoch: 2   Global Step: 36140   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:51:56,119-Speed 3333.96 samples/sec   Loss 8.1443   LearningRate 0.0730   Epoch: 2   Global Step: 36150   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:51:59,159-Speed 3368.91 samples/sec   Loss 8.1658   LearningRate 0.0730   Epoch: 2   Global Step: 36160   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:52:02,253-Speed 3311.17 samples/sec   Loss 8.1140   LearningRate 0.0730   Epoch: 2   Global Step: 36170   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:52:05,302-Speed 3359.16 samples/sec   Loss 8.0717   LearningRate 0.0730   Epoch: 2   Global Step: 36180   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:52:08,320-Speed 3394.69 samples/sec   Loss 8.1851   LearningRate 0.0730   Epoch: 2   Global Step: 36190   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:52:11,378-Speed 3349.55 samples/sec   Loss 8.0545   LearningRate 0.0730   Epoch: 2   Global Step: 36200   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:52:14,440-Speed 3345.62 samples/sec   Loss 8.0724   LearningRate 0.0730   Epoch: 2   Global Step: 36210   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:52:17,524-Speed 3321.89 samples/sec   Loss 8.0163   LearningRate 0.0730   Epoch: 2   Global Step: 36220   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:52:20,568-Speed 3364.73 samples/sec   Loss 8.1979   LearningRate 0.0730   Epoch: 2   Global Step: 36230   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 04:52:23,560-Speed 3423.68 samples/sec   Loss 8.1399   LearningRate 0.0730   Epoch: 2   Global Step: 36240   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:52:26,690-Speed 3273.23 samples/sec   Loss 8.1455   LearningRate 0.0729   Epoch: 2   Global Step: 36250   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:52:29,707-Speed 3394.74 samples/sec   Loss 8.0190   LearningRate 0.0729   Epoch: 2   Global Step: 36260   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:52:32,719-Speed 3400.74 samples/sec   Loss 8.2482   LearningRate 0.0729   Epoch: 2   Global Step: 36270   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:52:35,747-Speed 3383.58 samples/sec   Loss 8.0631   LearningRate 0.0729   Epoch: 2   Global Step: 36280   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:52:38,777-Speed 3381.21 samples/sec   Loss 8.0941   LearningRate 0.0729   Epoch: 2   Global Step: 36290   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:52:41,850-Speed 3332.55 samples/sec   Loss 8.1935   LearningRate 0.0729   Epoch: 2   Global Step: 36300   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:52:44,894-Speed 3364.54 samples/sec   Loss 8.0874   LearningRate 0.0729   Epoch: 2   Global Step: 36310   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:52:47,942-Speed 3361.73 samples/sec   Loss 8.1215   LearningRate 0.0729   Epoch: 2   Global Step: 36320   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:52:50,961-Speed 3392.72 samples/sec   Loss 8.1867   LearningRate 0.0729   Epoch: 2   Global Step: 36330   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:52:53,975-Speed 3399.33 samples/sec   Loss 8.2645   LearningRate 0.0729   Epoch: 2   Global Step: 36340   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:52:57,003-Speed 3382.99 samples/sec   Loss 8.0567   LearningRate 0.0729   Epoch: 2   Global Step: 36350   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:53:00,078-Speed 3330.66 samples/sec   Loss 8.0956   LearningRate 0.0729   Epoch: 2   Global Step: 36360   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:53:03,093-Speed 3397.40 samples/sec   Loss 7.9307   LearningRate 0.0729   Epoch: 2   Global Step: 36370   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:53:06,098-Speed 3408.76 samples/sec   Loss 8.2209   LearningRate 0.0729   Epoch: 2   Global Step: 36380   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:53:09,121-Speed 3388.43 samples/sec   Loss 7.9954   LearningRate 0.0728   Epoch: 2   Global Step: 36390   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:53:12,131-Speed 3403.74 samples/sec   Loss 8.1259   LearningRate 0.0728   Epoch: 2   Global Step: 36400   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:53:15,213-Speed 3323.82 samples/sec   Loss 8.2049   LearningRate 0.0728   Epoch: 2   Global Step: 36410   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:53:18,229-Speed 3396.95 samples/sec   Loss 8.2033   LearningRate 0.0728   Epoch: 2   Global Step: 36420   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:53:21,280-Speed 3357.52 samples/sec   Loss 8.2099   LearningRate 0.0728   Epoch: 2   Global Step: 36430   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:53:24,334-Speed 3354.03 samples/sec   Loss 8.1099   LearningRate 0.0728   Epoch: 2   Global Step: 36440   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:53:27,378-Speed 3365.04 samples/sec   Loss 8.0891   LearningRate 0.0728   Epoch: 2   Global Step: 36450   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:53:30,405-Speed 3383.74 samples/sec   Loss 8.1366   LearningRate 0.0728   Epoch: 2   Global Step: 36460   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:53:33,420-Speed 3397.49 samples/sec   Loss 8.1498   LearningRate 0.0728   Epoch: 2   Global Step: 36470   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:53:36,449-Speed 3382.17 samples/sec   Loss 7.9756   LearningRate 0.0728   Epoch: 2   Global Step: 36480   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:53:39,584-Speed 3266.66 samples/sec   Loss 8.0520   LearningRate 0.0728   Epoch: 2   Global Step: 36490   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:53:42,673-Speed 3316.43 samples/sec   Loss 8.0392   LearningRate 0.0728   Epoch: 2   Global Step: 36500   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:53:45,704-Speed 3379.88 samples/sec   Loss 8.1757   LearningRate 0.0728   Epoch: 2   Global Step: 36510   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:53:48,773-Speed 3337.35 samples/sec   Loss 8.0913   LearningRate 0.0728   Epoch: 2   Global Step: 36520   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:53:51,852-Speed 3327.19 samples/sec   Loss 8.1665   LearningRate 0.0728   Epoch: 2   Global Step: 36530   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:53:54,876-Speed 3387.42 samples/sec   Loss 8.1906   LearningRate 0.0727   Epoch: 2   Global Step: 36540   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:53:57,924-Speed 3360.53 samples/sec   Loss 8.0870   LearningRate 0.0727   Epoch: 2   Global Step: 36550   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:54:00,956-Speed 3378.55 samples/sec   Loss 8.1727   LearningRate 0.0727   Epoch: 2   Global Step: 36560   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:54:04,041-Speed 3320.48 samples/sec   Loss 8.1201   LearningRate 0.0727   Epoch: 2   Global Step: 36570   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:54:07,079-Speed 3371.87 samples/sec   Loss 8.0214   LearningRate 0.0727   Epoch: 2   Global Step: 36580   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:54:10,127-Speed 3361.01 samples/sec   Loss 8.0938   LearningRate 0.0727   Epoch: 2   Global Step: 36590   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:54:13,125-Speed 3416.44 samples/sec   Loss 8.1319   LearningRate 0.0727   Epoch: 2   Global Step: 36600   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:54:16,254-Speed 3273.39 samples/sec   Loss 8.1086   LearningRate 0.0727   Epoch: 2   Global Step: 36610   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:54:19,290-Speed 3374.46 samples/sec   Loss 8.1478   LearningRate 0.0727   Epoch: 2   Global Step: 36620   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:54:22,293-Speed 3411.39 samples/sec   Loss 8.1995   LearningRate 0.0727   Epoch: 2   Global Step: 36630   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:54:25,312-Speed 3392.13 samples/sec   Loss 8.1454   LearningRate 0.0727   Epoch: 2   Global Step: 36640   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:54:28,340-Speed 3383.73 samples/sec   Loss 7.9863   LearningRate 0.0727   Epoch: 2   Global Step: 36650   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:54:31,401-Speed 3346.02 samples/sec   Loss 8.1837   LearningRate 0.0727   Epoch: 2   Global Step: 36660   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:54:34,418-Speed 3395.19 samples/sec   Loss 8.2089   LearningRate 0.0727   Epoch: 2   Global Step: 36670   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:54:37,453-Speed 3374.63 samples/sec   Loss 8.1603   LearningRate 0.0726   Epoch: 2   Global Step: 36680   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:54:40,456-Speed 3410.65 samples/sec   Loss 8.1061   LearningRate 0.0726   Epoch: 2   Global Step: 36690   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:54:43,524-Speed 3339.75 samples/sec   Loss 8.2177   LearningRate 0.0726   Epoch: 2   Global Step: 36700   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:54:46,527-Speed 3411.00 samples/sec   Loss 8.2093   LearningRate 0.0726   Epoch: 2   Global Step: 36710   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:54:49,658-Speed 3271.45 samples/sec   Loss 8.2006   LearningRate 0.0726   Epoch: 2   Global Step: 36720   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:54:52,683-Speed 3386.43 samples/sec   Loss 8.2017   LearningRate 0.0726   Epoch: 2   Global Step: 36730   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:54:55,729-Speed 3362.58 samples/sec   Loss 8.1259   LearningRate 0.0726   Epoch: 2   Global Step: 36740   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:54:58,747-Speed 3394.19 samples/sec   Loss 8.0456   LearningRate 0.0726   Epoch: 2   Global Step: 36750   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:55:01,767-Speed 3391.41 samples/sec   Loss 8.0217   LearningRate 0.0726   Epoch: 2   Global Step: 36760   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:55:04,846-Speed 3327.20 samples/sec   Loss 7.9517   LearningRate 0.0726   Epoch: 2   Global Step: 36770   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:55:07,886-Speed 3369.41 samples/sec   Loss 8.0379   LearningRate 0.0726   Epoch: 2   Global Step: 36780   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:55:10,919-Speed 3376.30 samples/sec   Loss 8.2551   LearningRate 0.0726   Epoch: 2   Global Step: 36790   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:55:13,990-Speed 3336.61 samples/sec   Loss 8.0579   LearningRate 0.0726   Epoch: 2   Global Step: 36800   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:55:17,017-Speed 3383.27 samples/sec   Loss 8.0713   LearningRate 0.0726   Epoch: 2   Global Step: 36810   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:55:20,098-Speed 3324.82 samples/sec   Loss 8.0705   LearningRate 0.0726   Epoch: 2   Global Step: 36820   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:55:23,131-Speed 3377.29 samples/sec   Loss 8.1302   LearningRate 0.0725   Epoch: 2   Global Step: 36830   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:55:26,154-Speed 3388.35 samples/sec   Loss 8.1249   LearningRate 0.0725   Epoch: 2   Global Step: 36840   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:55:29,218-Speed 3343.50 samples/sec   Loss 8.0824   LearningRate 0.0725   Epoch: 2   Global Step: 36850   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:55:32,301-Speed 3322.98 samples/sec   Loss 8.0607   LearningRate 0.0725   Epoch: 2   Global Step: 36860   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:55:35,381-Speed 3325.69 samples/sec   Loss 8.0761   LearningRate 0.0725   Epoch: 2   Global Step: 36870   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:55:38,423-Speed 3367.42 samples/sec   Loss 8.1426   LearningRate 0.0725   Epoch: 2   Global Step: 36880   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:55:41,448-Speed 3385.66 samples/sec   Loss 8.1852   LearningRate 0.0725   Epoch: 2   Global Step: 36890   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:55:44,486-Speed 3372.22 samples/sec   Loss 8.1795   LearningRate 0.0725   Epoch: 2   Global Step: 36900   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:55:47,518-Speed 3377.95 samples/sec   Loss 8.2937   LearningRate 0.0725   Epoch: 2   Global Step: 36910   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:55:50,524-Speed 3406.83 samples/sec   Loss 8.0160   LearningRate 0.0725   Epoch: 2   Global Step: 36920   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:55:53,544-Speed 3392.69 samples/sec   Loss 8.0851   LearningRate 0.0725   Epoch: 2   Global Step: 36930   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:55:56,536-Speed 3423.62 samples/sec   Loss 8.0984   LearningRate 0.0725   Epoch: 2   Global Step: 36940   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:55:59,580-Speed 3364.33 samples/sec   Loss 8.0518   LearningRate 0.0725   Epoch: 2   Global Step: 36950   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:56:02,603-Speed 3388.53 samples/sec   Loss 8.0906   LearningRate 0.0725   Epoch: 2   Global Step: 36960   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:56:05,619-Speed 3397.43 samples/sec   Loss 8.0705   LearningRate 0.0725   Epoch: 2   Global Step: 36970   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:56:08,606-Speed 3429.14 samples/sec   Loss 8.1035   LearningRate 0.0724   Epoch: 2   Global Step: 36980   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:56:11,660-Speed 3353.69 samples/sec   Loss 8.1113   LearningRate 0.0724   Epoch: 2   Global Step: 36990   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:56:14,701-Speed 3367.82 samples/sec   Loss 8.0413   LearningRate 0.0724   Epoch: 2   Global Step: 37000   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:56:17,766-Speed 3342.79 samples/sec   Loss 8.0781   LearningRate 0.0724   Epoch: 2   Global Step: 37010   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:56:20,784-Speed 3393.26 samples/sec   Loss 8.1792   LearningRate 0.0724   Epoch: 2   Global Step: 37020   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:56:23,843-Speed 3348.75 samples/sec   Loss 8.1658   LearningRate 0.0724   Epoch: 2   Global Step: 37030   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:56:26,881-Speed 3372.38 samples/sec   Loss 8.1618   LearningRate 0.0724   Epoch: 2   Global Step: 37040   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:56:29,922-Speed 3367.90 samples/sec   Loss 8.1088   LearningRate 0.0724   Epoch: 2   Global Step: 37050   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:56:32,967-Speed 3364.12 samples/sec   Loss 8.1203   LearningRate 0.0724   Epoch: 2   Global Step: 37060   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 04:56:36,064-Speed 3307.94 samples/sec   Loss 8.0484   LearningRate 0.0724   Epoch: 2   Global Step: 37070   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:56:39,129-Speed 3341.76 samples/sec   Loss 8.0611   LearningRate 0.0724   Epoch: 2   Global Step: 37080   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:56:42,146-Speed 3395.19 samples/sec   Loss 8.1263   LearningRate 0.0724   Epoch: 2   Global Step: 37090   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:56:45,154-Speed 3405.46 samples/sec   Loss 8.3563   LearningRate 0.0724   Epoch: 2   Global Step: 37100   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:56:48,170-Speed 3396.21 samples/sec   Loss 8.1166   LearningRate 0.0724   Epoch: 2   Global Step: 37110   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:56:51,212-Speed 3367.42 samples/sec   Loss 8.0985   LearningRate 0.0723   Epoch: 2   Global Step: 37120   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:56:54,241-Speed 3381.29 samples/sec   Loss 8.0528   LearningRate 0.0723   Epoch: 2   Global Step: 37130   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:56:57,288-Speed 3361.86 samples/sec   Loss 7.9912   LearningRate 0.0723   Epoch: 2   Global Step: 37140   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:57:00,288-Speed 3414.29 samples/sec   Loss 8.0117   LearningRate 0.0723   Epoch: 2   Global Step: 37150   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:57:03,333-Speed 3364.94 samples/sec   Loss 8.0628   LearningRate 0.0723   Epoch: 2   Global Step: 37160   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:57:06,412-Speed 3326.41 samples/sec   Loss 8.1157   LearningRate 0.0723   Epoch: 2   Global Step: 37170   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:57:09,410-Speed 3416.23 samples/sec   Loss 8.0448   LearningRate 0.0723   Epoch: 2   Global Step: 37180   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:57:12,446-Speed 3374.81 samples/sec   Loss 8.1454   LearningRate 0.0723   Epoch: 2   Global Step: 37190   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:57:15,541-Speed 3309.73 samples/sec   Loss 8.1489   LearningRate 0.0723   Epoch: 2   Global Step: 37200   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:57:18,551-Speed 3402.38 samples/sec   Loss 8.0277   LearningRate 0.0723   Epoch: 2   Global Step: 37210   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:57:21,574-Speed 3388.91 samples/sec   Loss 7.8973   LearningRate 0.0723   Epoch: 2   Global Step: 37220   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:57:24,604-Speed 3379.98 samples/sec   Loss 8.0044   LearningRate 0.0723   Epoch: 2   Global Step: 37230   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:57:27,636-Speed 3379.17 samples/sec   Loss 8.0340   LearningRate 0.0723   Epoch: 2   Global Step: 37240   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:57:30,657-Speed 3390.23 samples/sec   Loss 8.2785   LearningRate 0.0723   Epoch: 2   Global Step: 37250   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:57:33,884-Speed 3173.97 samples/sec   Loss 8.0983   LearningRate 0.0723   Epoch: 2   Global Step: 37260   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:58:05,237-Speed 326.62 samples/sec   Loss 6.9871   LearningRate 0.0722   Epoch: 3   Global Step: 37270   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:58:08,737-Speed 2926.75 samples/sec   Loss 6.5843   LearningRate 0.0722   Epoch: 3   Global Step: 37280   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:58:11,783-Speed 3362.14 samples/sec   Loss 6.4734   LearningRate 0.0722   Epoch: 3   Global Step: 37290   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:58:14,891-Speed 3296.55 samples/sec   Loss 6.3784   LearningRate 0.0722   Epoch: 3   Global Step: 37300   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:58:17,949-Speed 3350.09 samples/sec   Loss 6.4153   LearningRate 0.0722   Epoch: 3   Global Step: 37310   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:58:20,963-Speed 3398.12 samples/sec   Loss 6.5037   LearningRate 0.0722   Epoch: 3   Global Step: 37320   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:58:24,038-Speed 3330.64 samples/sec   Loss 6.4025   LearningRate 0.0722   Epoch: 3   Global Step: 37330   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:58:27,055-Speed 3395.70 samples/sec   Loss 6.4574   LearningRate 0.0722   Epoch: 3   Global Step: 37340   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:58:30,061-Speed 3407.11 samples/sec   Loss 6.4113   LearningRate 0.0722   Epoch: 3   Global Step: 37350   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:58:33,079-Speed 3394.87 samples/sec   Loss 6.3736   LearningRate 0.0722   Epoch: 3   Global Step: 37360   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:58:36,116-Speed 3372.99 samples/sec   Loss 6.4045   LearningRate 0.0722   Epoch: 3   Global Step: 37370   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:58:39,166-Speed 3357.97 samples/sec   Loss 6.3196   LearningRate 0.0722   Epoch: 3   Global Step: 37380   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:58:42,262-Speed 3309.05 samples/sec   Loss 6.4262   LearningRate 0.0722   Epoch: 3   Global Step: 37390   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:58:45,261-Speed 3415.55 samples/sec   Loss 6.5546   LearningRate 0.0722   Epoch: 3   Global Step: 37400   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:58:48,280-Speed 3392.60 samples/sec   Loss 6.4501   LearningRate 0.0721   Epoch: 3   Global Step: 37410   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:58:51,358-Speed 3328.15 samples/sec   Loss 6.4731   LearningRate 0.0721   Epoch: 3   Global Step: 37420   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:58:54,382-Speed 3387.21 samples/sec   Loss 6.4842   LearningRate 0.0721   Epoch: 3   Global Step: 37430   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:58:57,381-Speed 3415.77 samples/sec   Loss 6.4561   LearningRate 0.0721   Epoch: 3   Global Step: 37440   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:59:00,441-Speed 3347.77 samples/sec   Loss 6.4149   LearningRate 0.0721   Epoch: 3   Global Step: 37450   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:59:03,476-Speed 3374.81 samples/sec   Loss 6.5143   LearningRate 0.0721   Epoch: 3   Global Step: 37460   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:59:06,535-Speed 3347.98 samples/sec   Loss 6.3945   LearningRate 0.0721   Epoch: 3   Global Step: 37470   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:59:09,556-Speed 3391.13 samples/sec   Loss 6.4545   LearningRate 0.0721   Epoch: 3   Global Step: 37480   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:59:12,561-Speed 3408.16 samples/sec   Loss 6.4055   LearningRate 0.0721   Epoch: 3   Global Step: 37490   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:59:15,646-Speed 3320.68 samples/sec   Loss 6.5315   LearningRate 0.0721   Epoch: 3   Global Step: 37500   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:59:18,731-Speed 3319.98 samples/sec   Loss 6.5518   LearningRate 0.0721   Epoch: 3   Global Step: 37510   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:59:21,768-Speed 3373.46 samples/sec   Loss 6.5634   LearningRate 0.0721   Epoch: 3   Global Step: 37520   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-04-27 04:59:24,811-Speed 3365.53 samples/sec   Loss 6.5250   LearningRate 0.0721   Epoch: 3   Global Step: 37530   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:59:27,861-Speed 3358.94 samples/sec   Loss 6.4222   LearningRate 0.0721   Epoch: 3   Global Step: 37540   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:59:30,926-Speed 3341.62 samples/sec   Loss 6.4483   LearningRate 0.0721   Epoch: 3   Global Step: 37550   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:59:33,965-Speed 3371.02 samples/sec   Loss 6.5204   LearningRate 0.0720   Epoch: 3   Global Step: 37560   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:59:37,025-Speed 3347.34 samples/sec   Loss 6.4790   LearningRate 0.0720   Epoch: 3   Global Step: 37570   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:59:40,088-Speed 3343.99 samples/sec   Loss 6.5234   LearningRate 0.0720   Epoch: 3   Global Step: 37580   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:59:43,127-Speed 3370.37 samples/sec   Loss 6.5625   LearningRate 0.0720   Epoch: 3   Global Step: 37590   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:59:46,192-Speed 3342.83 samples/sec   Loss 6.5597   LearningRate 0.0720   Epoch: 3   Global Step: 37600   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 04:59:49,254-Speed 3344.92 samples/sec   Loss 6.5054   LearningRate 0.0720   Epoch: 3   Global Step: 37610   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:59:52,305-Speed 3357.86 samples/sec   Loss 6.6064   LearningRate 0.0720   Epoch: 3   Global Step: 37620   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:59:55,366-Speed 3346.08 samples/sec   Loss 6.5634   LearningRate 0.0720   Epoch: 3   Global Step: 37630   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 04:59:58,393-Speed 3384.28 samples/sec   Loss 6.5832   LearningRate 0.0720   Epoch: 3   Global Step: 37640   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:00:01,446-Speed 3355.23 samples/sec   Loss 6.5040   LearningRate 0.0720   Epoch: 3   Global Step: 37650   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:00:04,491-Speed 3363.23 samples/sec   Loss 6.5711   LearningRate 0.0720   Epoch: 3   Global Step: 37660   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:00:07,541-Speed 3359.07 samples/sec   Loss 6.4350   LearningRate 0.0720   Epoch: 3   Global Step: 37670   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:00:10,547-Speed 3407.52 samples/sec   Loss 6.5994   LearningRate 0.0720   Epoch: 3   Global Step: 37680   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:00:13,608-Speed 3346.21 samples/sec   Loss 6.5329   LearningRate 0.0720   Epoch: 3   Global Step: 37690   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:00:16,651-Speed 3365.87 samples/sec   Loss 6.5574   LearningRate 0.0720   Epoch: 3   Global Step: 37700   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:00:19,731-Speed 3326.21 samples/sec   Loss 6.6449   LearningRate 0.0719   Epoch: 3   Global Step: 37710   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:00:22,767-Speed 3374.04 samples/sec   Loss 6.6470   LearningRate 0.0719   Epoch: 3   Global Step: 37720   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:00:25,859-Speed 3312.21 samples/sec   Loss 6.6696   LearningRate 0.0719   Epoch: 3   Global Step: 37730   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:00:28,907-Speed 3360.48 samples/sec   Loss 6.5607   LearningRate 0.0719   Epoch: 3   Global Step: 37740   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:00:31,947-Speed 3370.15 samples/sec   Loss 6.6684   LearningRate 0.0719   Epoch: 3   Global Step: 37750   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:00:34,958-Speed 3402.02 samples/sec   Loss 6.6582   LearningRate 0.0719   Epoch: 3   Global Step: 37760   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:00:37,961-Speed 3410.95 samples/sec   Loss 6.7519   LearningRate 0.0719   Epoch: 3   Global Step: 37770   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:00:40,994-Speed 3377.01 samples/sec   Loss 6.6867   LearningRate 0.0719   Epoch: 3   Global Step: 37780   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:00:44,036-Speed 3366.60 samples/sec   Loss 6.5871   LearningRate 0.0719   Epoch: 3   Global Step: 37790   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:00:47,048-Speed 3401.34 samples/sec   Loss 6.7045   LearningRate 0.0719   Epoch: 3   Global Step: 37800   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:00:50,083-Speed 3374.72 samples/sec   Loss 6.6907   LearningRate 0.0719   Epoch: 3   Global Step: 37810   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:00:53,165-Speed 3323.65 samples/sec   Loss 6.6073   LearningRate 0.0719   Epoch: 3   Global Step: 37820   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:00:56,190-Speed 3386.66 samples/sec   Loss 6.6567   LearningRate 0.0719   Epoch: 3   Global Step: 37830   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:00:59,234-Speed 3364.89 samples/sec   Loss 6.5275   LearningRate 0.0719   Epoch: 3   Global Step: 37840   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:01:02,313-Speed 3326.69 samples/sec   Loss 6.6567   LearningRate 0.0718   Epoch: 3   Global Step: 37850   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:01:05,392-Speed 3327.59 samples/sec   Loss 6.6778   LearningRate 0.0718   Epoch: 3   Global Step: 37860   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:01:08,417-Speed 3385.92 samples/sec   Loss 6.8308   LearningRate 0.0718   Epoch: 3   Global Step: 37870   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:01:11,428-Speed 3401.26 samples/sec   Loss 6.6722   LearningRate 0.0718   Epoch: 3   Global Step: 37880   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:01:14,493-Speed 3342.32 samples/sec   Loss 6.5297   LearningRate 0.0718   Epoch: 3   Global Step: 37890   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:01:17,600-Speed 3297.29 samples/sec   Loss 6.6083   LearningRate 0.0718   Epoch: 3   Global Step: 37900   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:01:20,612-Speed 3400.47 samples/sec   Loss 6.6815   LearningRate 0.0718   Epoch: 3   Global Step: 37910   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:01:23,628-Speed 3396.09 samples/sec   Loss 6.6927   LearningRate 0.0718   Epoch: 3   Global Step: 37920   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:01:26,721-Speed 3311.58 samples/sec   Loss 6.8571   LearningRate 0.0718   Epoch: 3   Global Step: 37930   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:01:29,773-Speed 3356.44 samples/sec   Loss 6.8532   LearningRate 0.0718   Epoch: 3   Global Step: 37940   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:01:32,790-Speed 3395.85 samples/sec   Loss 6.7679   LearningRate 0.0718   Epoch: 3   Global Step: 37950   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:01:35,855-Speed 3341.87 samples/sec   Loss 6.7341   LearningRate 0.0718   Epoch: 3   Global Step: 37960   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:01:38,903-Speed 3360.62 samples/sec   Loss 6.7055   LearningRate 0.0718   Epoch: 3   Global Step: 37970   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:01:41,997-Speed 3310.47 samples/sec   Loss 6.7180   LearningRate 0.0718   Epoch: 3   Global Step: 37980   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:01:45,024-Speed 3384.24 samples/sec   Loss 6.5578   LearningRate 0.0718   Epoch: 3   Global Step: 37990   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:01:48,036-Speed 3400.89 samples/sec   Loss 6.8036   LearningRate 0.0717   Epoch: 3   Global Step: 38000   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:01:51,062-Speed 3384.72 samples/sec   Loss 6.6521   LearningRate 0.0717   Epoch: 3   Global Step: 38010   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:01:54,099-Speed 3373.50 samples/sec   Loss 6.8511   LearningRate 0.0717   Epoch: 3   Global Step: 38020   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:01:57,114-Speed 3397.16 samples/sec   Loss 6.8337   LearningRate 0.0717   Epoch: 3   Global Step: 38030   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:02:00,117-Speed 3410.74 samples/sec   Loss 6.7960   LearningRate 0.0717   Epoch: 3   Global Step: 38040   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:02:03,159-Speed 3368.35 samples/sec   Loss 6.7420   LearningRate 0.0717   Epoch: 3   Global Step: 38050   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:02:06,182-Speed 3387.89 samples/sec   Loss 6.8332   LearningRate 0.0717   Epoch: 3   Global Step: 38060   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:02:09,192-Speed 3403.56 samples/sec   Loss 6.6281   LearningRate 0.0717   Epoch: 3   Global Step: 38070   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:02:12,235-Speed 3365.74 samples/sec   Loss 6.8279   LearningRate 0.0717   Epoch: 3   Global Step: 38080   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:02:15,241-Speed 3407.39 samples/sec   Loss 6.6854   LearningRate 0.0717   Epoch: 3   Global Step: 38090   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:02:18,239-Speed 3417.24 samples/sec   Loss 6.7432   LearningRate 0.0717   Epoch: 3   Global Step: 38100   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:02:21,265-Speed 3384.91 samples/sec   Loss 6.8264   LearningRate 0.0717   Epoch: 3   Global Step: 38110   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:02:24,318-Speed 3355.69 samples/sec   Loss 6.8080   LearningRate 0.0717   Epoch: 3   Global Step: 38120   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:02:27,335-Speed 3395.35 samples/sec   Loss 6.7411   LearningRate 0.0717   Epoch: 3   Global Step: 38130   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:02:30,391-Speed 3352.04 samples/sec   Loss 6.6986   LearningRate 0.0717   Epoch: 3   Global Step: 38140   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:02:33,406-Speed 3397.48 samples/sec   Loss 6.8863   LearningRate 0.0716   Epoch: 3   Global Step: 38150   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:02:36,458-Speed 3355.93 samples/sec   Loss 6.8804   LearningRate 0.0716   Epoch: 3   Global Step: 38160   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:02:39,480-Speed 3389.59 samples/sec   Loss 6.7491   LearningRate 0.0716   Epoch: 3   Global Step: 38170   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:02:42,585-Speed 3299.42 samples/sec   Loss 6.7146   LearningRate 0.0716   Epoch: 3   Global Step: 38180   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:02:45,585-Speed 3414.44 samples/sec   Loss 6.8566   LearningRate 0.0716   Epoch: 3   Global Step: 38190   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:02:48,696-Speed 3292.65 samples/sec   Loss 6.8920   LearningRate 0.0716   Epoch: 3   Global Step: 38200   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:02:51,800-Speed 3299.68 samples/sec   Loss 7.0516   LearningRate 0.0716   Epoch: 3   Global Step: 38210   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:02:54,859-Speed 3348.59 samples/sec   Loss 6.9954   LearningRate 0.0716   Epoch: 3   Global Step: 38220   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:02:57,858-Speed 3415.20 samples/sec   Loss 6.7958   LearningRate 0.0716   Epoch: 3   Global Step: 38230   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:03:00,909-Speed 3357.68 samples/sec   Loss 6.8582   LearningRate 0.0716   Epoch: 3   Global Step: 38240   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:03:04,003-Speed 3310.34 samples/sec   Loss 6.7146   LearningRate 0.0716   Epoch: 3   Global Step: 38250   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:03:07,109-Speed 3298.74 samples/sec   Loss 6.8047   LearningRate 0.0716   Epoch: 3   Global Step: 38260   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:03:10,129-Speed 3391.86 samples/sec   Loss 6.9378   LearningRate 0.0716   Epoch: 3   Global Step: 38270   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:03:13,152-Speed 3387.77 samples/sec   Loss 6.8330   LearningRate 0.0716   Epoch: 3   Global Step: 38280   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:03:16,247-Speed 3309.99 samples/sec   Loss 6.8845   LearningRate 0.0715   Epoch: 3   Global Step: 38290   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:03:19,344-Speed 3307.44 samples/sec   Loss 6.9465   LearningRate 0.0715   Epoch: 3   Global Step: 38300   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:03:22,392-Speed 3360.70 samples/sec   Loss 6.7882   LearningRate 0.0715   Epoch: 3   Global Step: 38310   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:03:25,442-Speed 3358.90 samples/sec   Loss 6.9501   LearningRate 0.0715   Epoch: 3   Global Step: 38320   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:03:28,475-Speed 3377.14 samples/sec   Loss 6.8386   LearningRate 0.0715   Epoch: 3   Global Step: 38330   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:03:31,517-Speed 3367.19 samples/sec   Loss 6.8763   LearningRate 0.0715   Epoch: 3   Global Step: 38340   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:03:34,535-Speed 3393.83 samples/sec   Loss 6.9865   LearningRate 0.0715   Epoch: 3   Global Step: 38350   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:03:37,559-Speed 3388.03 samples/sec   Loss 6.9001   LearningRate 0.0715   Epoch: 3   Global Step: 38360   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:03:40,616-Speed 3350.08 samples/sec   Loss 6.9600   LearningRate 0.0715   Epoch: 3   Global Step: 38370   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:03:43,636-Speed 3392.07 samples/sec   Loss 6.9826   LearningRate 0.0715   Epoch: 3   Global Step: 38380   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:03:46,650-Speed 3397.69 samples/sec   Loss 6.8845   LearningRate 0.0715   Epoch: 3   Global Step: 38390   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:03:49,742-Speed 3312.99 samples/sec   Loss 6.8322   LearningRate 0.0715   Epoch: 3   Global Step: 38400   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:03:52,763-Speed 3391.55 samples/sec   Loss 6.9328   LearningRate 0.0715   Epoch: 3   Global Step: 38410   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:03:55,770-Speed 3406.03 samples/sec   Loss 6.8385   LearningRate 0.0715   Epoch: 3   Global Step: 38420   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:03:58,806-Speed 3374.59 samples/sec   Loss 6.9734   LearningRate 0.0715   Epoch: 3   Global Step: 38430   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:04:01,878-Speed 3334.35 samples/sec   Loss 6.9084   LearningRate 0.0714   Epoch: 3   Global Step: 38440   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:04:04,930-Speed 3356.53 samples/sec   Loss 6.9613   LearningRate 0.0714   Epoch: 3   Global Step: 38450   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:04:07,935-Speed 3408.02 samples/sec   Loss 6.9162   LearningRate 0.0714   Epoch: 3   Global Step: 38460   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:04:10,966-Speed 3380.31 samples/sec   Loss 6.8513   LearningRate 0.0714   Epoch: 3   Global Step: 38470   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:04:14,043-Speed 3328.87 samples/sec   Loss 6.9831   LearningRate 0.0714   Epoch: 3   Global Step: 38480   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:04:17,123-Speed 3325.06 samples/sec   Loss 6.8802   LearningRate 0.0714   Epoch: 3   Global Step: 38490   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:04:20,119-Speed 3419.06 samples/sec   Loss 7.0114   LearningRate 0.0714   Epoch: 3   Global Step: 38500   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:04:23,126-Speed 3406.72 samples/sec   Loss 6.9964   LearningRate 0.0714   Epoch: 3   Global Step: 38510   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:04:26,182-Speed 3352.13 samples/sec   Loss 6.9662   LearningRate 0.0714   Epoch: 3   Global Step: 38520   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:04:29,211-Speed 3381.68 samples/sec   Loss 7.0059   LearningRate 0.0714   Epoch: 3   Global Step: 38530   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:04:32,249-Speed 3372.36 samples/sec   Loss 6.8451   LearningRate 0.0714   Epoch: 3   Global Step: 38540   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:04:35,248-Speed 3415.45 samples/sec   Loss 7.0581   LearningRate 0.0714   Epoch: 3   Global Step: 38550   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:04:38,292-Speed 3364.92 samples/sec   Loss 6.9745   LearningRate 0.0714   Epoch: 3   Global Step: 38560   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:04:41,337-Speed 3364.17 samples/sec   Loss 6.9158   LearningRate 0.0714   Epoch: 3   Global Step: 38570   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:04:44,343-Speed 3407.62 samples/sec   Loss 7.1026   LearningRate 0.0714   Epoch: 3   Global Step: 38580   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:04:47,415-Speed 3334.74 samples/sec   Loss 6.9675   LearningRate 0.0713   Epoch: 3   Global Step: 38590   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:04:50,446-Speed 3378.96 samples/sec   Loss 7.0229   LearningRate 0.0713   Epoch: 3   Global Step: 38600   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:04:53,444-Speed 3416.76 samples/sec   Loss 6.8184   LearningRate 0.0713   Epoch: 3   Global Step: 38610   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:04:56,449-Speed 3408.94 samples/sec   Loss 6.9186   LearningRate 0.0713   Epoch: 3   Global Step: 38620   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:04:59,454-Speed 3409.20 samples/sec   Loss 7.0349   LearningRate 0.0713   Epoch: 3   Global Step: 38630   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:05:02,608-Speed 3247.39 samples/sec   Loss 6.8983   LearningRate 0.0713   Epoch: 3   Global Step: 38640   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:05:05,637-Speed 3382.14 samples/sec   Loss 7.0805   LearningRate 0.0713   Epoch: 3   Global Step: 38650   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:05:08,641-Speed 3409.51 samples/sec   Loss 7.0000   LearningRate 0.0713   Epoch: 3   Global Step: 38660   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:05:11,675-Speed 3377.06 samples/sec   Loss 6.9642   LearningRate 0.0713   Epoch: 3   Global Step: 38670   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:05:14,703-Speed 3382.82 samples/sec   Loss 6.9170   LearningRate 0.0713   Epoch: 3   Global Step: 38680   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:05:17,722-Speed 3392.66 samples/sec   Loss 7.0512   LearningRate 0.0713   Epoch: 3   Global Step: 38690   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:05:20,737-Speed 3396.43 samples/sec   Loss 6.9801   LearningRate 0.0713   Epoch: 3   Global Step: 38700   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:05:23,754-Speed 3395.81 samples/sec   Loss 7.0077   LearningRate 0.0713   Epoch: 3   Global Step: 38710   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:05:26,770-Speed 3396.22 samples/sec   Loss 7.0937   LearningRate 0.0713   Epoch: 3   Global Step: 38720   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:05:29,768-Speed 3417.05 samples/sec   Loss 6.9624   LearningRate 0.0712   Epoch: 3   Global Step: 38730   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:05:32,862-Speed 3309.62 samples/sec   Loss 7.1036   LearningRate 0.0712   Epoch: 3   Global Step: 38740   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:05:35,976-Speed 3290.71 samples/sec   Loss 6.8847   LearningRate 0.0712   Epoch: 3   Global Step: 38750   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:05:39,000-Speed 3387.38 samples/sec   Loss 6.9359   LearningRate 0.0712   Epoch: 3   Global Step: 38760   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:05:42,117-Speed 3285.02 samples/sec   Loss 6.9088   LearningRate 0.0712   Epoch: 3   Global Step: 38770   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:05:45,197-Speed 3327.01 samples/sec   Loss 7.0581   LearningRate 0.0712   Epoch: 3   Global Step: 38780   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:05:48,253-Speed 3350.99 samples/sec   Loss 6.9532   LearningRate 0.0712   Epoch: 3   Global Step: 38790   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:05:51,375-Speed 3280.86 samples/sec   Loss 6.9803   LearningRate 0.0712   Epoch: 3   Global Step: 38800   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:05:54,450-Speed 3331.01 samples/sec   Loss 7.1318   LearningRate 0.0712   Epoch: 3   Global Step: 38810   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:05:57,497-Speed 3362.68 samples/sec   Loss 6.9287   LearningRate 0.0712   Epoch: 3   Global Step: 38820   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:06:00,622-Speed 3277.72 samples/sec   Loss 7.0771   LearningRate 0.0712   Epoch: 3   Global Step: 38830   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:06:03,706-Speed 3320.32 samples/sec   Loss 7.0831   LearningRate 0.0712   Epoch: 3   Global Step: 38840   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:06:06,717-Speed 3402.17 samples/sec   Loss 7.1290   LearningRate 0.0712   Epoch: 3   Global Step: 38850   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:06:09,736-Speed 3392.97 samples/sec   Loss 7.0510   LearningRate 0.0712   Epoch: 3   Global Step: 38860   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:06:12,872-Speed 3266.26 samples/sec   Loss 7.0872   LearningRate 0.0712   Epoch: 3   Global Step: 38870   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:06:15,936-Speed 3343.63 samples/sec   Loss 7.0517   LearningRate 0.0711   Epoch: 3   Global Step: 38880   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:06:18,951-Speed 3397.28 samples/sec   Loss 6.8099   LearningRate 0.0711   Epoch: 3   Global Step: 38890   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:06:21,957-Speed 3407.64 samples/sec   Loss 7.1268   LearningRate 0.0711   Epoch: 3   Global Step: 38900   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:06:25,072-Speed 3288.06 samples/sec   Loss 7.1024   LearningRate 0.0711   Epoch: 3   Global Step: 38910   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:06:28,141-Speed 3338.12 samples/sec   Loss 7.0633   LearningRate 0.0711   Epoch: 3   Global Step: 38920   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:06:31,163-Speed 3389.93 samples/sec   Loss 7.1999   LearningRate 0.0711   Epoch: 3   Global Step: 38930   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:06:34,164-Speed 3413.10 samples/sec   Loss 7.0107   LearningRate 0.0711   Epoch: 3   Global Step: 38940   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:06:37,247-Speed 3322.24 samples/sec   Loss 7.0417   LearningRate 0.0711   Epoch: 3   Global Step: 38950   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:06:40,312-Speed 3341.56 samples/sec   Loss 7.0254   LearningRate 0.0711   Epoch: 3   Global Step: 38960   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:06:43,347-Speed 3375.50 samples/sec   Loss 7.1480   LearningRate 0.0711   Epoch: 3   Global Step: 38970   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:06:46,398-Speed 3357.29 samples/sec   Loss 7.0167   LearningRate 0.0711   Epoch: 3   Global Step: 38980   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:06:49,434-Speed 3374.05 samples/sec   Loss 7.0609   LearningRate 0.0711   Epoch: 3   Global Step: 38990   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:06:52,448-Speed 3398.19 samples/sec   Loss 7.0680   LearningRate 0.0711   Epoch: 3   Global Step: 39000   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:06:55,474-Speed 3386.08 samples/sec   Loss 7.0802   LearningRate 0.0711   Epoch: 3   Global Step: 39010   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:06:58,520-Speed 3362.65 samples/sec   Loss 7.0458   LearningRate 0.0711   Epoch: 3   Global Step: 39020   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:07:01,558-Speed 3372.06 samples/sec   Loss 7.1194   LearningRate 0.0710   Epoch: 3   Global Step: 39030   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:07:04,645-Speed 3318.19 samples/sec   Loss 7.1511   LearningRate 0.0710   Epoch: 3   Global Step: 39040   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:07:07,733-Speed 3317.47 samples/sec   Loss 7.1470   LearningRate 0.0710   Epoch: 3   Global Step: 39050   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:07:10,781-Speed 3360.25 samples/sec   Loss 7.1126   LearningRate 0.0710   Epoch: 3   Global Step: 39060   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:07:13,823-Speed 3366.98 samples/sec   Loss 7.1315   LearningRate 0.0710   Epoch: 3   Global Step: 39070   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:07:16,846-Speed 3389.17 samples/sec   Loss 7.1343   LearningRate 0.0710   Epoch: 3   Global Step: 39080   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:07:19,883-Speed 3371.61 samples/sec   Loss 7.1201   LearningRate 0.0710   Epoch: 3   Global Step: 39090   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:07:22,919-Speed 3374.85 samples/sec   Loss 7.0655   LearningRate 0.0710   Epoch: 3   Global Step: 39100   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:07:25,958-Speed 3370.32 samples/sec   Loss 7.0654   LearningRate 0.0710   Epoch: 3   Global Step: 39110   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:07:29,031-Speed 3332.96 samples/sec   Loss 7.1535   LearningRate 0.0710   Epoch: 3   Global Step: 39120   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:07:32,074-Speed 3366.17 samples/sec   Loss 7.2027   LearningRate 0.0710   Epoch: 3   Global Step: 39130   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:07:35,146-Speed 3335.32 samples/sec   Loss 7.2021   LearningRate 0.0710   Epoch: 3   Global Step: 39140   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:07:38,217-Speed 3334.88 samples/sec   Loss 7.1434   LearningRate 0.0710   Epoch: 3   Global Step: 39150   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:07:41,236-Speed 3393.29 samples/sec   Loss 7.1450   LearningRate 0.0710   Epoch: 3   Global Step: 39160   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:07:44,269-Speed 3377.35 samples/sec   Loss 7.0762   LearningRate 0.0710   Epoch: 3   Global Step: 39170   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:07:47,303-Speed 3376.72 samples/sec   Loss 7.2894   LearningRate 0.0709   Epoch: 3   Global Step: 39180   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:07:50,322-Speed 3392.94 samples/sec   Loss 7.0737   LearningRate 0.0709   Epoch: 3   Global Step: 39190   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:07:53,438-Speed 3287.34 samples/sec   Loss 7.2241   LearningRate 0.0709   Epoch: 3   Global Step: 39200   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:07:56,497-Speed 3348.05 samples/sec   Loss 7.3175   LearningRate 0.0709   Epoch: 3   Global Step: 39210   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:07:59,520-Speed 3388.75 samples/sec   Loss 7.0297   LearningRate 0.0709   Epoch: 3   Global Step: 39220   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:08:02,566-Speed 3362.92 samples/sec   Loss 7.1280   LearningRate 0.0709   Epoch: 3   Global Step: 39230   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:08:05,600-Speed 3375.83 samples/sec   Loss 7.1316   LearningRate 0.0709   Epoch: 3   Global Step: 39240   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:08:08,613-Speed 3399.52 samples/sec   Loss 7.1666   LearningRate 0.0709   Epoch: 3   Global Step: 39250   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:08:11,713-Speed 3304.66 samples/sec   Loss 7.0208   LearningRate 0.0709   Epoch: 3   Global Step: 39260   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:08:14,760-Speed 3362.65 samples/sec   Loss 7.1294   LearningRate 0.0709   Epoch: 3   Global Step: 39270   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:08:17,801-Speed 3367.69 samples/sec   Loss 7.2245   LearningRate 0.0709   Epoch: 3   Global Step: 39280   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:08:20,813-Speed 3400.66 samples/sec   Loss 7.1484   LearningRate 0.0709   Epoch: 3   Global Step: 39290   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:08:23,838-Speed 3386.13 samples/sec   Loss 7.2545   LearningRate 0.0709   Epoch: 3   Global Step: 39300   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:08:26,915-Speed 3329.94 samples/sec   Loss 7.2535   LearningRate 0.0709   Epoch: 3   Global Step: 39310   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:08:29,943-Speed 3382.29 samples/sec   Loss 7.1803   LearningRate 0.0708   Epoch: 3   Global Step: 39320   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:08:32,992-Speed 3360.63 samples/sec   Loss 7.1677   LearningRate 0.0708   Epoch: 3   Global Step: 39330   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:08:36,028-Speed 3373.83 samples/sec   Loss 7.1557   LearningRate 0.0708   Epoch: 3   Global Step: 39340   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:08:39,063-Speed 3374.74 samples/sec   Loss 7.1445   LearningRate 0.0708   Epoch: 3   Global Step: 39350   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:08:42,088-Speed 3385.51 samples/sec   Loss 7.0983   LearningRate 0.0708   Epoch: 3   Global Step: 39360   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:08:45,111-Speed 3389.42 samples/sec   Loss 7.0915   LearningRate 0.0708   Epoch: 3   Global Step: 39370   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:08:48,134-Speed 3388.68 samples/sec   Loss 7.1408   LearningRate 0.0708   Epoch: 3   Global Step: 39380   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:08:51,177-Speed 3365.49 samples/sec   Loss 7.0480   LearningRate 0.0708   Epoch: 3   Global Step: 39390   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:08:54,284-Speed 3297.29 samples/sec   Loss 7.1117   LearningRate 0.0708   Epoch: 3   Global Step: 39400   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:08:57,292-Speed 3405.00 samples/sec   Loss 7.2726   LearningRate 0.0708   Epoch: 3   Global Step: 39410   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:09:00,369-Speed 3328.98 samples/sec   Loss 7.2726   LearningRate 0.0708   Epoch: 3   Global Step: 39420   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:09:03,424-Speed 3352.73 samples/sec   Loss 7.2235   LearningRate 0.0708   Epoch: 3   Global Step: 39430   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:09:06,452-Speed 3383.20 samples/sec   Loss 7.3084   LearningRate 0.0708   Epoch: 3   Global Step: 39440   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:09:09,494-Speed 3366.71 samples/sec   Loss 7.1958   LearningRate 0.0708   Epoch: 3   Global Step: 39450   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:09:12,550-Speed 3351.61 samples/sec   Loss 7.2227   LearningRate 0.0708   Epoch: 3   Global Step: 39460   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:09:15,622-Speed 3334.74 samples/sec   Loss 7.2426   LearningRate 0.0707   Epoch: 3   Global Step: 39470   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:09:18,658-Speed 3374.58 samples/sec   Loss 7.2172   LearningRate 0.0707   Epoch: 3   Global Step: 39480   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:09:21,688-Speed 3380.43 samples/sec   Loss 7.2150   LearningRate 0.0707   Epoch: 3   Global Step: 39490   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:09:24,804-Speed 3287.06 samples/sec   Loss 7.2724   LearningRate 0.0707   Epoch: 3   Global Step: 39500   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:09:27,893-Speed 3316.17 samples/sec   Loss 7.1994   LearningRate 0.0707   Epoch: 3   Global Step: 39510   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:09:30,970-Speed 3328.20 samples/sec   Loss 7.2005   LearningRate 0.0707   Epoch: 3   Global Step: 39520   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:09:33,991-Speed 3391.05 samples/sec   Loss 7.2639   LearningRate 0.0707   Epoch: 3   Global Step: 39530   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:09:37,065-Speed 3332.13 samples/sec   Loss 7.2953   LearningRate 0.0707   Epoch: 3   Global Step: 39540   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:09:40,110-Speed 3364.54 samples/sec   Loss 7.1127   LearningRate 0.0707   Epoch: 3   Global Step: 39550   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:09:43,188-Speed 3327.96 samples/sec   Loss 7.1734   LearningRate 0.0707   Epoch: 3   Global Step: 39560   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:09:46,218-Speed 3380.06 samples/sec   Loss 7.2029   LearningRate 0.0707   Epoch: 3   Global Step: 39570   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:09:49,297-Speed 3327.36 samples/sec   Loss 7.3378   LearningRate 0.0707   Epoch: 3   Global Step: 39580   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:09:52,366-Speed 3337.61 samples/sec   Loss 7.1864   LearningRate 0.0707   Epoch: 3   Global Step: 39590   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:09:55,407-Speed 3368.64 samples/sec   Loss 7.1359   LearningRate 0.0707   Epoch: 3   Global Step: 39600   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:09:58,448-Speed 3368.08 samples/sec   Loss 7.0892   LearningRate 0.0707   Epoch: 3   Global Step: 39610   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:10:01,525-Speed 3329.15 samples/sec   Loss 7.1243   LearningRate 0.0706   Epoch: 3   Global Step: 39620   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:10:04,639-Speed 3289.21 samples/sec   Loss 7.3766   LearningRate 0.0706   Epoch: 3   Global Step: 39630   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:10:07,685-Speed 3363.59 samples/sec   Loss 7.2885   LearningRate 0.0706   Epoch: 3   Global Step: 39640   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:10:10,761-Speed 3329.66 samples/sec   Loss 7.2023   LearningRate 0.0706   Epoch: 3   Global Step: 39650   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:10:13,822-Speed 3347.09 samples/sec   Loss 7.1154   LearningRate 0.0706   Epoch: 3   Global Step: 39660   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:10:16,883-Speed 3346.62 samples/sec   Loss 7.2482   LearningRate 0.0706   Epoch: 3   Global Step: 39670   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:10:19,916-Speed 3376.55 samples/sec   Loss 7.0908   LearningRate 0.0706   Epoch: 3   Global Step: 39680   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:10:22,951-Speed 3375.96 samples/sec   Loss 7.2986   LearningRate 0.0706   Epoch: 3   Global Step: 39690   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:10:26,030-Speed 3326.32 samples/sec   Loss 7.2866   LearningRate 0.0706   Epoch: 3   Global Step: 39700   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:10:29,094-Speed 3342.92 samples/sec   Loss 7.0712   LearningRate 0.0706   Epoch: 3   Global Step: 39710   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:10:32,157-Speed 3345.02 samples/sec   Loss 7.1516   LearningRate 0.0706   Epoch: 3   Global Step: 39720   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:10:35,195-Speed 3371.86 samples/sec   Loss 7.2221   LearningRate 0.0706   Epoch: 3   Global Step: 39730   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:10:38,223-Speed 3382.38 samples/sec   Loss 7.1954   LearningRate 0.0706   Epoch: 3   Global Step: 39740   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:10:41,281-Speed 3350.05 samples/sec   Loss 7.0743   LearningRate 0.0706   Epoch: 3   Global Step: 39750   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:10:44,300-Speed 3392.67 samples/sec   Loss 7.2318   LearningRate 0.0706   Epoch: 3   Global Step: 39760   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:10:47,322-Speed 3390.01 samples/sec   Loss 7.1869   LearningRate 0.0705   Epoch: 3   Global Step: 39770   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:10:50,362-Speed 3369.97 samples/sec   Loss 7.1951   LearningRate 0.0705   Epoch: 3   Global Step: 39780   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:10:53,443-Speed 3324.08 samples/sec   Loss 7.2607   LearningRate 0.0705   Epoch: 3   Global Step: 39790   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:10:56,494-Speed 3358.37 samples/sec   Loss 7.2727   LearningRate 0.0705   Epoch: 3   Global Step: 39800   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:10:59,542-Speed 3360.26 samples/sec   Loss 7.2118   LearningRate 0.0705   Epoch: 3   Global Step: 39810   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:11:02,613-Speed 3335.22 samples/sec   Loss 7.3320   LearningRate 0.0705   Epoch: 3   Global Step: 39820   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:11:05,681-Speed 3338.79 samples/sec   Loss 7.2689   LearningRate 0.0705   Epoch: 3   Global Step: 39830   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:11:08,686-Speed 3409.22 samples/sec   Loss 7.2350   LearningRate 0.0705   Epoch: 3   Global Step: 39840   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:11:11,748-Speed 3345.06 samples/sec   Loss 7.2613   LearningRate 0.0705   Epoch: 3   Global Step: 39850   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:11:14,828-Speed 3326.09 samples/sec   Loss 7.3135   LearningRate 0.0705   Epoch: 3   Global Step: 39860   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:11:17,904-Speed 3329.48 samples/sec   Loss 7.2880   LearningRate 0.0705   Epoch: 3   Global Step: 39870   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:11:20,963-Speed 3348.90 samples/sec   Loss 7.2449   LearningRate 0.0705   Epoch: 3   Global Step: 39880   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:11:24,007-Speed 3364.99 samples/sec   Loss 7.2691   LearningRate 0.0705   Epoch: 3   Global Step: 39890   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:11:27,043-Speed 3373.84 samples/sec   Loss 7.2083   LearningRate 0.0705   Epoch: 3   Global Step: 39900   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:11:30,093-Speed 3358.52 samples/sec   Loss 7.2380   LearningRate 0.0704   Epoch: 3   Global Step: 39910   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:11:33,113-Speed 3392.56 samples/sec   Loss 7.2357   LearningRate 0.0704   Epoch: 3   Global Step: 39920   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:11:36,157-Speed 3365.53 samples/sec   Loss 7.2618   LearningRate 0.0704   Epoch: 3   Global Step: 39930   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:11:39,230-Speed 3332.25 samples/sec   Loss 7.2662   LearningRate 0.0704   Epoch: 3   Global Step: 39940   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:11:42,275-Speed 3364.25 samples/sec   Loss 7.3239   LearningRate 0.0704   Epoch: 3   Global Step: 39950   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:11:45,307-Speed 3379.25 samples/sec   Loss 7.3634   LearningRate 0.0704   Epoch: 3   Global Step: 39960   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:11:48,425-Speed 3285.13 samples/sec   Loss 7.3175   LearningRate 0.0704   Epoch: 3   Global Step: 39970   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:11:51,496-Speed 3335.56 samples/sec   Loss 7.3468   LearningRate 0.0704   Epoch: 3   Global Step: 39980   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:11:54,559-Speed 3343.71 samples/sec   Loss 7.3210   LearningRate 0.0704   Epoch: 3   Global Step: 39990   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:11:57,616-Speed 3351.26 samples/sec   Loss 7.3275   LearningRate 0.0704   Epoch: 3   Global Step: 40000   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:12:00,718-Speed 3301.37 samples/sec   Loss 7.3448   LearningRate 0.0704   Epoch: 3   Global Step: 40010   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:12:03,799-Speed 3324.75 samples/sec   Loss 7.2822   LearningRate 0.0704   Epoch: 3   Global Step: 40020   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:12:06,822-Speed 3389.45 samples/sec   Loss 7.3913   LearningRate 0.0704   Epoch: 3   Global Step: 40030   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:12:09,828-Speed 3406.51 samples/sec   Loss 7.3396   LearningRate 0.0704   Epoch: 3   Global Step: 40040   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:12:12,866-Speed 3372.12 samples/sec   Loss 7.4412   LearningRate 0.0704   Epoch: 3   Global Step: 40050   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:12:15,910-Speed 3365.40 samples/sec   Loss 7.2908   LearningRate 0.0703   Epoch: 3   Global Step: 40060   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:12:19,002-Speed 3312.12 samples/sec   Loss 7.3386   LearningRate 0.0703   Epoch: 3   Global Step: 40070   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:12:22,035-Speed 3377.60 samples/sec   Loss 7.3073   LearningRate 0.0703   Epoch: 3   Global Step: 40080   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:12:25,056-Speed 3391.50 samples/sec   Loss 7.3041   LearningRate 0.0703   Epoch: 3   Global Step: 40090   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:12:28,157-Speed 3302.38 samples/sec   Loss 7.2208   LearningRate 0.0703   Epoch: 3   Global Step: 40100   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:12:31,206-Speed 3359.66 samples/sec   Loss 7.3804   LearningRate 0.0703   Epoch: 3   Global Step: 40110   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:12:34,211-Speed 3408.55 samples/sec   Loss 7.2549   LearningRate 0.0703   Epoch: 3   Global Step: 40120   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:12:37,252-Speed 3368.08 samples/sec   Loss 7.2604   LearningRate 0.0703   Epoch: 3   Global Step: 40130   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:12:40,279-Speed 3384.57 samples/sec   Loss 7.3978   LearningRate 0.0703   Epoch: 3   Global Step: 40140   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:12:43,300-Speed 3390.95 samples/sec   Loss 7.1860   LearningRate 0.0703   Epoch: 3   Global Step: 40150   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:12:46,324-Speed 3387.07 samples/sec   Loss 7.2139   LearningRate 0.0703   Epoch: 3   Global Step: 40160   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:12:49,324-Speed 3414.44 samples/sec   Loss 7.3920   LearningRate 0.0703   Epoch: 3   Global Step: 40170   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:12:52,424-Speed 3303.62 samples/sec   Loss 7.3568   LearningRate 0.0703   Epoch: 3   Global Step: 40180   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:12:55,461-Speed 3372.88 samples/sec   Loss 7.2799   LearningRate 0.0703   Epoch: 3   Global Step: 40190   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:12:58,472-Speed 3402.59 samples/sec   Loss 7.3874   LearningRate 0.0703   Epoch: 3   Global Step: 40200   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:13:01,499-Speed 3384.12 samples/sec   Loss 7.4245   LearningRate 0.0702   Epoch: 3   Global Step: 40210   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:13:04,601-Speed 3301.66 samples/sec   Loss 7.3009   LearningRate 0.0702   Epoch: 3   Global Step: 40220   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:13:07,679-Speed 3328.67 samples/sec   Loss 7.3883   LearningRate 0.0702   Epoch: 3   Global Step: 40230   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:13:10,665-Speed 3430.22 samples/sec   Loss 7.3112   LearningRate 0.0702   Epoch: 3   Global Step: 40240   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:13:13,658-Speed 3421.80 samples/sec   Loss 7.2554   LearningRate 0.0702   Epoch: 3   Global Step: 40250   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:13:16,688-Speed 3381.43 samples/sec   Loss 7.3997   LearningRate 0.0702   Epoch: 3   Global Step: 40260   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:13:19,756-Speed 3338.28 samples/sec   Loss 7.2916   LearningRate 0.0702   Epoch: 3   Global Step: 40270   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:13:22,768-Speed 3401.46 samples/sec   Loss 7.3065   LearningRate 0.0702   Epoch: 3   Global Step: 40280   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:13:25,847-Speed 3326.37 samples/sec   Loss 7.4662   LearningRate 0.0702   Epoch: 3   Global Step: 40290   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:13:28,911-Speed 3343.29 samples/sec   Loss 7.3237   LearningRate 0.0702   Epoch: 3   Global Step: 40300   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:13:31,929-Speed 3394.21 samples/sec   Loss 7.3286   LearningRate 0.0702   Epoch: 3   Global Step: 40310   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:13:34,989-Speed 3346.90 samples/sec   Loss 7.2630   LearningRate 0.0702   Epoch: 3   Global Step: 40320   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:13:38,108-Speed 3283.94 samples/sec   Loss 7.2817   LearningRate 0.0702   Epoch: 3   Global Step: 40330   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:13:41,164-Speed 3352.10 samples/sec   Loss 7.4884   LearningRate 0.0702   Epoch: 3   Global Step: 40340   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:13:44,165-Speed 3413.27 samples/sec   Loss 7.3889   LearningRate 0.0702   Epoch: 3   Global Step: 40350   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:13:47,259-Speed 3311.46 samples/sec   Loss 7.3800   LearningRate 0.0701   Epoch: 3   Global Step: 40360   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:13:50,371-Speed 3291.00 samples/sec   Loss 7.3577   LearningRate 0.0701   Epoch: 3   Global Step: 40370   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:13:53,375-Speed 3410.37 samples/sec   Loss 7.3098   LearningRate 0.0701   Epoch: 3   Global Step: 40380   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:13:56,405-Speed 3380.24 samples/sec   Loss 7.4271   LearningRate 0.0701   Epoch: 3   Global Step: 40390   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:13:59,432-Speed 3383.81 samples/sec   Loss 7.3012   LearningRate 0.0701   Epoch: 3   Global Step: 40400   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:14:02,467-Speed 3375.86 samples/sec   Loss 7.3924   LearningRate 0.0701   Epoch: 3   Global Step: 40410   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:14:05,542-Speed 3330.25 samples/sec   Loss 7.5183   LearningRate 0.0701   Epoch: 3   Global Step: 40420   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:14:08,574-Speed 3378.57 samples/sec   Loss 7.3273   LearningRate 0.0701   Epoch: 3   Global Step: 40430   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:14:11,617-Speed 3366.70 samples/sec   Loss 7.3285   LearningRate 0.0701   Epoch: 3   Global Step: 40440   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:14:14,659-Speed 3367.55 samples/sec   Loss 7.3385   LearningRate 0.0701   Epoch: 3   Global Step: 40450   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:14:17,671-Speed 3400.34 samples/sec   Loss 7.3563   LearningRate 0.0701   Epoch: 3   Global Step: 40460   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:14:20,691-Speed 3392.39 samples/sec   Loss 7.3344   LearningRate 0.0701   Epoch: 3   Global Step: 40470   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:14:23,740-Speed 3359.34 samples/sec   Loss 7.3990   LearningRate 0.0701   Epoch: 3   Global Step: 40480   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:14:26,763-Speed 3388.29 samples/sec   Loss 7.3636   LearningRate 0.0701   Epoch: 3   Global Step: 40490   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:14:29,820-Speed 3350.86 samples/sec   Loss 7.1819   LearningRate 0.0701   Epoch: 3   Global Step: 40500   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:14:32,821-Speed 3413.34 samples/sec   Loss 7.4274   LearningRate 0.0700   Epoch: 3   Global Step: 40510   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:14:35,843-Speed 3390.19 samples/sec   Loss 7.4308   LearningRate 0.0700   Epoch: 3   Global Step: 40520   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:14:38,850-Speed 3406.60 samples/sec   Loss 7.3274   LearningRate 0.0700   Epoch: 3   Global Step: 40530   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:14:41,873-Speed 3388.25 samples/sec   Loss 7.4597   LearningRate 0.0700   Epoch: 3   Global Step: 40540   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:14:44,887-Speed 3398.39 samples/sec   Loss 7.3244   LearningRate 0.0700   Epoch: 3   Global Step: 40550   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:14:47,914-Speed 3384.15 samples/sec   Loss 7.3727   LearningRate 0.0700   Epoch: 3   Global Step: 40560   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:14:51,007-Speed 3310.95 samples/sec   Loss 7.3667   LearningRate 0.0700   Epoch: 3   Global Step: 40570   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:14:54,177-Speed 3231.32 samples/sec   Loss 7.3266   LearningRate 0.0700   Epoch: 3   Global Step: 40580   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:14:57,189-Speed 3401.09 samples/sec   Loss 7.2815   LearningRate 0.0700   Epoch: 3   Global Step: 40590   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:15:00,220-Speed 3380.29 samples/sec   Loss 7.4314   LearningRate 0.0700   Epoch: 3   Global Step: 40600   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:15:03,258-Speed 3371.79 samples/sec   Loss 7.4728   LearningRate 0.0700   Epoch: 3   Global Step: 40610   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:15:06,253-Speed 3419.37 samples/sec   Loss 7.5442   LearningRate 0.0700   Epoch: 3   Global Step: 40620   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:15:09,282-Speed 3382.11 samples/sec   Loss 7.3095   LearningRate 0.0700   Epoch: 3   Global Step: 40630   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:15:12,314-Speed 3392.25 samples/sec   Loss 7.3903   LearningRate 0.0700   Epoch: 3   Global Step: 40640   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:15:15,351-Speed 3373.54 samples/sec   Loss 7.4116   LearningRate 0.0700   Epoch: 3   Global Step: 40650   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:15:18,362-Speed 3400.99 samples/sec   Loss 7.4677   LearningRate 0.0699   Epoch: 3   Global Step: 40660   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:15:21,372-Speed 3403.81 samples/sec   Loss 7.3188   LearningRate 0.0699   Epoch: 3   Global Step: 40670   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:15:24,391-Speed 3393.23 samples/sec   Loss 7.3176   LearningRate 0.0699   Epoch: 3   Global Step: 40680   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:15:27,523-Speed 3269.57 samples/sec   Loss 7.3798   LearningRate 0.0699   Epoch: 3   Global Step: 40690   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:15:30,581-Speed 3350.38 samples/sec   Loss 7.4430   LearningRate 0.0699   Epoch: 3   Global Step: 40700   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:15:33,604-Speed 3387.95 samples/sec   Loss 7.4569   LearningRate 0.0699   Epoch: 3   Global Step: 40710   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:15:36,675-Speed 3335.04 samples/sec   Loss 7.3052   LearningRate 0.0699   Epoch: 3   Global Step: 40720   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:15:39,717-Speed 3367.90 samples/sec   Loss 7.3361   LearningRate 0.0699   Epoch: 3   Global Step: 40730   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:15:42,800-Speed 3322.47 samples/sec   Loss 7.3021   LearningRate 0.0699   Epoch: 3   Global Step: 40740   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:15:45,811-Speed 3402.61 samples/sec   Loss 7.3124   LearningRate 0.0699   Epoch: 3   Global Step: 40750   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:15:48,834-Speed 3387.82 samples/sec   Loss 7.4247   LearningRate 0.0699   Epoch: 3   Global Step: 40760   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:15:51,863-Speed 3381.40 samples/sec   Loss 7.3774   LearningRate 0.0699   Epoch: 3   Global Step: 40770   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:15:54,965-Speed 3302.92 samples/sec   Loss 7.3460   LearningRate 0.0699   Epoch: 3   Global Step: 40780   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:15:57,973-Speed 3405.98 samples/sec   Loss 7.3481   LearningRate 0.0699   Epoch: 3   Global Step: 40790   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:16:01,075-Speed 3301.35 samples/sec   Loss 7.3711   LearningRate 0.0698   Epoch: 3   Global Step: 40800   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:16:04,118-Speed 3365.90 samples/sec   Loss 7.3829   LearningRate 0.0698   Epoch: 3   Global Step: 40810   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:16:07,151-Speed 3378.18 samples/sec   Loss 7.4504   LearningRate 0.0698   Epoch: 3   Global Step: 40820   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:16:10,150-Speed 3415.46 samples/sec   Loss 7.3051   LearningRate 0.0698   Epoch: 3   Global Step: 40830   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:16:13,251-Speed 3302.92 samples/sec   Loss 7.3693   LearningRate 0.0698   Epoch: 3   Global Step: 40840   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:16:16,285-Speed 3375.56 samples/sec   Loss 7.3997   LearningRate 0.0698   Epoch: 3   Global Step: 40850   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:16:19,332-Speed 3362.11 samples/sec   Loss 7.3659   LearningRate 0.0698   Epoch: 3   Global Step: 40860   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:16:22,363-Speed 3379.39 samples/sec   Loss 7.5331   LearningRate 0.0698   Epoch: 3   Global Step: 40870   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:16:25,425-Speed 3345.69 samples/sec   Loss 7.3871   LearningRate 0.0698   Epoch: 3   Global Step: 40880   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:16:28,480-Speed 3352.97 samples/sec   Loss 7.3428   LearningRate 0.0698   Epoch: 3   Global Step: 40890   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:16:31,509-Speed 3381.04 samples/sec   Loss 7.3952   LearningRate 0.0698   Epoch: 3   Global Step: 40900   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:16:34,530-Speed 3390.88 samples/sec   Loss 7.4604   LearningRate 0.0698   Epoch: 3   Global Step: 40910   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:16:37,559-Speed 3381.12 samples/sec   Loss 7.4344   LearningRate 0.0698   Epoch: 3   Global Step: 40920   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:16:40,606-Speed 3362.66 samples/sec   Loss 7.4208   LearningRate 0.0698   Epoch: 3   Global Step: 40930   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:16:43,623-Speed 3395.13 samples/sec   Loss 7.4489   LearningRate 0.0698   Epoch: 3   Global Step: 40940   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:16:46,683-Speed 3346.73 samples/sec   Loss 7.2648   LearningRate 0.0697   Epoch: 3   Global Step: 40950   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:16:49,705-Speed 3389.50 samples/sec   Loss 7.4111   LearningRate 0.0697   Epoch: 3   Global Step: 40960   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:16:52,801-Speed 3309.08 samples/sec   Loss 7.4302   LearningRate 0.0697   Epoch: 3   Global Step: 40970   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:16:55,846-Speed 3363.97 samples/sec   Loss 7.3331   LearningRate 0.0697   Epoch: 3   Global Step: 40980   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:16:58,862-Speed 3395.75 samples/sec   Loss 7.4130   LearningRate 0.0697   Epoch: 3   Global Step: 40990   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:17:01,918-Speed 3352.04 samples/sec   Loss 7.4413   LearningRate 0.0697   Epoch: 3   Global Step: 41000   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:17:04,926-Speed 3405.39 samples/sec   Loss 7.4067   LearningRate 0.0697   Epoch: 3   Global Step: 41010   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:17:07,945-Speed 3392.98 samples/sec   Loss 7.3236   LearningRate 0.0697   Epoch: 3   Global Step: 41020   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:17:10,990-Speed 3364.26 samples/sec   Loss 7.4325   LearningRate 0.0697   Epoch: 3   Global Step: 41030   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:17:14,062-Speed 3334.47 samples/sec   Loss 7.3221   LearningRate 0.0697   Epoch: 3   Global Step: 41040   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:17:17,068-Speed 3407.52 samples/sec   Loss 7.4994   LearningRate 0.0697   Epoch: 3   Global Step: 41050   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:17:20,103-Speed 3375.58 samples/sec   Loss 7.4331   LearningRate 0.0697   Epoch: 3   Global Step: 41060   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:17:23,130-Speed 3383.18 samples/sec   Loss 7.4829   LearningRate 0.0697   Epoch: 3   Global Step: 41070   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:17:26,188-Speed 3350.65 samples/sec   Loss 7.4745   LearningRate 0.0697   Epoch: 3   Global Step: 41080   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:17:29,219-Speed 3379.54 samples/sec   Loss 7.4439   LearningRate 0.0697   Epoch: 3   Global Step: 41090   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:17:32,232-Speed 3399.67 samples/sec   Loss 7.4367   LearningRate 0.0696   Epoch: 3   Global Step: 41100   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:17:35,225-Speed 3421.78 samples/sec   Loss 7.5124   LearningRate 0.0696   Epoch: 3   Global Step: 41110   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:17:38,243-Speed 3394.11 samples/sec   Loss 7.4295   LearningRate 0.0696   Epoch: 3   Global Step: 41120   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:17:41,314-Speed 3335.41 samples/sec   Loss 7.5225   LearningRate 0.0696   Epoch: 3   Global Step: 41130   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:17:44,367-Speed 3355.06 samples/sec   Loss 7.4652   LearningRate 0.0696   Epoch: 3   Global Step: 41140   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:17:47,400-Speed 3377.29 samples/sec   Loss 7.3932   LearningRate 0.0696   Epoch: 3   Global Step: 41150   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:17:50,427-Speed 3383.52 samples/sec   Loss 7.4099   LearningRate 0.0696   Epoch: 3   Global Step: 41160   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:17:53,530-Speed 3302.02 samples/sec   Loss 7.4315   LearningRate 0.0696   Epoch: 3   Global Step: 41170   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:17:56,580-Speed 3358.50 samples/sec   Loss 7.4492   LearningRate 0.0696   Epoch: 3   Global Step: 41180   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:17:59,638-Speed 3349.51 samples/sec   Loss 7.4799   LearningRate 0.0696   Epoch: 3   Global Step: 41190   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:18:02,663-Speed 3385.80 samples/sec   Loss 7.4391   LearningRate 0.0696   Epoch: 3   Global Step: 41200   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:18:05,661-Speed 3417.95 samples/sec   Loss 7.3309   LearningRate 0.0696   Epoch: 3   Global Step: 41210   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:18:08,660-Speed 3415.19 samples/sec   Loss 7.4694   LearningRate 0.0696   Epoch: 3   Global Step: 41220   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:18:11,718-Speed 3349.64 samples/sec   Loss 7.4011   LearningRate 0.0696   Epoch: 3   Global Step: 41230   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:18:14,840-Speed 3281.05 samples/sec   Loss 7.4208   LearningRate 0.0696   Epoch: 3   Global Step: 41240   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:18:17,925-Speed 3320.24 samples/sec   Loss 7.5390   LearningRate 0.0695   Epoch: 3   Global Step: 41250   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:18:20,943-Speed 3392.86 samples/sec   Loss 7.4781   LearningRate 0.0695   Epoch: 3   Global Step: 41260   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:18:23,987-Speed 3365.66 samples/sec   Loss 7.5924   LearningRate 0.0695   Epoch: 3   Global Step: 41270   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:18:27,057-Speed 3336.19 samples/sec   Loss 7.4893   LearningRate 0.0695   Epoch: 3   Global Step: 41280   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:18:30,109-Speed 3356.16 samples/sec   Loss 7.3124   LearningRate 0.0695   Epoch: 3   Global Step: 41290   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:18:33,142-Speed 3377.28 samples/sec   Loss 7.4304   LearningRate 0.0695   Epoch: 3   Global Step: 41300   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:18:36,176-Speed 3376.24 samples/sec   Loss 7.2853   LearningRate 0.0695   Epoch: 3   Global Step: 41310   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:18:39,215-Speed 3370.46 samples/sec   Loss 7.3445   LearningRate 0.0695   Epoch: 3   Global Step: 41320   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:18:42,277-Speed 3345.93 samples/sec   Loss 7.4497   LearningRate 0.0695   Epoch: 3   Global Step: 41330   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:18:45,300-Speed 3388.72 samples/sec   Loss 7.3265   LearningRate 0.0695   Epoch: 3   Global Step: 41340   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:18:48,367-Speed 3340.08 samples/sec   Loss 7.4488   LearningRate 0.0695   Epoch: 3   Global Step: 41350   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:18:51,436-Speed 3337.97 samples/sec   Loss 7.4645   LearningRate 0.0695   Epoch: 3   Global Step: 41360   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:18:54,458-Speed 3388.62 samples/sec   Loss 7.4714   LearningRate 0.0695   Epoch: 3   Global Step: 41370   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:18:57,466-Speed 3406.08 samples/sec   Loss 7.5000   LearningRate 0.0695   Epoch: 3   Global Step: 41380   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:19:00,570-Speed 3299.60 samples/sec   Loss 7.4507   LearningRate 0.0695   Epoch: 3   Global Step: 41390   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:19:03,665-Speed 3310.43 samples/sec   Loss 7.5480   LearningRate 0.0694   Epoch: 3   Global Step: 41400   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:19:06,719-Speed 3353.45 samples/sec   Loss 7.3450   LearningRate 0.0694   Epoch: 3   Global Step: 41410   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:19:09,746-Speed 3384.95 samples/sec   Loss 7.4509   LearningRate 0.0694   Epoch: 3   Global Step: 41420   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:19:12,797-Speed 3356.69 samples/sec   Loss 7.4823   LearningRate 0.0694   Epoch: 3   Global Step: 41430   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:19:15,955-Speed 3244.20 samples/sec   Loss 7.4337   LearningRate 0.0694   Epoch: 3   Global Step: 41440   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:19:19,029-Speed 3331.54 samples/sec   Loss 7.5148   LearningRate 0.0694   Epoch: 3   Global Step: 41450   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:19:22,028-Speed 3415.38 samples/sec   Loss 7.3839   LearningRate 0.0694   Epoch: 3   Global Step: 41460   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:19:25,059-Speed 3380.45 samples/sec   Loss 7.4199   LearningRate 0.0694   Epoch: 3   Global Step: 41470   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:19:28,145-Speed 3319.39 samples/sec   Loss 7.5400   LearningRate 0.0694   Epoch: 3   Global Step: 41480   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:19:31,209-Speed 3343.31 samples/sec   Loss 7.4599   LearningRate 0.0694   Epoch: 3   Global Step: 41490   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:19:34,236-Speed 3383.07 samples/sec   Loss 7.4237   LearningRate 0.0694   Epoch: 3   Global Step: 41500   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:19:37,299-Speed 3343.90 samples/sec   Loss 7.4308   LearningRate 0.0694   Epoch: 3   Global Step: 41510   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:19:40,381-Speed 3324.56 samples/sec   Loss 7.4493   LearningRate 0.0694   Epoch: 3   Global Step: 41520   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:19:43,449-Speed 3337.65 samples/sec   Loss 7.5362   LearningRate 0.0694   Epoch: 3   Global Step: 41530   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:19:46,488-Speed 3371.76 samples/sec   Loss 7.4398   LearningRate 0.0694   Epoch: 3   Global Step: 41540   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:19:49,536-Speed 3360.40 samples/sec   Loss 7.5161   LearningRate 0.0693   Epoch: 3   Global Step: 41550   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:19:52,572-Speed 3373.50 samples/sec   Loss 7.3908   LearningRate 0.0693   Epoch: 3   Global Step: 41560   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:19:55,595-Speed 3388.57 samples/sec   Loss 7.5492   LearningRate 0.0693   Epoch: 3   Global Step: 41570   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:19:58,634-Speed 3370.76 samples/sec   Loss 7.4995   LearningRate 0.0693   Epoch: 3   Global Step: 41580   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:20:01,677-Speed 3366.00 samples/sec   Loss 7.4229   LearningRate 0.0693   Epoch: 3   Global Step: 41590   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:20:04,748-Speed 3334.63 samples/sec   Loss 7.3723   LearningRate 0.0693   Epoch: 3   Global Step: 41600   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:20:07,766-Speed 3394.46 samples/sec   Loss 7.4631   LearningRate 0.0693   Epoch: 3   Global Step: 41610   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:20:10,803-Speed 3372.98 samples/sec   Loss 7.4067   LearningRate 0.0693   Epoch: 3   Global Step: 41620   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:20:13,857-Speed 3353.70 samples/sec   Loss 7.4946   LearningRate 0.0693   Epoch: 3   Global Step: 41630   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:20:16,915-Speed 3349.56 samples/sec   Loss 7.3831   LearningRate 0.0693   Epoch: 3   Global Step: 41640   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:20:19,961-Speed 3362.99 samples/sec   Loss 7.6026   LearningRate 0.0693   Epoch: 3   Global Step: 41650   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:20:22,996-Speed 3375.76 samples/sec   Loss 7.5414   LearningRate 0.0693   Epoch: 3   Global Step: 41660   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:20:26,089-Speed 3310.65 samples/sec   Loss 7.3865   LearningRate 0.0693   Epoch: 3   Global Step: 41670   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:20:29,164-Speed 3332.23 samples/sec   Loss 7.5109   LearningRate 0.0693   Epoch: 3   Global Step: 41680   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:20:32,187-Speed 3388.23 samples/sec   Loss 7.5252   LearningRate 0.0693   Epoch: 3   Global Step: 41690   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:20:35,241-Speed 3354.25 samples/sec   Loss 7.5259   LearningRate 0.0692   Epoch: 3   Global Step: 41700   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:20:38,248-Speed 3405.75 samples/sec   Loss 7.5259   LearningRate 0.0692   Epoch: 3   Global Step: 41710   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:20:41,287-Speed 3370.85 samples/sec   Loss 7.4891   LearningRate 0.0692   Epoch: 3   Global Step: 41720   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:20:44,350-Speed 3344.45 samples/sec   Loss 7.3530   LearningRate 0.0692   Epoch: 3   Global Step: 41730   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:20:47,404-Speed 3353.68 samples/sec   Loss 7.5187   LearningRate 0.0692   Epoch: 3   Global Step: 41740   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:20:50,460-Speed 3352.54 samples/sec   Loss 7.4653   LearningRate 0.0692   Epoch: 3   Global Step: 41750   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:20:53,496-Speed 3373.30 samples/sec   Loss 7.3842   LearningRate 0.0692   Epoch: 3   Global Step: 41760   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:20:56,536-Speed 3369.95 samples/sec   Loss 7.5583   LearningRate 0.0692   Epoch: 3   Global Step: 41770   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:20:59,572-Speed 3373.19 samples/sec   Loss 7.5305   LearningRate 0.0692   Epoch: 3   Global Step: 41780   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:21:02,654-Speed 3323.63 samples/sec   Loss 7.4955   LearningRate 0.0692   Epoch: 3   Global Step: 41790   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:21:05,717-Speed 3344.62 samples/sec   Loss 7.4781   LearningRate 0.0692   Epoch: 3   Global Step: 41800   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:21:08,770-Speed 3354.98 samples/sec   Loss 7.4758   LearningRate 0.0692   Epoch: 3   Global Step: 41810   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:21:11,801-Speed 3379.09 samples/sec   Loss 7.5407   LearningRate 0.0692   Epoch: 3   Global Step: 41820   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:21:14,858-Speed 3351.42 samples/sec   Loss 7.4962   LearningRate 0.0692   Epoch: 3   Global Step: 41830   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:21:17,903-Speed 3364.08 samples/sec   Loss 7.5766   LearningRate 0.0692   Epoch: 3   Global Step: 41840   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:21:20,926-Speed 3388.30 samples/sec   Loss 7.3091   LearningRate 0.0691   Epoch: 3   Global Step: 41850   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:21:23,978-Speed 3355.82 samples/sec   Loss 7.5907   LearningRate 0.0691   Epoch: 3   Global Step: 41860   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:21:27,035-Speed 3350.64 samples/sec   Loss 7.3748   LearningRate 0.0691   Epoch: 3   Global Step: 41870   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:21:30,108-Speed 3333.67 samples/sec   Loss 7.4051   LearningRate 0.0691   Epoch: 3   Global Step: 41880   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:21:33,113-Speed 3408.17 samples/sec   Loss 7.4119   LearningRate 0.0691   Epoch: 3   Global Step: 41890   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:21:36,124-Speed 3402.20 samples/sec   Loss 7.5106   LearningRate 0.0691   Epoch: 3   Global Step: 41900   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:21:39,157-Speed 3377.78 samples/sec   Loss 7.5434   LearningRate 0.0691   Epoch: 3   Global Step: 41910   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:21:42,220-Speed 3344.27 samples/sec   Loss 7.3854   LearningRate 0.0691   Epoch: 3   Global Step: 41920   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:21:45,216-Speed 3418.84 samples/sec   Loss 7.5102   LearningRate 0.0691   Epoch: 3   Global Step: 41930   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:21:48,274-Speed 3349.57 samples/sec   Loss 7.4649   LearningRate 0.0691   Epoch: 3   Global Step: 41940   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:21:51,380-Speed 3297.79 samples/sec   Loss 7.4998   LearningRate 0.0691   Epoch: 3   Global Step: 41950   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:21:54,459-Speed 3327.46 samples/sec   Loss 7.4703   LearningRate 0.0691   Epoch: 3   Global Step: 41960   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:21:57,494-Speed 3374.87 samples/sec   Loss 7.4084   LearningRate 0.0691   Epoch: 3   Global Step: 41970   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:22:00,526-Speed 3378.70 samples/sec   Loss 7.4678   LearningRate 0.0691   Epoch: 3   Global Step: 41980   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:22:03,569-Speed 3365.49 samples/sec   Loss 7.4546   LearningRate 0.0691   Epoch: 3   Global Step: 41990   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:22:06,595-Speed 3385.71 samples/sec   Loss 7.5350   LearningRate 0.0690   Epoch: 3   Global Step: 42000   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:22:09,621-Speed 3385.01 samples/sec   Loss 7.5193   LearningRate 0.0690   Epoch: 3   Global Step: 42010   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:22:12,645-Speed 3387.65 samples/sec   Loss 7.5254   LearningRate 0.0690   Epoch: 3   Global Step: 42020   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:22:15,673-Speed 3382.71 samples/sec   Loss 7.5178   LearningRate 0.0690   Epoch: 3   Global Step: 42030   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:22:18,780-Speed 3296.11 samples/sec   Loss 7.5007   LearningRate 0.0690   Epoch: 3   Global Step: 42040   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:22:21,771-Speed 3424.68 samples/sec   Loss 7.3767   LearningRate 0.0690   Epoch: 3   Global Step: 42050   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:22:24,791-Speed 3391.74 samples/sec   Loss 7.5141   LearningRate 0.0690   Epoch: 3   Global Step: 42060   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:22:27,793-Speed 3412.54 samples/sec   Loss 7.4236   LearningRate 0.0690   Epoch: 3   Global Step: 42070   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:22:30,820-Speed 3383.61 samples/sec   Loss 7.5506   LearningRate 0.0690   Epoch: 3   Global Step: 42080   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:22:33,834-Speed 3398.59 samples/sec   Loss 7.4704   LearningRate 0.0690   Epoch: 3   Global Step: 42090   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:22:36,881-Speed 3362.36 samples/sec   Loss 7.5148   LearningRate 0.0690   Epoch: 3   Global Step: 42100   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:22:39,936-Speed 3352.48 samples/sec   Loss 7.4640   LearningRate 0.0690   Epoch: 3   Global Step: 42110   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:22:43,017-Speed 3325.36 samples/sec   Loss 7.5796   LearningRate 0.0690   Epoch: 3   Global Step: 42120   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:22:46,015-Speed 3416.49 samples/sec   Loss 7.4539   LearningRate 0.0690   Epoch: 3   Global Step: 42130   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:22:49,078-Speed 3343.75 samples/sec   Loss 7.5169   LearningRate 0.0690   Epoch: 3   Global Step: 42140   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:22:52,128-Speed 3359.12 samples/sec   Loss 7.4742   LearningRate 0.0689   Epoch: 3   Global Step: 42150   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:22:55,170-Speed 3367.54 samples/sec   Loss 7.5030   LearningRate 0.0689   Epoch: 3   Global Step: 42160   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:22:58,169-Speed 3415.50 samples/sec   Loss 7.4319   LearningRate 0.0689   Epoch: 3   Global Step: 42170   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:23:01,219-Speed 3357.90 samples/sec   Loss 7.5019   LearningRate 0.0689   Epoch: 3   Global Step: 42180   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:23:04,325-Speed 3298.07 samples/sec   Loss 7.6340   LearningRate 0.0689   Epoch: 3   Global Step: 42190   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:23:07,379-Speed 3353.83 samples/sec   Loss 7.4757   LearningRate 0.0689   Epoch: 3   Global Step: 42200   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:23:10,413-Speed 3378.00 samples/sec   Loss 7.5704   LearningRate 0.0689   Epoch: 3   Global Step: 42210   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:23:13,489-Speed 3330.03 samples/sec   Loss 7.4261   LearningRate 0.0689   Epoch: 3   Global Step: 42220   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:23:16,580-Speed 3314.37 samples/sec   Loss 7.4965   LearningRate 0.0689   Epoch: 3   Global Step: 42230   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:23:19,637-Speed 3350.12 samples/sec   Loss 7.6435   LearningRate 0.0689   Epoch: 3   Global Step: 42240   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:23:22,641-Speed 3410.48 samples/sec   Loss 7.7089   LearningRate 0.0689   Epoch: 3   Global Step: 42250   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:23:25,698-Speed 3351.00 samples/sec   Loss 7.6282   LearningRate 0.0689   Epoch: 3   Global Step: 42260   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:23:28,735-Speed 3372.36 samples/sec   Loss 7.5759   LearningRate 0.0689   Epoch: 3   Global Step: 42270   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:23:31,744-Speed 3404.43 samples/sec   Loss 7.4810   LearningRate 0.0689   Epoch: 3   Global Step: 42280   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:23:34,781-Speed 3372.37 samples/sec   Loss 7.4719   LearningRate 0.0689   Epoch: 3   Global Step: 42290   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:23:37,785-Speed 3410.16 samples/sec   Loss 7.4988   LearningRate 0.0688   Epoch: 3   Global Step: 42300   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:23:40,801-Speed 3396.68 samples/sec   Loss 7.5200   LearningRate 0.0688   Epoch: 3   Global Step: 42310   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:23:43,824-Speed 3388.76 samples/sec   Loss 7.4843   LearningRate 0.0688   Epoch: 3   Global Step: 42320   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:23:46,833-Speed 3404.28 samples/sec   Loss 7.4201   LearningRate 0.0688   Epoch: 3   Global Step: 42330   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:23:49,899-Speed 3341.29 samples/sec   Loss 7.4745   LearningRate 0.0688   Epoch: 3   Global Step: 42340   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:23:52,947-Speed 3360.75 samples/sec   Loss 7.5872   LearningRate 0.0688   Epoch: 3   Global Step: 42350   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:23:56,041-Speed 3310.63 samples/sec   Loss 7.4851   LearningRate 0.0688   Epoch: 3   Global Step: 42360   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:23:59,059-Speed 3393.97 samples/sec   Loss 7.4689   LearningRate 0.0688   Epoch: 3   Global Step: 42370   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:24:02,138-Speed 3326.89 samples/sec   Loss 7.5792   LearningRate 0.0688   Epoch: 3   Global Step: 42380   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:24:05,207-Speed 3337.65 samples/sec   Loss 7.6051   LearningRate 0.0688   Epoch: 3   Global Step: 42390   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:24:08,266-Speed 3349.02 samples/sec   Loss 7.5532   LearningRate 0.0688   Epoch: 3   Global Step: 42400   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:24:11,384-Speed 3284.64 samples/sec   Loss 7.4898   LearningRate 0.0688   Epoch: 3   Global Step: 42410   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:24:14,420-Speed 3373.84 samples/sec   Loss 7.4450   LearningRate 0.0688   Epoch: 3   Global Step: 42420   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:24:17,475-Speed 3353.46 samples/sec   Loss 7.4640   LearningRate 0.0688   Epoch: 3   Global Step: 42430   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:24:20,520-Speed 3363.51 samples/sec   Loss 7.6554   LearningRate 0.0688   Epoch: 3   Global Step: 42440   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:24:23,614-Speed 3310.61 samples/sec   Loss 7.5154   LearningRate 0.0687   Epoch: 3   Global Step: 42450   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:24:26,736-Speed 3281.32 samples/sec   Loss 7.5788   LearningRate 0.0687   Epoch: 3   Global Step: 42460   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:24:29,785-Speed 3359.42 samples/sec   Loss 7.4354   LearningRate 0.0687   Epoch: 3   Global Step: 42470   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:24:32,819-Speed 3376.60 samples/sec   Loss 7.5154   LearningRate 0.0687   Epoch: 3   Global Step: 42480   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:24:35,901-Speed 3322.91 samples/sec   Loss 7.6188   LearningRate 0.0687   Epoch: 3   Global Step: 42490   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:24:38,983-Speed 3324.29 samples/sec   Loss 7.5036   LearningRate 0.0687   Epoch: 3   Global Step: 42500   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:24:41,989-Speed 3407.32 samples/sec   Loss 7.5568   LearningRate 0.0687   Epoch: 3   Global Step: 42510   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:24:45,010-Speed 3390.74 samples/sec   Loss 7.4792   LearningRate 0.0687   Epoch: 3   Global Step: 42520   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:24:48,074-Speed 3342.54 samples/sec   Loss 7.5445   LearningRate 0.0687   Epoch: 3   Global Step: 42530   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:24:51,153-Speed 3326.50 samples/sec   Loss 7.5354   LearningRate 0.0687   Epoch: 3   Global Step: 42540   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:24:54,154-Speed 3413.67 samples/sec   Loss 7.5676   LearningRate 0.0687   Epoch: 3   Global Step: 42550   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:24:57,153-Speed 3415.64 samples/sec   Loss 7.4735   LearningRate 0.0687   Epoch: 3   Global Step: 42560   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:25:00,214-Speed 3346.51 samples/sec   Loss 7.5288   LearningRate 0.0687   Epoch: 3   Global Step: 42570   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:25:03,253-Speed 3370.07 samples/sec   Loss 7.5321   LearningRate 0.0687   Epoch: 3   Global Step: 42580   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:25:06,347-Speed 3310.76 samples/sec   Loss 7.4417   LearningRate 0.0687   Epoch: 3   Global Step: 42590   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:25:09,354-Speed 3406.94 samples/sec   Loss 7.5178   LearningRate 0.0686   Epoch: 3   Global Step: 42600   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:25:12,411-Speed 3350.31 samples/sec   Loss 7.4863   LearningRate 0.0686   Epoch: 3   Global Step: 42610   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:25:15,532-Speed 3282.48 samples/sec   Loss 7.4909   LearningRate 0.0686   Epoch: 3   Global Step: 42620   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:25:18,615-Speed 3322.52 samples/sec   Loss 7.6004   LearningRate 0.0686   Epoch: 3   Global Step: 42630   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:25:21,656-Speed 3368.15 samples/sec   Loss 7.4406   LearningRate 0.0686   Epoch: 3   Global Step: 42640   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:25:24,689-Speed 3377.87 samples/sec   Loss 7.5081   LearningRate 0.0686   Epoch: 3   Global Step: 42650   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:25:27,714-Speed 3385.24 samples/sec   Loss 7.5539   LearningRate 0.0686   Epoch: 3   Global Step: 42660   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:25:30,790-Speed 3330.05 samples/sec   Loss 7.4574   LearningRate 0.0686   Epoch: 3   Global Step: 42670   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:25:33,835-Speed 3364.62 samples/sec   Loss 7.5343   LearningRate 0.0686   Epoch: 3   Global Step: 42680   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:25:36,855-Speed 3392.03 samples/sec   Loss 7.5218   LearningRate 0.0686   Epoch: 3   Global Step: 42690   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:25:39,880-Speed 3386.42 samples/sec   Loss 7.5252   LearningRate 0.0686   Epoch: 3   Global Step: 42700   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:25:42,889-Speed 3403.82 samples/sec   Loss 7.5976   LearningRate 0.0686   Epoch: 3   Global Step: 42710   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:25:45,913-Speed 3387.42 samples/sec   Loss 7.4963   LearningRate 0.0686   Epoch: 3   Global Step: 42720   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:25:48,971-Speed 3349.80 samples/sec   Loss 7.5192   LearningRate 0.0686   Epoch: 3   Global Step: 42730   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:25:51,993-Speed 3389.18 samples/sec   Loss 7.5498   LearningRate 0.0686   Epoch: 3   Global Step: 42740   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:25:55,056-Speed 3344.79 samples/sec   Loss 7.4965   LearningRate 0.0685   Epoch: 3   Global Step: 42750   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:25:58,064-Speed 3405.46 samples/sec   Loss 7.4316   LearningRate 0.0685   Epoch: 3   Global Step: 42760   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:26:01,149-Speed 3319.88 samples/sec   Loss 7.6222   LearningRate 0.0685   Epoch: 3   Global Step: 42770   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:26:04,268-Speed 3285.02 samples/sec   Loss 7.5426   LearningRate 0.0685   Epoch: 3   Global Step: 42780   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:26:07,286-Speed 3393.67 samples/sec   Loss 7.4697   LearningRate 0.0685   Epoch: 3   Global Step: 42790   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:26:10,321-Speed 3375.44 samples/sec   Loss 7.4910   LearningRate 0.0685   Epoch: 3   Global Step: 42800   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:26:13,340-Speed 3392.18 samples/sec   Loss 7.5342   LearningRate 0.0685   Epoch: 3   Global Step: 42810   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:26:16,420-Speed 3326.38 samples/sec   Loss 7.6652   LearningRate 0.0685   Epoch: 3   Global Step: 42820   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:26:19,498-Speed 3328.02 samples/sec   Loss 7.5759   LearningRate 0.0685   Epoch: 3   Global Step: 42830   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:26:22,577-Speed 3326.91 samples/sec   Loss 7.5609   LearningRate 0.0685   Epoch: 3   Global Step: 42840   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:26:25,686-Speed 3295.07 samples/sec   Loss 7.5815   LearningRate 0.0685   Epoch: 3   Global Step: 42850   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:26:28,761-Speed 3330.89 samples/sec   Loss 7.5720   LearningRate 0.0685   Epoch: 3   Global Step: 42860   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:26:31,804-Speed 3365.86 samples/sec   Loss 7.4779   LearningRate 0.0685   Epoch: 3   Global Step: 42870   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:26:34,837-Speed 3377.09 samples/sec   Loss 7.5194   LearningRate 0.0685   Epoch: 3   Global Step: 42880   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:26:37,850-Speed 3399.76 samples/sec   Loss 7.6501   LearningRate 0.0685   Epoch: 3   Global Step: 42890   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:26:40,851-Speed 3413.92 samples/sec   Loss 7.4829   LearningRate 0.0684   Epoch: 3   Global Step: 42900   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:26:43,860-Speed 3403.49 samples/sec   Loss 7.3893   LearningRate 0.0684   Epoch: 3   Global Step: 42910   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:26:46,919-Speed 3349.37 samples/sec   Loss 7.5291   LearningRate 0.0684   Epoch: 3   Global Step: 42920   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-04-27 05:26:49,951-Speed 3378.49 samples/sec   Loss 7.5244   LearningRate 0.0684   Epoch: 3   Global Step: 42930   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:26:53,045-Speed 3310.75 samples/sec   Loss 7.5605   LearningRate 0.0684   Epoch: 3   Global Step: 42940   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:26:56,079-Speed 3376.24 samples/sec   Loss 7.5806   LearningRate 0.0684   Epoch: 3   Global Step: 42950   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:26:59,126-Speed 3361.49 samples/sec   Loss 7.5433   LearningRate 0.0684   Epoch: 3   Global Step: 42960   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:27:02,167-Speed 3368.94 samples/sec   Loss 7.5127   LearningRate 0.0684   Epoch: 3   Global Step: 42970   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:27:05,251-Speed 3321.28 samples/sec   Loss 7.5309   LearningRate 0.0684   Epoch: 3   Global Step: 42980   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:27:08,259-Speed 3405.47 samples/sec   Loss 7.5518   LearningRate 0.0684   Epoch: 3   Global Step: 42990   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:27:11,290-Speed 3378.96 samples/sec   Loss 7.4917   LearningRate 0.0684   Epoch: 3   Global Step: 43000   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:27:14,321-Speed 3380.07 samples/sec   Loss 7.6606   LearningRate 0.0684   Epoch: 3   Global Step: 43010   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:27:17,337-Speed 3395.39 samples/sec   Loss 7.6299   LearningRate 0.0684   Epoch: 3   Global Step: 43020   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:27:20,358-Speed 3390.77 samples/sec   Loss 7.5729   LearningRate 0.0684   Epoch: 3   Global Step: 43030   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:27:23,379-Speed 3390.52 samples/sec   Loss 7.5709   LearningRate 0.0684   Epoch: 3   Global Step: 43040   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:27:26,487-Speed 3296.17 samples/sec   Loss 7.5631   LearningRate 0.0683   Epoch: 3   Global Step: 43050   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:27:29,614-Speed 3275.53 samples/sec   Loss 7.5579   LearningRate 0.0683   Epoch: 3   Global Step: 43060   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:27:32,703-Speed 3316.49 samples/sec   Loss 7.3998   LearningRate 0.0683   Epoch: 3   Global Step: 43070   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:27:35,739-Speed 3374.59 samples/sec   Loss 7.6504   LearningRate 0.0683   Epoch: 3   Global Step: 43080   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:27:38,882-Speed 3258.90 samples/sec   Loss 7.4927   LearningRate 0.0683   Epoch: 3   Global Step: 43090   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:27:41,929-Speed 3361.68 samples/sec   Loss 7.4997   LearningRate 0.0683   Epoch: 3   Global Step: 43100   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:27:44,972-Speed 3366.33 samples/sec   Loss 7.4496   LearningRate 0.0683   Epoch: 3   Global Step: 43110   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:27:48,029-Speed 3350.55 samples/sec   Loss 7.4922   LearningRate 0.0683   Epoch: 3   Global Step: 43120   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:27:51,063-Speed 3376.14 samples/sec   Loss 7.4629   LearningRate 0.0683   Epoch: 3   Global Step: 43130   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:27:54,104-Speed 3368.21 samples/sec   Loss 7.4847   LearningRate 0.0683   Epoch: 3   Global Step: 43140   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:27:57,117-Speed 3399.38 samples/sec   Loss 7.5906   LearningRate 0.0683   Epoch: 3   Global Step: 43150   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:28:00,167-Speed 3358.42 samples/sec   Loss 7.6392   LearningRate 0.0683   Epoch: 3   Global Step: 43160   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:28:03,249-Speed 3323.96 samples/sec   Loss 7.5537   LearningRate 0.0683   Epoch: 3   Global Step: 43170   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:28:06,277-Speed 3383.01 samples/sec   Loss 7.6076   LearningRate 0.0683   Epoch: 3   Global Step: 43180   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:28:09,297-Speed 3391.11 samples/sec   Loss 7.5080   LearningRate 0.0683   Epoch: 3   Global Step: 43190   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:28:12,352-Speed 3353.74 samples/sec   Loss 7.4694   LearningRate 0.0682   Epoch: 3   Global Step: 43200   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:28:15,366-Speed 3398.51 samples/sec   Loss 7.4732   LearningRate 0.0682   Epoch: 3   Global Step: 43210   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:28:18,451-Speed 3320.56 samples/sec   Loss 7.5338   LearningRate 0.0682   Epoch: 3   Global Step: 43220   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:28:21,472-Speed 3390.91 samples/sec   Loss 7.5721   LearningRate 0.0682   Epoch: 3   Global Step: 43230   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:28:24,522-Speed 3358.08 samples/sec   Loss 7.5162   LearningRate 0.0682   Epoch: 3   Global Step: 43240   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:28:27,538-Speed 3396.30 samples/sec   Loss 7.5760   LearningRate 0.0682   Epoch: 3   Global Step: 43250   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:28:30,630-Speed 3313.19 samples/sec   Loss 7.5699   LearningRate 0.0682   Epoch: 3   Global Step: 43260   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:28:33,652-Speed 3389.15 samples/sec   Loss 7.5879   LearningRate 0.0682   Epoch: 3   Global Step: 43270   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:28:36,711-Speed 3349.34 samples/sec   Loss 7.4128   LearningRate 0.0682   Epoch: 3   Global Step: 43280   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:28:39,810-Speed 3305.62 samples/sec   Loss 7.4950   LearningRate 0.0682   Epoch: 3   Global Step: 43290   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:28:42,909-Speed 3304.94 samples/sec   Loss 7.5101   LearningRate 0.0682   Epoch: 3   Global Step: 43300   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:28:45,923-Speed 3398.82 samples/sec   Loss 7.5845   LearningRate 0.0682   Epoch: 3   Global Step: 43310   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:28:48,942-Speed 3392.43 samples/sec   Loss 7.5724   LearningRate 0.0682   Epoch: 3   Global Step: 43320   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:28:51,969-Speed 3384.81 samples/sec   Loss 7.5491   LearningRate 0.0682   Epoch: 3   Global Step: 43330   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:28:55,016-Speed 3361.82 samples/sec   Loss 7.5012   LearningRate 0.0682   Epoch: 3   Global Step: 43340   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-04-27 05:28:58,052-Speed 3373.05 samples/sec   Loss 7.5601   LearningRate 0.0681   Epoch: 3   Global Step: 43350   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:29:01,114-Speed 3345.73 samples/sec   Loss 7.5477   LearningRate 0.0681   Epoch: 3   Global Step: 43360   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:29:04,135-Speed 3390.57 samples/sec   Loss 7.5570   LearningRate 0.0681   Epoch: 3   Global Step: 43370   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:29:07,213-Speed 3327.96 samples/sec   Loss 7.5038   LearningRate 0.0681   Epoch: 3   Global Step: 43380   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:29:10,235-Speed 3389.59 samples/sec   Loss 7.5064   LearningRate 0.0681   Epoch: 3   Global Step: 43390   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:29:13,279-Speed 3364.41 samples/sec   Loss 7.5417   LearningRate 0.0681   Epoch: 3   Global Step: 43400   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:29:16,374-Speed 3310.39 samples/sec   Loss 7.4854   LearningRate 0.0681   Epoch: 3   Global Step: 43410   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:29:19,416-Speed 3367.05 samples/sec   Loss 7.3955   LearningRate 0.0681   Epoch: 3   Global Step: 43420   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:29:22,453-Speed 3373.15 samples/sec   Loss 7.4234   LearningRate 0.0681   Epoch: 3   Global Step: 43430   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:29:25,548-Speed 3308.78 samples/sec   Loss 7.7104   LearningRate 0.0681   Epoch: 3   Global Step: 43440   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-04-27 05:29:28,610-Speed 3345.47 samples/sec   Loss 7.5528   LearningRate 0.0681   Epoch: 3   Global Step: 43450   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:29:31,643-Speed 3378.00 samples/sec   Loss 7.4994   LearningRate 0.0681   Epoch: 3   Global Step: 43460   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:29:34,683-Speed 3368.73 samples/sec   Loss 7.5214   LearningRate 0.0681   Epoch: 3   Global Step: 43470   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:29:37,743-Speed 3347.83 samples/sec   Loss 7.4671   LearningRate 0.0681   Epoch: 3   Global Step: 43480   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:29:40,802-Speed 3348.69 samples/sec   Loss 7.6484   LearningRate 0.0681   Epoch: 3   Global Step: 43490   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:29:43,886-Speed 3321.28 samples/sec   Loss 7.4660   LearningRate 0.0680   Epoch: 3   Global Step: 43500   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:29:46,937-Speed 3357.13 samples/sec   Loss 7.5448   LearningRate 0.0680   Epoch: 3   Global Step: 43510   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:29:49,953-Speed 3396.99 samples/sec   Loss 7.5821   LearningRate 0.0680   Epoch: 3   Global Step: 43520   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:29:53,082-Speed 3272.81 samples/sec   Loss 7.5336   LearningRate 0.0680   Epoch: 3   Global Step: 43530   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:29:56,126-Speed 3365.66 samples/sec   Loss 7.5939   LearningRate 0.0680   Epoch: 3   Global Step: 43540   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:29:59,240-Speed 3289.40 samples/sec   Loss 7.5701   LearningRate 0.0680   Epoch: 3   Global Step: 43550   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:30:02,304-Speed 3343.05 samples/sec   Loss 7.5295   LearningRate 0.0680   Epoch: 3   Global Step: 43560   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:30:05,359-Speed 3352.81 samples/sec   Loss 7.6872   LearningRate 0.0680   Epoch: 3   Global Step: 43570   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:30:08,371-Speed 3400.55 samples/sec   Loss 7.5629   LearningRate 0.0680   Epoch: 3   Global Step: 43580   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:30:11,412-Speed 3368.81 samples/sec   Loss 7.6270   LearningRate 0.0680   Epoch: 3   Global Step: 43590   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:30:14,485-Speed 3333.35 samples/sec   Loss 7.5771   LearningRate 0.0680   Epoch: 3   Global Step: 43600   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:30:17,558-Speed 3332.86 samples/sec   Loss 7.5731   LearningRate 0.0680   Epoch: 3   Global Step: 43610   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:30:20,579-Speed 3390.88 samples/sec   Loss 7.6183   LearningRate 0.0680   Epoch: 3   Global Step: 43620   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:30:23,643-Speed 3343.87 samples/sec   Loss 7.5901   LearningRate 0.0680   Epoch: 3   Global Step: 43630   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:30:26,698-Speed 3352.76 samples/sec   Loss 7.5562   LearningRate 0.0680   Epoch: 3   Global Step: 43640   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:30:29,725-Speed 3383.34 samples/sec   Loss 7.5379   LearningRate 0.0679   Epoch: 3   Global Step: 43650   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:30:32,793-Speed 3338.75 samples/sec   Loss 7.5188   LearningRate 0.0679   Epoch: 3   Global Step: 43660   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:30:35,877-Speed 3321.52 samples/sec   Loss 7.4813   LearningRate 0.0679   Epoch: 3   Global Step: 43670   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:30:38,969-Speed 3313.09 samples/sec   Loss 7.5049   LearningRate 0.0679   Epoch: 3   Global Step: 43680   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:30:42,038-Speed 3336.90 samples/sec   Loss 7.5331   LearningRate 0.0679   Epoch: 3   Global Step: 43690   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:30:45,054-Speed 3396.60 samples/sec   Loss 7.5383   LearningRate 0.0679   Epoch: 3   Global Step: 43700   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:30:48,093-Speed 3370.46 samples/sec   Loss 7.6116   LearningRate 0.0679   Epoch: 3   Global Step: 43710   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:30:51,158-Speed 3342.34 samples/sec   Loss 7.5785   LearningRate 0.0679   Epoch: 3   Global Step: 43720   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:30:54,219-Speed 3346.03 samples/sec   Loss 7.5950   LearningRate 0.0679   Epoch: 3   Global Step: 43730   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:30:57,268-Speed 3359.54 samples/sec   Loss 7.5160   LearningRate 0.0679   Epoch: 3   Global Step: 43740   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:31:00,297-Speed 3381.98 samples/sec   Loss 7.6121   LearningRate 0.0679   Epoch: 3   Global Step: 43750   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:31:03,341-Speed 3364.93 samples/sec   Loss 7.5876   LearningRate 0.0679   Epoch: 3   Global Step: 43760   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:31:06,369-Speed 3383.27 samples/sec   Loss 7.4856   LearningRate 0.0679   Epoch: 3   Global Step: 43770   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:31:09,398-Speed 3382.00 samples/sec   Loss 7.5642   LearningRate 0.0679   Epoch: 3   Global Step: 43780   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:31:12,442-Speed 3365.09 samples/sec   Loss 7.6148   LearningRate 0.0679   Epoch: 3   Global Step: 43790   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:31:15,461-Speed 3392.59 samples/sec   Loss 7.4620   LearningRate 0.0678   Epoch: 3   Global Step: 43800   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:31:18,505-Speed 3365.79 samples/sec   Loss 7.5393   LearningRate 0.0678   Epoch: 3   Global Step: 43810   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:31:21,561-Speed 3351.49 samples/sec   Loss 7.5178   LearningRate 0.0678   Epoch: 3   Global Step: 43820   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:31:24,612-Speed 3356.78 samples/sec   Loss 7.4997   LearningRate 0.0678   Epoch: 3   Global Step: 43830   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:31:27,666-Speed 3353.90 samples/sec   Loss 7.4732   LearningRate 0.0678   Epoch: 3   Global Step: 43840   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:31:30,711-Speed 3365.04 samples/sec   Loss 7.4887   LearningRate 0.0678   Epoch: 3   Global Step: 43850   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:31:33,752-Speed 3368.73 samples/sec   Loss 7.5241   LearningRate 0.0678   Epoch: 3   Global Step: 43860   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:31:36,765-Speed 3398.97 samples/sec   Loss 7.5225   LearningRate 0.0678   Epoch: 3   Global Step: 43870   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:31:39,823-Speed 3349.35 samples/sec   Loss 7.5199   LearningRate 0.0678   Epoch: 3   Global Step: 43880   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:31:42,921-Speed 3306.75 samples/sec   Loss 7.4490   LearningRate 0.0678   Epoch: 3   Global Step: 43890   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:31:45,935-Speed 3399.36 samples/sec   Loss 7.5086   LearningRate 0.0678   Epoch: 3   Global Step: 43900   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:31:48,967-Speed 3377.99 samples/sec   Loss 7.5765   LearningRate 0.0678   Epoch: 3   Global Step: 43910   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:31:51,996-Speed 3381.88 samples/sec   Loss 7.5249   LearningRate 0.0678   Epoch: 3   Global Step: 43920   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:31:55,041-Speed 3364.25 samples/sec   Loss 7.6154   LearningRate 0.0678   Epoch: 3   Global Step: 43930   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:31:58,052-Speed 3401.44 samples/sec   Loss 7.4834   LearningRate 0.0678   Epoch: 3   Global Step: 43940   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:32:01,112-Speed 3347.45 samples/sec   Loss 7.4941   LearningRate 0.0677   Epoch: 3   Global Step: 43950   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:32:04,126-Speed 3397.87 samples/sec   Loss 7.5038   LearningRate 0.0677   Epoch: 3   Global Step: 43960   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:32:07,176-Speed 3359.06 samples/sec   Loss 7.5279   LearningRate 0.0677   Epoch: 3   Global Step: 43970   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:32:10,201-Speed 3385.84 samples/sec   Loss 7.6262   LearningRate 0.0677   Epoch: 3   Global Step: 43980   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:32:13,337-Speed 3266.57 samples/sec   Loss 7.6402   LearningRate 0.0677   Epoch: 3   Global Step: 43990   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:32:16,348-Speed 3401.42 samples/sec   Loss 7.5196   LearningRate 0.0677   Epoch: 3   Global Step: 44000   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:32:19,357-Speed 3404.19 samples/sec   Loss 7.6270   LearningRate 0.0677   Epoch: 3   Global Step: 44010   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:32:22,410-Speed 3355.83 samples/sec   Loss 7.5288   LearningRate 0.0677   Epoch: 3   Global Step: 44020   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:32:25,498-Speed 3317.15 samples/sec   Loss 7.6149   LearningRate 0.0677   Epoch: 3   Global Step: 44030   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:32:28,543-Speed 3364.17 samples/sec   Loss 7.5470   LearningRate 0.0677   Epoch: 3   Global Step: 44040   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:32:31,604-Speed 3345.81 samples/sec   Loss 7.5260   LearningRate 0.0677   Epoch: 3   Global Step: 44050   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:32:34,632-Speed 3383.35 samples/sec   Loss 7.6378   LearningRate 0.0677   Epoch: 3   Global Step: 44060   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:32:37,650-Speed 3393.77 samples/sec   Loss 7.5369   LearningRate 0.0677   Epoch: 3   Global Step: 44070   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:32:40,704-Speed 3354.56 samples/sec   Loss 7.4682   LearningRate 0.0677   Epoch: 3   Global Step: 44080   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:32:43,719-Speed 3396.74 samples/sec   Loss 7.5495   LearningRate 0.0677   Epoch: 3   Global Step: 44090   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:32:46,750-Speed 3380.05 samples/sec   Loss 7.5406   LearningRate 0.0676   Epoch: 3   Global Step: 44100   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:32:49,764-Speed 3398.22 samples/sec   Loss 7.4209   LearningRate 0.0676   Epoch: 3   Global Step: 44110   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:32:52,821-Speed 3351.46 samples/sec   Loss 7.4839   LearningRate 0.0676   Epoch: 3   Global Step: 44120   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:32:55,859-Speed 3371.75 samples/sec   Loss 7.5148   LearningRate 0.0676   Epoch: 3   Global Step: 44130   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:32:58,969-Speed 3293.65 samples/sec   Loss 7.5084   LearningRate 0.0676   Epoch: 3   Global Step: 44140   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:33:02,101-Speed 3270.22 samples/sec   Loss 7.5866   LearningRate 0.0676   Epoch: 3   Global Step: 44150   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:33:05,215-Speed 3289.00 samples/sec   Loss 7.5690   LearningRate 0.0676   Epoch: 3   Global Step: 44160   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:33:08,228-Speed 3400.54 samples/sec   Loss 7.4788   LearningRate 0.0676   Epoch: 3   Global Step: 44170   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:33:11,226-Speed 3416.37 samples/sec   Loss 7.5793   LearningRate 0.0676   Epoch: 3   Global Step: 44180   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:33:14,251-Speed 3386.54 samples/sec   Loss 7.5597   LearningRate 0.0676   Epoch: 3   Global Step: 44190   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:33:17,303-Speed 3356.27 samples/sec   Loss 7.6253   LearningRate 0.0676   Epoch: 3   Global Step: 44200   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:33:20,335-Speed 3378.02 samples/sec   Loss 7.5875   LearningRate 0.0676   Epoch: 3   Global Step: 44210   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:33:23,344-Speed 3404.20 samples/sec   Loss 7.5225   LearningRate 0.0676   Epoch: 3   Global Step: 44220   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:33:26,398-Speed 3354.39 samples/sec   Loss 7.4267   LearningRate 0.0676   Epoch: 3   Global Step: 44230   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:33:29,446-Speed 3360.81 samples/sec   Loss 7.6105   LearningRate 0.0676   Epoch: 3   Global Step: 44240   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:33:32,526-Speed 3325.46 samples/sec   Loss 7.5940   LearningRate 0.0675   Epoch: 3   Global Step: 44250   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:33:35,580-Speed 3354.19 samples/sec   Loss 7.6105   LearningRate 0.0675   Epoch: 3   Global Step: 44260   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:33:38,616-Speed 3373.79 samples/sec   Loss 7.5574   LearningRate 0.0675   Epoch: 3   Global Step: 44270   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:33:41,672-Speed 3352.53 samples/sec   Loss 7.5707   LearningRate 0.0675   Epoch: 3   Global Step: 44280   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:33:44,673-Speed 3412.90 samples/sec   Loss 7.4614   LearningRate 0.0675   Epoch: 3   Global Step: 44290   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:33:47,689-Speed 3396.19 samples/sec   Loss 7.5753   LearningRate 0.0675   Epoch: 3   Global Step: 44300   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:33:50,684-Speed 3420.28 samples/sec   Loss 7.5770   LearningRate 0.0675   Epoch: 3   Global Step: 44310   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:33:53,721-Speed 3372.07 samples/sec   Loss 7.5363   LearningRate 0.0675   Epoch: 3   Global Step: 44320   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:33:56,735-Speed 3399.36 samples/sec   Loss 7.5200   LearningRate 0.0675   Epoch: 3   Global Step: 44330   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:33:59,783-Speed 3360.66 samples/sec   Loss 7.6177   LearningRate 0.0675   Epoch: 3   Global Step: 44340   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:34:02,844-Speed 3346.75 samples/sec   Loss 7.5872   LearningRate 0.0675   Epoch: 3   Global Step: 44350   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:34:05,863-Speed 3392.93 samples/sec   Loss 7.5800   LearningRate 0.0675   Epoch: 3   Global Step: 44360   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:34:08,859-Speed 3418.86 samples/sec   Loss 7.5623   LearningRate 0.0675   Epoch: 3   Global Step: 44370   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:34:11,882-Speed 3388.17 samples/sec   Loss 7.5357   LearningRate 0.0675   Epoch: 3   Global Step: 44380   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:34:14,965-Speed 3323.20 samples/sec   Loss 7.6262   LearningRate 0.0675   Epoch: 3   Global Step: 44390   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:34:17,999-Speed 3375.73 samples/sec   Loss 7.5603   LearningRate 0.0674   Epoch: 3   Global Step: 44400   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:34:21,006-Speed 3406.14 samples/sec   Loss 7.6434   LearningRate 0.0674   Epoch: 3   Global Step: 44410   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:34:24,017-Speed 3402.04 samples/sec   Loss 7.5603   LearningRate 0.0674   Epoch: 3   Global Step: 44420   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:34:27,016-Speed 3416.15 samples/sec   Loss 7.7226   LearningRate 0.0674   Epoch: 3   Global Step: 44430   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:34:30,057-Speed 3368.21 samples/sec   Loss 7.4346   LearningRate 0.0674   Epoch: 3   Global Step: 44440   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:34:33,057-Speed 3414.10 samples/sec   Loss 7.4513   LearningRate 0.0674   Epoch: 3   Global Step: 44450   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:34:36,106-Speed 3360.23 samples/sec   Loss 7.5224   LearningRate 0.0674   Epoch: 3   Global Step: 44460   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:34:39,134-Speed 3382.62 samples/sec   Loss 7.6057   LearningRate 0.0674   Epoch: 3   Global Step: 44470   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:34:42,164-Speed 3380.54 samples/sec   Loss 7.4901   LearningRate 0.0674   Epoch: 3   Global Step: 44480   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:34:45,191-Speed 3383.94 samples/sec   Loss 7.4916   LearningRate 0.0674   Epoch: 3   Global Step: 44490   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:34:48,245-Speed 3354.30 samples/sec   Loss 7.6102   LearningRate 0.0674   Epoch: 3   Global Step: 44500   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:34:51,252-Speed 3406.50 samples/sec   Loss 7.5165   LearningRate 0.0674   Epoch: 3   Global Step: 44510   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:34:54,317-Speed 3342.22 samples/sec   Loss 7.6631   LearningRate 0.0674   Epoch: 3   Global Step: 44520   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:34:57,367-Speed 3358.59 samples/sec   Loss 7.5373   LearningRate 0.0674   Epoch: 3   Global Step: 44530   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:35:00,379-Speed 3401.14 samples/sec   Loss 7.3804   LearningRate 0.0674   Epoch: 3   Global Step: 44540   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:35:03,451-Speed 3333.76 samples/sec   Loss 7.5254   LearningRate 0.0673   Epoch: 3   Global Step: 44550   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:35:06,486-Speed 3375.91 samples/sec   Loss 7.6622   LearningRate 0.0673   Epoch: 3   Global Step: 44560   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:35:09,495-Speed 3403.23 samples/sec   Loss 7.6247   LearningRate 0.0673   Epoch: 3   Global Step: 44570   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:35:12,494-Speed 3416.24 samples/sec   Loss 7.5662   LearningRate 0.0673   Epoch: 3   Global Step: 44580   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:35:15,505-Speed 3402.57 samples/sec   Loss 7.6390   LearningRate 0.0673   Epoch: 3   Global Step: 44590   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:35:18,512-Speed 3405.73 samples/sec   Loss 7.5659   LearningRate 0.0673   Epoch: 3   Global Step: 44600   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:35:21,518-Speed 3408.14 samples/sec   Loss 7.4225   LearningRate 0.0673   Epoch: 3   Global Step: 44610   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:35:24,578-Speed 3347.39 samples/sec   Loss 7.6148   LearningRate 0.0673   Epoch: 3   Global Step: 44620   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:35:27,575-Speed 3417.75 samples/sec   Loss 7.5134   LearningRate 0.0673   Epoch: 3   Global Step: 44630   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:35:30,612-Speed 3373.60 samples/sec   Loss 7.5330   LearningRate 0.0673   Epoch: 3   Global Step: 44640   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:35:33,617-Speed 3408.36 samples/sec   Loss 7.5746   LearningRate 0.0673   Epoch: 3   Global Step: 44650   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:35:36,642-Speed 3385.97 samples/sec   Loss 7.5459   LearningRate 0.0673   Epoch: 3   Global Step: 44660   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:35:39,689-Speed 3361.51 samples/sec   Loss 7.6299   LearningRate 0.0673   Epoch: 3   Global Step: 44670   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:35:42,738-Speed 3359.84 samples/sec   Loss 7.5539   LearningRate 0.0673   Epoch: 3   Global Step: 44680   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:35:45,728-Speed 3425.72 samples/sec   Loss 7.6051   LearningRate 0.0673   Epoch: 3   Global Step: 44690   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:35:48,727-Speed 3415.20 samples/sec   Loss 7.6377   LearningRate 0.0673   Epoch: 3   Global Step: 44700   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:35:51,789-Speed 3345.73 samples/sec   Loss 7.4449   LearningRate 0.0672   Epoch: 3   Global Step: 44710   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:35:54,855-Speed 3341.34 samples/sec   Loss 7.4735   LearningRate 0.0672   Epoch: 3   Global Step: 44720   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:35:57,855-Speed 3413.87 samples/sec   Loss 7.4337   LearningRate 0.0672   Epoch: 3   Global Step: 44730   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:36:00,880-Speed 3386.05 samples/sec   Loss 7.5328   LearningRate 0.0672   Epoch: 3   Global Step: 44740   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:36:03,942-Speed 3345.08 samples/sec   Loss 7.6999   LearningRate 0.0672   Epoch: 3   Global Step: 44750   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:36:07,049-Speed 3297.80 samples/sec   Loss 7.6829   LearningRate 0.0672   Epoch: 3   Global Step: 44760   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:36:10,059-Speed 3402.63 samples/sec   Loss 7.6161   LearningRate 0.0672   Epoch: 3   Global Step: 44770   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:36:13,066-Speed 3407.02 samples/sec   Loss 7.6223   LearningRate 0.0672   Epoch: 3   Global Step: 44780   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:36:16,127-Speed 3346.73 samples/sec   Loss 7.4270   LearningRate 0.0672   Epoch: 3   Global Step: 44790   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:36:19,152-Speed 3385.50 samples/sec   Loss 7.6081   LearningRate 0.0672   Epoch: 3   Global Step: 44800   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:36:22,168-Speed 3396.13 samples/sec   Loss 7.5573   LearningRate 0.0672   Epoch: 3   Global Step: 44810   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:36:25,197-Speed 3381.88 samples/sec   Loss 7.5093   LearningRate 0.0672   Epoch: 3   Global Step: 44820   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:36:28,263-Speed 3341.52 samples/sec   Loss 7.5208   LearningRate 0.0672   Epoch: 3   Global Step: 44830   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:36:31,311-Speed 3359.76 samples/sec   Loss 7.5335   LearningRate 0.0672   Epoch: 3   Global Step: 44840   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:36:34,330-Speed 3393.35 samples/sec   Loss 7.5072   LearningRate 0.0672   Epoch: 3   Global Step: 44850   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:36:37,364-Speed 3377.18 samples/sec   Loss 7.5011   LearningRate 0.0671   Epoch: 3   Global Step: 44860   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:36:40,459-Speed 3308.99 samples/sec   Loss 7.5673   LearningRate 0.0671   Epoch: 3   Global Step: 44870   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:36:43,484-Speed 3386.40 samples/sec   Loss 7.6352   LearningRate 0.0671   Epoch: 3   Global Step: 44880   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:36:46,490-Speed 3407.86 samples/sec   Loss 7.5980   LearningRate 0.0671   Epoch: 3   Global Step: 44890   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:36:49,525-Speed 3374.81 samples/sec   Loss 7.5939   LearningRate 0.0671   Epoch: 3   Global Step: 44900   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:36:52,611-Speed 3318.85 samples/sec   Loss 7.5357   LearningRate 0.0671   Epoch: 3   Global Step: 44910   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:36:55,638-Speed 3385.29 samples/sec   Loss 7.5911   LearningRate 0.0671   Epoch: 3   Global Step: 44920   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:36:58,687-Speed 3358.62 samples/sec   Loss 7.5362   LearningRate 0.0671   Epoch: 3   Global Step: 44930   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:37:01,711-Speed 3387.73 samples/sec   Loss 7.4876   LearningRate 0.0671   Epoch: 3   Global Step: 44940   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:37:04,828-Speed 3285.94 samples/sec   Loss 7.6033   LearningRate 0.0671   Epoch: 3   Global Step: 44950   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:37:07,869-Speed 3368.71 samples/sec   Loss 7.5140   LearningRate 0.0671   Epoch: 3   Global Step: 44960   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:37:10,965-Speed 3308.46 samples/sec   Loss 7.4884   LearningRate 0.0671   Epoch: 3   Global Step: 44970   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:37:14,027-Speed 3345.23 samples/sec   Loss 7.5557   LearningRate 0.0671   Epoch: 3   Global Step: 44980   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:37:17,056-Speed 3381.61 samples/sec   Loss 7.5106   LearningRate 0.0671   Epoch: 3   Global Step: 44990   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:37:20,070-Speed 3399.11 samples/sec   Loss 7.5047   LearningRate 0.0671   Epoch: 3   Global Step: 45000   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:37:23,093-Speed 3388.36 samples/sec   Loss 7.6723   LearningRate 0.0670   Epoch: 3   Global Step: 45010   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:37:26,163-Speed 3335.89 samples/sec   Loss 7.5506   LearningRate 0.0670   Epoch: 3   Global Step: 45020   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:37:29,219-Speed 3352.82 samples/sec   Loss 7.6022   LearningRate 0.0670   Epoch: 3   Global Step: 45030   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:37:32,233-Speed 3397.95 samples/sec   Loss 7.5456   LearningRate 0.0670   Epoch: 3   Global Step: 45040   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:37:35,288-Speed 3352.92 samples/sec   Loss 7.5029   LearningRate 0.0670   Epoch: 3   Global Step: 45050   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:37:38,321-Speed 3377.40 samples/sec   Loss 7.5705   LearningRate 0.0670   Epoch: 3   Global Step: 45060   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:37:41,368-Speed 3361.65 samples/sec   Loss 7.5523   LearningRate 0.0670   Epoch: 3   Global Step: 45070   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:37:44,366-Speed 3416.80 samples/sec   Loss 7.5692   LearningRate 0.0670   Epoch: 3   Global Step: 45080   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:37:47,368-Speed 3412.42 samples/sec   Loss 7.4516   LearningRate 0.0670   Epoch: 3   Global Step: 45090   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:37:50,401-Speed 3377.67 samples/sec   Loss 7.5485   LearningRate 0.0670   Epoch: 3   Global Step: 45100   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:37:53,448-Speed 3362.32 samples/sec   Loss 7.6283   LearningRate 0.0670   Epoch: 3   Global Step: 45110   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:37:56,468-Speed 3390.79 samples/sec   Loss 7.5638   LearningRate 0.0670   Epoch: 3   Global Step: 45120   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:37:59,485-Speed 3395.51 samples/sec   Loss 7.6041   LearningRate 0.0670   Epoch: 3   Global Step: 45130   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:38:02,510-Speed 3385.94 samples/sec   Loss 7.5230   LearningRate 0.0670   Epoch: 3   Global Step: 45140   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:38:05,557-Speed 3362.48 samples/sec   Loss 7.5237   LearningRate 0.0670   Epoch: 3   Global Step: 45150   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:38:08,553-Speed 3418.09 samples/sec   Loss 7.4670   LearningRate 0.0669   Epoch: 3   Global Step: 45160   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:38:11,547-Speed 3421.69 samples/sec   Loss 7.5292   LearningRate 0.0669   Epoch: 3   Global Step: 45170   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:38:14,561-Speed 3398.43 samples/sec   Loss 7.6215   LearningRate 0.0669   Epoch: 3   Global Step: 45180   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:38:17,555-Speed 3421.97 samples/sec   Loss 7.5359   LearningRate 0.0669   Epoch: 3   Global Step: 45190   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:38:20,589-Speed 3376.20 samples/sec   Loss 7.5676   LearningRate 0.0669   Epoch: 3   Global Step: 45200   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:38:23,638-Speed 3358.85 samples/sec   Loss 7.5588   LearningRate 0.0669   Epoch: 3   Global Step: 45210   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:38:26,657-Speed 3393.08 samples/sec   Loss 7.6187   LearningRate 0.0669   Epoch: 3   Global Step: 45220   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:38:29,718-Speed 3346.85 samples/sec   Loss 7.6260   LearningRate 0.0669   Epoch: 3   Global Step: 45230   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:38:32,721-Speed 3411.06 samples/sec   Loss 7.5811   LearningRate 0.0669   Epoch: 3   Global Step: 45240   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:38:35,788-Speed 3340.13 samples/sec   Loss 7.5844   LearningRate 0.0669   Epoch: 3   Global Step: 45250   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:38:38,792-Speed 3408.84 samples/sec   Loss 7.5784   LearningRate 0.0669   Epoch: 3   Global Step: 45260   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:38:41,834-Speed 3367.17 samples/sec   Loss 7.6910   LearningRate 0.0669   Epoch: 3   Global Step: 45270   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:38:44,889-Speed 3353.82 samples/sec   Loss 7.5063   LearningRate 0.0669   Epoch: 3   Global Step: 45280   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:38:47,999-Speed 3294.19 samples/sec   Loss 7.5262   LearningRate 0.0669   Epoch: 3   Global Step: 45290   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:38:51,120-Speed 3281.73 samples/sec   Loss 7.5651   LearningRate 0.0669   Epoch: 3   Global Step: 45300   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:38:54,176-Speed 3351.98 samples/sec   Loss 7.4743   LearningRate 0.0668   Epoch: 3   Global Step: 45310   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:38:57,204-Speed 3382.87 samples/sec   Loss 7.5418   LearningRate 0.0668   Epoch: 3   Global Step: 45320   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:39:00,216-Speed 3400.70 samples/sec   Loss 7.6053   LearningRate 0.0668   Epoch: 3   Global Step: 45330   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:39:03,284-Speed 3338.05 samples/sec   Loss 7.5198   LearningRate 0.0668   Epoch: 3   Global Step: 45340   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:39:06,309-Speed 3387.17 samples/sec   Loss 7.5082   LearningRate 0.0668   Epoch: 3   Global Step: 45350   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:39:09,330-Speed 3390.03 samples/sec   Loss 7.5299   LearningRate 0.0668   Epoch: 3   Global Step: 45360   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:39:12,439-Speed 3294.97 samples/sec   Loss 7.5228   LearningRate 0.0668   Epoch: 3   Global Step: 45370   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:39:15,536-Speed 3307.62 samples/sec   Loss 7.6662   LearningRate 0.0668   Epoch: 3   Global Step: 45380   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:39:18,603-Speed 3340.08 samples/sec   Loss 7.5568   LearningRate 0.0668   Epoch: 3   Global Step: 45390   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:39:21,623-Speed 3391.89 samples/sec   Loss 7.5380   LearningRate 0.0668   Epoch: 3   Global Step: 45400   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:39:24,671-Speed 3361.26 samples/sec   Loss 7.4940   LearningRate 0.0668   Epoch: 3   Global Step: 45410   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:39:27,705-Speed 3376.51 samples/sec   Loss 7.6552   LearningRate 0.0668   Epoch: 3   Global Step: 45420   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:39:30,794-Speed 3314.88 samples/sec   Loss 7.4068   LearningRate 0.0668   Epoch: 3   Global Step: 45430   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:39:33,850-Speed 3352.03 samples/sec   Loss 7.5172   LearningRate 0.0668   Epoch: 3   Global Step: 45440   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:39:36,888-Speed 3372.13 samples/sec   Loss 7.5522   LearningRate 0.0668   Epoch: 3   Global Step: 45450   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:39:39,953-Speed 3341.57 samples/sec   Loss 7.6989   LearningRate 0.0667   Epoch: 3   Global Step: 45460   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:39:42,981-Speed 3383.23 samples/sec   Loss 7.4533   LearningRate 0.0667   Epoch: 3   Global Step: 45470   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:39:45,990-Speed 3404.98 samples/sec   Loss 7.4122   LearningRate 0.0667   Epoch: 3   Global Step: 45480   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:39:49,039-Speed 3359.32 samples/sec   Loss 7.5921   LearningRate 0.0667   Epoch: 3   Global Step: 45490   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:39:52,075-Speed 3374.47 samples/sec   Loss 7.4472   LearningRate 0.0667   Epoch: 3   Global Step: 45500   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:39:55,150-Speed 3331.91 samples/sec   Loss 7.6043   LearningRate 0.0667   Epoch: 3   Global Step: 45510   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:39:58,189-Speed 3370.63 samples/sec   Loss 7.6322   LearningRate 0.0667   Epoch: 3   Global Step: 45520   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:40:01,214-Speed 3386.27 samples/sec   Loss 7.3948   LearningRate 0.0667   Epoch: 3   Global Step: 45530   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:40:04,227-Speed 3399.59 samples/sec   Loss 7.5449   LearningRate 0.0667   Epoch: 3   Global Step: 45540   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:40:07,282-Speed 3352.86 samples/sec   Loss 7.5602   LearningRate 0.0667   Epoch: 3   Global Step: 45550   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:40:10,345-Speed 3344.28 samples/sec   Loss 7.4791   LearningRate 0.0667   Epoch: 3   Global Step: 45560   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:40:13,387-Speed 3367.28 samples/sec   Loss 7.4613   LearningRate 0.0667   Epoch: 3   Global Step: 45570   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:40:16,454-Speed 3339.66 samples/sec   Loss 7.5665   LearningRate 0.0667   Epoch: 3   Global Step: 45580   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:40:19,546-Speed 3312.79 samples/sec   Loss 7.5102   LearningRate 0.0667   Epoch: 3   Global Step: 45590   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:40:22,599-Speed 3355.18 samples/sec   Loss 7.5249   LearningRate 0.0667   Epoch: 3   Global Step: 45600   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:40:25,620-Speed 3390.84 samples/sec   Loss 7.5539   LearningRate 0.0667   Epoch: 3   Global Step: 45610   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:40:28,661-Speed 3368.57 samples/sec   Loss 7.4630   LearningRate 0.0666   Epoch: 3   Global Step: 45620   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:40:31,686-Speed 3387.21 samples/sec   Loss 7.5897   LearningRate 0.0666   Epoch: 3   Global Step: 45630   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:40:34,696-Speed 3402.82 samples/sec   Loss 7.5972   LearningRate 0.0666   Epoch: 3   Global Step: 45640   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:40:37,730-Speed 3375.99 samples/sec   Loss 7.5583   LearningRate 0.0666   Epoch: 3   Global Step: 45650   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:40:40,749-Speed 3392.71 samples/sec   Loss 7.4566   LearningRate 0.0666   Epoch: 3   Global Step: 45660   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:40:43,774-Speed 3386.35 samples/sec   Loss 7.4990   LearningRate 0.0666   Epoch: 3   Global Step: 45670   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:40:46,824-Speed 3358.66 samples/sec   Loss 7.4767   LearningRate 0.0666   Epoch: 3   Global Step: 45680   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:40:49,893-Speed 3336.86 samples/sec   Loss 7.6224   LearningRate 0.0666   Epoch: 3   Global Step: 45690   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:40:52,950-Speed 3351.31 samples/sec   Loss 7.4921   LearningRate 0.0666   Epoch: 3   Global Step: 45700   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:40:55,992-Speed 3367.11 samples/sec   Loss 7.5759   LearningRate 0.0666   Epoch: 3   Global Step: 45710   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:40:59,007-Speed 3397.62 samples/sec   Loss 7.6152   LearningRate 0.0666   Epoch: 3   Global Step: 45720   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:41:02,086-Speed 3326.88 samples/sec   Loss 7.4919   LearningRate 0.0666   Epoch: 3   Global Step: 45730   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:41:05,161-Speed 3331.13 samples/sec   Loss 7.6104   LearningRate 0.0666   Epoch: 3   Global Step: 45740   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:41:08,222-Speed 3346.29 samples/sec   Loss 7.5572   LearningRate 0.0666   Epoch: 3   Global Step: 45750   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:41:11,255-Speed 3376.81 samples/sec   Loss 7.6105   LearningRate 0.0666   Epoch: 3   Global Step: 45760   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:41:14,293-Speed 3371.84 samples/sec   Loss 7.5838   LearningRate 0.0665   Epoch: 3   Global Step: 45770   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:41:17,361-Speed 3338.88 samples/sec   Loss 7.6122   LearningRate 0.0665   Epoch: 3   Global Step: 45780   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:41:20,417-Speed 3351.32 samples/sec   Loss 7.6048   LearningRate 0.0665   Epoch: 3   Global Step: 45790   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:41:23,438-Speed 3390.66 samples/sec   Loss 7.4369   LearningRate 0.0665   Epoch: 3   Global Step: 45800   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:41:26,502-Speed 3343.79 samples/sec   Loss 7.4421   LearningRate 0.0665   Epoch: 3   Global Step: 45810   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:41:29,536-Speed 3375.41 samples/sec   Loss 7.5793   LearningRate 0.0665   Epoch: 3   Global Step: 45820   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:41:32,573-Speed 3372.91 samples/sec   Loss 7.4589   LearningRate 0.0665   Epoch: 3   Global Step: 45830   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:41:35,603-Speed 3380.76 samples/sec   Loss 7.6785   LearningRate 0.0665   Epoch: 3   Global Step: 45840   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:41:38,685-Speed 3324.11 samples/sec   Loss 7.6255   LearningRate 0.0665   Epoch: 3   Global Step: 45850   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:41:41,704-Speed 3393.11 samples/sec   Loss 7.5653   LearningRate 0.0665   Epoch: 3   Global Step: 45860   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:41:44,714-Speed 3403.32 samples/sec   Loss 7.6254   LearningRate 0.0665   Epoch: 3   Global Step: 45870   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:41:47,739-Speed 3386.28 samples/sec   Loss 7.4980   LearningRate 0.0665   Epoch: 3   Global Step: 45880   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:41:50,799-Speed 3347.69 samples/sec   Loss 7.6885   LearningRate 0.0665   Epoch: 3   Global Step: 45890   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:41:53,905-Speed 3298.62 samples/sec   Loss 7.4465   LearningRate 0.0665   Epoch: 3   Global Step: 45900   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:41:56,939-Speed 3375.70 samples/sec   Loss 7.4577   LearningRate 0.0665   Epoch: 3   Global Step: 45910   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:42:00,028-Speed 3315.82 samples/sec   Loss 7.4545   LearningRate 0.0664   Epoch: 3   Global Step: 45920   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:42:03,098-Speed 3336.81 samples/sec   Loss 7.4130   LearningRate 0.0664   Epoch: 3   Global Step: 45930   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:42:06,138-Speed 3369.16 samples/sec   Loss 7.4811   LearningRate 0.0664   Epoch: 3   Global Step: 45940   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:42:09,165-Speed 3384.70 samples/sec   Loss 7.5252   LearningRate 0.0664   Epoch: 3   Global Step: 45950   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:42:12,234-Speed 3336.86 samples/sec   Loss 7.4907   LearningRate 0.0664   Epoch: 3   Global Step: 45960   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:42:15,351-Speed 3286.28 samples/sec   Loss 7.5365   LearningRate 0.0664   Epoch: 3   Global Step: 45970   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:42:18,412-Speed 3347.10 samples/sec   Loss 7.5359   LearningRate 0.0664   Epoch: 3   Global Step: 45980   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:42:21,434-Speed 3389.81 samples/sec   Loss 7.5313   LearningRate 0.0664   Epoch: 3   Global Step: 45990   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:42:24,482-Speed 3360.28 samples/sec   Loss 7.4928   LearningRate 0.0664   Epoch: 3   Global Step: 46000   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:42:27,574-Speed 3313.61 samples/sec   Loss 7.3824   LearningRate 0.0664   Epoch: 3   Global Step: 46010   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:42:30,632-Speed 3348.52 samples/sec   Loss 7.5834   LearningRate 0.0664   Epoch: 3   Global Step: 46020   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:42:33,658-Speed 3385.36 samples/sec   Loss 7.4849   LearningRate 0.0664   Epoch: 3   Global Step: 46030   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:42:36,682-Speed 3387.66 samples/sec   Loss 7.6149   LearningRate 0.0664   Epoch: 3   Global Step: 46040   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:42:39,770-Speed 3316.87 samples/sec   Loss 7.5191   LearningRate 0.0664   Epoch: 3   Global Step: 46050   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:42:42,830-Speed 3347.61 samples/sec   Loss 7.4204   LearningRate 0.0664   Epoch: 3   Global Step: 46060   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:42:45,868-Speed 3372.23 samples/sec   Loss 7.5014   LearningRate 0.0663   Epoch: 3   Global Step: 46070   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:42:48,916-Speed 3360.00 samples/sec   Loss 7.6289   LearningRate 0.0663   Epoch: 3   Global Step: 46080   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:42:51,967-Speed 3358.13 samples/sec   Loss 7.3887   LearningRate 0.0663   Epoch: 3   Global Step: 46090   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:42:55,057-Speed 3314.87 samples/sec   Loss 7.4761   LearningRate 0.0663   Epoch: 3   Global Step: 46100   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:42:58,061-Speed 3409.33 samples/sec   Loss 7.5146   LearningRate 0.0663   Epoch: 3   Global Step: 46110   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:43:01,121-Speed 3347.84 samples/sec   Loss 7.6252   LearningRate 0.0663   Epoch: 3   Global Step: 46120   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:43:04,183-Speed 3345.22 samples/sec   Loss 7.5904   LearningRate 0.0663   Epoch: 3   Global Step: 46130   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:43:07,250-Speed 3340.75 samples/sec   Loss 7.5459   LearningRate 0.0663   Epoch: 3   Global Step: 46140   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:43:10,254-Speed 3409.43 samples/sec   Loss 7.4984   LearningRate 0.0663   Epoch: 3   Global Step: 46150   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:43:13,282-Speed 3382.92 samples/sec   Loss 7.4686   LearningRate 0.0663   Epoch: 3   Global Step: 46160   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:43:16,366-Speed 3321.56 samples/sec   Loss 7.5442   LearningRate 0.0663   Epoch: 3   Global Step: 46170   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:43:19,424-Speed 3349.20 samples/sec   Loss 7.4716   LearningRate 0.0663   Epoch: 3   Global Step: 46180   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:43:22,501-Speed 3329.13 samples/sec   Loss 7.5313   LearningRate 0.0663   Epoch: 3   Global Step: 46190   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:43:25,568-Speed 3340.20 samples/sec   Loss 7.5500   LearningRate 0.0663   Epoch: 3   Global Step: 46200   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:43:28,615-Speed 3361.81 samples/sec   Loss 7.7120   LearningRate 0.0663   Epoch: 3   Global Step: 46210   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:43:31,692-Speed 3328.66 samples/sec   Loss 7.5266   LearningRate 0.0663   Epoch: 3   Global Step: 46220   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:43:34,740-Speed 3360.60 samples/sec   Loss 7.6095   LearningRate 0.0662   Epoch: 3   Global Step: 46230   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:43:37,759-Speed 3393.18 samples/sec   Loss 7.5777   LearningRate 0.0662   Epoch: 3   Global Step: 46240   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:43:40,834-Speed 3330.44 samples/sec   Loss 7.5033   LearningRate 0.0662   Epoch: 3   Global Step: 46250   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:43:43,910-Speed 3330.25 samples/sec   Loss 7.3553   LearningRate 0.0662   Epoch: 3   Global Step: 46260   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:43:46,914-Speed 3410.37 samples/sec   Loss 7.3906   LearningRate 0.0662   Epoch: 3   Global Step: 46270   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:43:49,971-Speed 3351.32 samples/sec   Loss 7.5657   LearningRate 0.0662   Epoch: 3   Global Step: 46280   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:43:53,017-Speed 3362.58 samples/sec   Loss 7.5373   LearningRate 0.0662   Epoch: 3   Global Step: 46290   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:43:56,050-Speed 3376.82 samples/sec   Loss 7.4441   LearningRate 0.0662   Epoch: 3   Global Step: 46300   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:43:59,134-Speed 3322.03 samples/sec   Loss 7.4851   LearningRate 0.0662   Epoch: 3   Global Step: 46310   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:44:02,190-Speed 3351.25 samples/sec   Loss 7.5670   LearningRate 0.0662   Epoch: 3   Global Step: 46320   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:44:05,263-Speed 3334.10 samples/sec   Loss 7.5673   LearningRate 0.0662   Epoch: 3   Global Step: 46330   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:44:08,284-Speed 3390.36 samples/sec   Loss 7.4080   LearningRate 0.0662   Epoch: 3   Global Step: 46340   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:44:11,341-Speed 3350.18 samples/sec   Loss 7.4939   LearningRate 0.0662   Epoch: 3   Global Step: 46350   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:44:14,440-Speed 3305.61 samples/sec   Loss 7.5126   LearningRate 0.0662   Epoch: 3   Global Step: 46360   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:44:17,472-Speed 3379.35 samples/sec   Loss 7.5424   LearningRate 0.0662   Epoch: 3   Global Step: 46370   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:44:20,496-Speed 3386.65 samples/sec   Loss 7.5686   LearningRate 0.0661   Epoch: 3   Global Step: 46380   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:44:23,525-Speed 3382.17 samples/sec   Loss 7.6213   LearningRate 0.0661   Epoch: 3   Global Step: 46390   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:44:26,635-Speed 3293.15 samples/sec   Loss 7.4619   LearningRate 0.0661   Epoch: 3   Global Step: 46400   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:44:29,653-Speed 3394.29 samples/sec   Loss 7.4423   LearningRate 0.0661   Epoch: 3   Global Step: 46410   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:44:32,723-Speed 3336.71 samples/sec   Loss 7.4905   LearningRate 0.0661   Epoch: 3   Global Step: 46420   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:44:35,777-Speed 3353.65 samples/sec   Loss 7.4911   LearningRate 0.0661   Epoch: 3   Global Step: 46430   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:44:38,824-Speed 3361.61 samples/sec   Loss 7.5681   LearningRate 0.0661   Epoch: 3   Global Step: 46440   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:44:41,904-Speed 3325.69 samples/sec   Loss 7.5733   LearningRate 0.0661   Epoch: 3   Global Step: 46450   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:44:44,929-Speed 3386.00 samples/sec   Loss 7.5489   LearningRate 0.0661   Epoch: 3   Global Step: 46460   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:44:47,977-Speed 3361.27 samples/sec   Loss 7.4543   LearningRate 0.0661   Epoch: 3   Global Step: 46470   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:44:51,037-Speed 3347.03 samples/sec   Loss 7.4479   LearningRate 0.0661   Epoch: 3   Global Step: 46480   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:44:54,070-Speed 3377.70 samples/sec   Loss 7.4548   LearningRate 0.0661   Epoch: 3   Global Step: 46490   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:44:57,094-Speed 3386.55 samples/sec   Loss 7.4717   LearningRate 0.0661   Epoch: 3   Global Step: 46500   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:45:00,159-Speed 3342.09 samples/sec   Loss 7.4274   LearningRate 0.0661   Epoch: 3   Global Step: 46510   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:45:03,189-Speed 3380.83 samples/sec   Loss 7.6280   LearningRate 0.0661   Epoch: 3   Global Step: 46520   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:45:06,226-Speed 3372.86 samples/sec   Loss 7.3762   LearningRate 0.0660   Epoch: 3   Global Step: 46530   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:45:09,246-Speed 3392.14 samples/sec   Loss 7.5075   LearningRate 0.0660   Epoch: 3   Global Step: 46540   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:45:12,299-Speed 3354.58 samples/sec   Loss 7.4923   LearningRate 0.0660   Epoch: 3   Global Step: 46550   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:45:15,329-Speed 3381.20 samples/sec   Loss 7.3992   LearningRate 0.0660   Epoch: 3   Global Step: 46560   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:45:18,343-Speed 3398.06 samples/sec   Loss 7.6081   LearningRate 0.0660   Epoch: 3   Global Step: 46570   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:45:21,383-Speed 3369.12 samples/sec   Loss 7.4350   LearningRate 0.0660   Epoch: 3   Global Step: 46580   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:45:24,411-Speed 3383.17 samples/sec   Loss 7.4141   LearningRate 0.0660   Epoch: 3   Global Step: 46590   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:45:27,495-Speed 3321.06 samples/sec   Loss 7.5602   LearningRate 0.0660   Epoch: 3   Global Step: 46600   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:45:30,532-Speed 3372.87 samples/sec   Loss 7.4996   LearningRate 0.0660   Epoch: 3   Global Step: 46610   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:45:33,544-Speed 3400.54 samples/sec   Loss 7.5050   LearningRate 0.0660   Epoch: 3   Global Step: 46620   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:45:36,622-Speed 3328.67 samples/sec   Loss 7.3801   LearningRate 0.0660   Epoch: 3   Global Step: 46630   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:45:39,611-Speed 3426.83 samples/sec   Loss 7.5661   LearningRate 0.0660   Epoch: 3   Global Step: 46640   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:45:42,658-Speed 3362.38 samples/sec   Loss 7.4166   LearningRate 0.0660   Epoch: 3   Global Step: 46650   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:45:45,678-Speed 3391.97 samples/sec   Loss 7.4605   LearningRate 0.0660   Epoch: 3   Global Step: 46660   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:45:48,686-Speed 3404.99 samples/sec   Loss 7.4395   LearningRate 0.0660   Epoch: 3   Global Step: 46670   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:45:51,703-Speed 3395.71 samples/sec   Loss 7.5638   LearningRate 0.0659   Epoch: 3   Global Step: 46680   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:45:54,770-Speed 3339.40 samples/sec   Loss 7.4883   LearningRate 0.0659   Epoch: 3   Global Step: 46690   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:45:57,778-Speed 3405.99 samples/sec   Loss 7.5005   LearningRate 0.0659   Epoch: 3   Global Step: 46700   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:46:00,864-Speed 3319.29 samples/sec   Loss 7.6275   LearningRate 0.0659   Epoch: 3   Global Step: 46710   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:46:03,917-Speed 3354.73 samples/sec   Loss 7.4946   LearningRate 0.0659   Epoch: 3   Global Step: 46720   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:46:06,969-Speed 3356.46 samples/sec   Loss 7.4220   LearningRate 0.0659   Epoch: 3   Global Step: 46730   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:46:09,961-Speed 3424.32 samples/sec   Loss 7.5503   LearningRate 0.0659   Epoch: 3   Global Step: 46740   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:46:12,983-Speed 3388.89 samples/sec   Loss 7.4860   LearningRate 0.0659   Epoch: 3   Global Step: 46750   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:46:16,043-Speed 3347.83 samples/sec   Loss 7.4632   LearningRate 0.0659   Epoch: 3   Global Step: 46760   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:46:19,050-Speed 3406.94 samples/sec   Loss 7.4792   LearningRate 0.0659   Epoch: 3   Global Step: 46770   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:46:22,053-Speed 3410.84 samples/sec   Loss 7.5485   LearningRate 0.0659   Epoch: 3   Global Step: 46780   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:46:25,087-Speed 3375.88 samples/sec   Loss 7.5574   LearningRate 0.0659   Epoch: 3   Global Step: 46790   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:46:28,193-Speed 3298.02 samples/sec   Loss 7.4669   LearningRate 0.0659   Epoch: 3   Global Step: 46800   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:46:31,326-Speed 3269.26 samples/sec   Loss 7.5399   LearningRate 0.0659   Epoch: 3   Global Step: 46810   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:46:34,351-Speed 3386.47 samples/sec   Loss 7.4726   LearningRate 0.0659   Epoch: 3   Global Step: 46820   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:46:37,404-Speed 3354.63 samples/sec   Loss 7.5804   LearningRate 0.0659   Epoch: 3   Global Step: 46830   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:46:40,406-Speed 3412.70 samples/sec   Loss 7.3402   LearningRate 0.0658   Epoch: 3   Global Step: 46840   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:46:43,441-Speed 3374.97 samples/sec   Loss 7.4205   LearningRate 0.0658   Epoch: 3   Global Step: 46850   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:46:46,438-Speed 3418.06 samples/sec   Loss 7.5271   LearningRate 0.0658   Epoch: 3   Global Step: 46860   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:46:49,435-Speed 3417.43 samples/sec   Loss 7.5574   LearningRate 0.0658   Epoch: 3   Global Step: 46870   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:46:52,493-Speed 3350.08 samples/sec   Loss 7.5300   LearningRate 0.0658   Epoch: 3   Global Step: 46880   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:46:55,555-Speed 3346.04 samples/sec   Loss 7.4945   LearningRate 0.0658   Epoch: 3   Global Step: 46890   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:46:58,587-Speed 3378.11 samples/sec   Loss 7.4286   LearningRate 0.0658   Epoch: 3   Global Step: 46900   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:47:01,608-Speed 3390.59 samples/sec   Loss 7.5402   LearningRate 0.0658   Epoch: 3   Global Step: 46910   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:47:04,649-Speed 3368.60 samples/sec   Loss 7.6316   LearningRate 0.0658   Epoch: 3   Global Step: 46920   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:47:07,663-Speed 3398.98 samples/sec   Loss 7.5075   LearningRate 0.0658   Epoch: 3   Global Step: 46930   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:47:10,701-Speed 3371.45 samples/sec   Loss 7.5510   LearningRate 0.0658   Epoch: 3   Global Step: 46940   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:47:13,718-Speed 3395.74 samples/sec   Loss 7.6206   LearningRate 0.0658   Epoch: 3   Global Step: 46950   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:47:16,721-Speed 3410.61 samples/sec   Loss 7.5706   LearningRate 0.0658   Epoch: 3   Global Step: 46960   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:47:19,725-Speed 3409.33 samples/sec   Loss 7.3739   LearningRate 0.0658   Epoch: 3   Global Step: 46970   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:47:22,751-Speed 3385.88 samples/sec   Loss 7.5500   LearningRate 0.0658   Epoch: 3   Global Step: 46980   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:47:25,807-Speed 3351.82 samples/sec   Loss 7.4249   LearningRate 0.0657   Epoch: 3   Global Step: 46990   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:47:28,815-Speed 3405.60 samples/sec   Loss 7.4360   LearningRate 0.0657   Epoch: 3   Global Step: 47000   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:47:31,920-Speed 3299.77 samples/sec   Loss 7.6589   LearningRate 0.0657   Epoch: 3   Global Step: 47010   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:47:34,949-Speed 3382.03 samples/sec   Loss 7.5777   LearningRate 0.0657   Epoch: 3   Global Step: 47020   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:47:38,017-Speed 3338.29 samples/sec   Loss 7.3925   LearningRate 0.0657   Epoch: 3   Global Step: 47030   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:47:41,055-Speed 3372.43 samples/sec   Loss 7.4899   LearningRate 0.0657   Epoch: 3   Global Step: 47040   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:47:44,098-Speed 3365.47 samples/sec   Loss 7.4523   LearningRate 0.0657   Epoch: 3   Global Step: 47050   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:47:47,143-Speed 3363.70 samples/sec   Loss 7.4796   LearningRate 0.0657   Epoch: 3   Global Step: 47060   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:47:50,190-Speed 3361.95 samples/sec   Loss 7.5208   LearningRate 0.0657   Epoch: 3   Global Step: 47070   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:47:53,228-Speed 3372.28 samples/sec   Loss 7.3690   LearningRate 0.0657   Epoch: 3   Global Step: 47080   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:47:56,235-Speed 3405.62 samples/sec   Loss 7.5296   LearningRate 0.0657   Epoch: 3   Global Step: 47090   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:47:59,258-Speed 3389.33 samples/sec   Loss 7.6264   LearningRate 0.0657   Epoch: 3   Global Step: 47100   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:48:02,353-Speed 3309.26 samples/sec   Loss 7.4891   LearningRate 0.0657   Epoch: 3   Global Step: 47110   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:48:05,347-Speed 3421.83 samples/sec   Loss 7.3957   LearningRate 0.0657   Epoch: 3   Global Step: 47120   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:48:08,389-Speed 3366.57 samples/sec   Loss 7.4681   LearningRate 0.0657   Epoch: 3   Global Step: 47130   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:48:11,438-Speed 3360.49 samples/sec   Loss 7.5409   LearningRate 0.0656   Epoch: 3   Global Step: 47140   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:48:14,514-Speed 3329.64 samples/sec   Loss 7.4363   LearningRate 0.0656   Epoch: 3   Global Step: 47150   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:48:17,619-Speed 3298.92 samples/sec   Loss 7.4564   LearningRate 0.0656   Epoch: 3   Global Step: 47160   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:48:20,670-Speed 3357.91 samples/sec   Loss 7.5715   LearningRate 0.0656   Epoch: 3   Global Step: 47170   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:48:23,765-Speed 3309.52 samples/sec   Loss 7.5233   LearningRate 0.0656   Epoch: 3   Global Step: 47180   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:48:26,824-Speed 3348.77 samples/sec   Loss 7.5094   LearningRate 0.0656   Epoch: 3   Global Step: 47190   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:48:29,918-Speed 3310.94 samples/sec   Loss 7.4545   LearningRate 0.0656   Epoch: 3   Global Step: 47200   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:48:32,934-Speed 3396.27 samples/sec   Loss 7.3134   LearningRate 0.0656   Epoch: 3   Global Step: 47210   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:48:35,987-Speed 3355.26 samples/sec   Loss 7.4936   LearningRate 0.0656   Epoch: 3   Global Step: 47220   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:48:39,055-Speed 3338.41 samples/sec   Loss 7.4490   LearningRate 0.0656   Epoch: 3   Global Step: 47230   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:48:42,173-Speed 3285.25 samples/sec   Loss 7.5536   LearningRate 0.0656   Epoch: 3   Global Step: 47240   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:48:45,202-Speed 3382.23 samples/sec   Loss 7.5702   LearningRate 0.0656   Epoch: 3   Global Step: 47250   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:48:48,273-Speed 3335.68 samples/sec   Loss 7.4725   LearningRate 0.0656   Epoch: 3   Global Step: 47260   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:48:51,443-Speed 3230.90 samples/sec   Loss 7.5166   LearningRate 0.0656   Epoch: 3   Global Step: 47270   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:48:54,454-Speed 3402.03 samples/sec   Loss 7.4793   LearningRate 0.0656   Epoch: 3   Global Step: 47280   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:48:57,456-Speed 3411.97 samples/sec   Loss 7.4064   LearningRate 0.0656   Epoch: 3   Global Step: 47290   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:49:00,549-Speed 3312.77 samples/sec   Loss 7.5076   LearningRate 0.0655   Epoch: 3   Global Step: 47300   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:49:03,556-Speed 3406.26 samples/sec   Loss 7.4258   LearningRate 0.0655   Epoch: 3   Global Step: 47310   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:49:06,584-Speed 3383.16 samples/sec   Loss 7.4374   LearningRate 0.0655   Epoch: 3   Global Step: 47320   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:49:09,639-Speed 3353.30 samples/sec   Loss 7.4203   LearningRate 0.0655   Epoch: 3   Global Step: 47330   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:49:12,750-Speed 3291.86 samples/sec   Loss 7.3995   LearningRate 0.0655   Epoch: 3   Global Step: 47340   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:49:15,835-Speed 3321.29 samples/sec   Loss 7.3837   LearningRate 0.0655   Epoch: 3   Global Step: 47350   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:49:18,929-Speed 3310.54 samples/sec   Loss 7.5267   LearningRate 0.0655   Epoch: 3   Global Step: 47360   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:49:21,938-Speed 3403.77 samples/sec   Loss 7.5471   LearningRate 0.0655   Epoch: 3   Global Step: 47370   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:49:24,943-Speed 3409.33 samples/sec   Loss 7.3864   LearningRate 0.0655   Epoch: 3   Global Step: 47380   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:49:27,959-Speed 3396.00 samples/sec   Loss 7.5132   LearningRate 0.0655   Epoch: 3   Global Step: 47390   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:49:31,011-Speed 3356.85 samples/sec   Loss 7.4237   LearningRate 0.0655   Epoch: 3   Global Step: 47400   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:49:34,037-Speed 3384.08 samples/sec   Loss 7.5536   LearningRate 0.0655   Epoch: 3   Global Step: 47410   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:49:37,053-Speed 3397.32 samples/sec   Loss 7.5215   LearningRate 0.0655   Epoch: 3   Global Step: 47420   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:49:40,105-Speed 3355.99 samples/sec   Loss 7.4887   LearningRate 0.0655   Epoch: 3   Global Step: 47430   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:49:43,121-Speed 3396.52 samples/sec   Loss 7.4057   LearningRate 0.0655   Epoch: 3   Global Step: 47440   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:49:46,144-Speed 3387.75 samples/sec   Loss 7.4331   LearningRate 0.0654   Epoch: 3   Global Step: 47450   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:49:49,229-Speed 3320.55 samples/sec   Loss 7.3925   LearningRate 0.0654   Epoch: 3   Global Step: 47460   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:49:52,336-Speed 3297.31 samples/sec   Loss 7.5261   LearningRate 0.0654   Epoch: 3   Global Step: 47470   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:49:55,383-Speed 3361.52 samples/sec   Loss 7.4637   LearningRate 0.0654   Epoch: 3   Global Step: 47480   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:49:58,409-Speed 3385.62 samples/sec   Loss 7.5226   LearningRate 0.0654   Epoch: 3   Global Step: 47490   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:50:01,463-Speed 3353.61 samples/sec   Loss 7.4592   LearningRate 0.0654   Epoch: 3   Global Step: 47500   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:50:04,511-Speed 3360.77 samples/sec   Loss 7.3833   LearningRate 0.0654   Epoch: 3   Global Step: 47510   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:50:07,587-Speed 3330.38 samples/sec   Loss 7.5192   LearningRate 0.0654   Epoch: 3   Global Step: 47520   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:50:10,674-Speed 3317.96 samples/sec   Loss 7.4152   LearningRate 0.0654   Epoch: 3   Global Step: 47530   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:50:13,728-Speed 3353.86 samples/sec   Loss 7.4165   LearningRate 0.0654   Epoch: 3   Global Step: 47540   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:50:16,834-Speed 3298.14 samples/sec   Loss 7.4353   LearningRate 0.0654   Epoch: 3   Global Step: 47550   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:50:19,850-Speed 3395.78 samples/sec   Loss 7.4023   LearningRate 0.0654   Epoch: 3   Global Step: 47560   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:50:22,889-Speed 3371.10 samples/sec   Loss 7.4661   LearningRate 0.0654   Epoch: 3   Global Step: 47570   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:50:25,952-Speed 3343.69 samples/sec   Loss 7.5706   LearningRate 0.0654   Epoch: 3   Global Step: 47580   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:50:28,973-Speed 3391.22 samples/sec   Loss 7.5270   LearningRate 0.0654   Epoch: 3   Global Step: 47590   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:50:32,804-Speed 2673.50 samples/sec   Loss 7.5108   LearningRate 0.0653   Epoch: 3   Global Step: 47600   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:50:35,849-Speed 3364.05 samples/sec   Loss 7.4939   LearningRate 0.0653   Epoch: 3   Global Step: 47610   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:50:38,904-Speed 3351.76 samples/sec   Loss 7.4908   LearningRate 0.0653   Epoch: 3   Global Step: 47620   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:50:42,006-Speed 3302.54 samples/sec   Loss 7.4779   LearningRate 0.0653   Epoch: 3   Global Step: 47630   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:50:45,015-Speed 3403.84 samples/sec   Loss 7.4330   LearningRate 0.0653   Epoch: 3   Global Step: 47640   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:50:48,086-Speed 3336.15 samples/sec   Loss 7.4875   LearningRate 0.0653   Epoch: 3   Global Step: 47650   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:50:51,172-Speed 3318.78 samples/sec   Loss 7.3955   LearningRate 0.0653   Epoch: 3   Global Step: 47660   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:50:54,210-Speed 3372.66 samples/sec   Loss 7.4335   LearningRate 0.0653   Epoch: 3   Global Step: 47670   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:50:57,232-Speed 3388.69 samples/sec   Loss 7.3667   LearningRate 0.0653   Epoch: 3   Global Step: 47680   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:51:00,309-Speed 3329.36 samples/sec   Loss 7.4004   LearningRate 0.0653   Epoch: 3   Global Step: 47690   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:51:03,345-Speed 3374.76 samples/sec   Loss 7.5087   LearningRate 0.0653   Epoch: 3   Global Step: 47700   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:51:06,364-Speed 3393.09 samples/sec   Loss 7.5010   LearningRate 0.0653   Epoch: 3   Global Step: 47710   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:51:09,369-Speed 3408.24 samples/sec   Loss 7.2706   LearningRate 0.0653   Epoch: 3   Global Step: 47720   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:51:12,407-Speed 3371.52 samples/sec   Loss 7.4533   LearningRate 0.0653   Epoch: 3   Global Step: 47730   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:51:15,540-Speed 3269.15 samples/sec   Loss 7.4379   LearningRate 0.0653   Epoch: 3   Global Step: 47740   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:51:18,644-Speed 3300.59 samples/sec   Loss 7.2543   LearningRate 0.0653   Epoch: 3   Global Step: 47750   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:51:21,678-Speed 3375.94 samples/sec   Loss 7.3989   LearningRate 0.0652   Epoch: 3   Global Step: 47760   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:51:24,715-Speed 3372.49 samples/sec   Loss 7.3722   LearningRate 0.0652   Epoch: 3   Global Step: 47770   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:51:27,760-Speed 3364.52 samples/sec   Loss 7.3959   LearningRate 0.0652   Epoch: 3   Global Step: 47780   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:51:30,812-Speed 3355.44 samples/sec   Loss 7.4864   LearningRate 0.0652   Epoch: 3   Global Step: 47790   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:51:33,849-Speed 3373.42 samples/sec   Loss 7.3900   LearningRate 0.0652   Epoch: 3   Global Step: 47800   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:51:36,927-Speed 3327.30 samples/sec   Loss 7.5271   LearningRate 0.0652   Epoch: 3   Global Step: 47810   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:51:40,010-Speed 3322.57 samples/sec   Loss 7.4526   LearningRate 0.0652   Epoch: 3   Global Step: 47820   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:51:43,087-Speed 3329.20 samples/sec   Loss 7.4784   LearningRate 0.0652   Epoch: 3   Global Step: 47830   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:51:46,118-Speed 3379.26 samples/sec   Loss 7.3459   LearningRate 0.0652   Epoch: 3   Global Step: 47840   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:51:49,191-Speed 3333.62 samples/sec   Loss 7.5201   LearningRate 0.0652   Epoch: 3   Global Step: 47850   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:51:52,242-Speed 3357.06 samples/sec   Loss 7.4380   LearningRate 0.0652   Epoch: 3   Global Step: 47860   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:51:55,346-Speed 3300.19 samples/sec   Loss 7.4205   LearningRate 0.0652   Epoch: 3   Global Step: 47870   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:51:58,392-Speed 3363.29 samples/sec   Loss 7.4874   LearningRate 0.0652   Epoch: 3   Global Step: 47880   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:52:01,489-Speed 3306.98 samples/sec   Loss 7.3989   LearningRate 0.0652   Epoch: 3   Global Step: 47890   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:52:04,570-Speed 3325.72 samples/sec   Loss 7.2857   LearningRate 0.0652   Epoch: 3   Global Step: 47900   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:52:07,624-Speed 3353.36 samples/sec   Loss 7.4489   LearningRate 0.0651   Epoch: 3   Global Step: 47910   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:52:10,681-Speed 3350.97 samples/sec   Loss 7.4036   LearningRate 0.0651   Epoch: 3   Global Step: 47920   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:52:13,729-Speed 3360.60 samples/sec   Loss 7.3616   LearningRate 0.0651   Epoch: 3   Global Step: 47930   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:52:16,856-Speed 3276.68 samples/sec   Loss 7.4354   LearningRate 0.0651   Epoch: 3   Global Step: 47940   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:52:19,874-Speed 3393.25 samples/sec   Loss 7.3643   LearningRate 0.0651   Epoch: 3   Global Step: 47950   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:52:22,908-Speed 3377.18 samples/sec   Loss 7.5316   LearningRate 0.0651   Epoch: 3   Global Step: 47960   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:52:25,932-Speed 3386.26 samples/sec   Loss 7.5232   LearningRate 0.0651   Epoch: 3   Global Step: 47970   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:52:28,921-Speed 3427.56 samples/sec   Loss 7.4714   LearningRate 0.0651   Epoch: 3   Global Step: 47980   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:52:32,057-Speed 3266.55 samples/sec   Loss 7.4952   LearningRate 0.0651   Epoch: 3   Global Step: 47990   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:52:35,132-Speed 3330.93 samples/sec   Loss 7.4841   LearningRate 0.0651   Epoch: 3   Global Step: 48000   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:52:38,220-Speed 3316.59 samples/sec   Loss 7.3958   LearningRate 0.0651   Epoch: 3   Global Step: 48010   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:52:41,276-Speed 3351.53 samples/sec   Loss 7.4213   LearningRate 0.0651   Epoch: 3   Global Step: 48020   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:52:44,335-Speed 3349.77 samples/sec   Loss 7.4138   LearningRate 0.0651   Epoch: 3   Global Step: 48030   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:52:47,347-Speed 3400.15 samples/sec   Loss 7.3709   LearningRate 0.0651   Epoch: 3   Global Step: 48040   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:52:50,369-Speed 3390.60 samples/sec   Loss 7.4594   LearningRate 0.0651   Epoch: 3   Global Step: 48050   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:52:53,378-Speed 3403.99 samples/sec   Loss 7.4280   LearningRate 0.0651   Epoch: 3   Global Step: 48060   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:52:56,443-Speed 3342.04 samples/sec   Loss 7.4830   LearningRate 0.0650   Epoch: 3   Global Step: 48070   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:52:59,468-Speed 3386.62 samples/sec   Loss 7.5154   LearningRate 0.0650   Epoch: 3   Global Step: 48080   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:53:02,518-Speed 3357.52 samples/sec   Loss 7.4415   LearningRate 0.0650   Epoch: 3   Global Step: 48090   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:53:05,615-Speed 3308.11 samples/sec   Loss 7.4808   LearningRate 0.0650   Epoch: 3   Global Step: 48100   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:53:08,614-Speed 3415.29 samples/sec   Loss 7.4165   LearningRate 0.0650   Epoch: 3   Global Step: 48110   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:53:11,631-Speed 3395.08 samples/sec   Loss 7.5033   LearningRate 0.0650   Epoch: 3   Global Step: 48120   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:53:14,650-Speed 3393.14 samples/sec   Loss 7.5491   LearningRate 0.0650   Epoch: 3   Global Step: 48130   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:53:17,669-Speed 3392.67 samples/sec   Loss 7.3772   LearningRate 0.0650   Epoch: 3   Global Step: 48140   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:53:20,716-Speed 3362.01 samples/sec   Loss 7.4575   LearningRate 0.0650   Epoch: 3   Global Step: 48150   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:53:23,726-Speed 3402.99 samples/sec   Loss 7.4596   LearningRate 0.0650   Epoch: 3   Global Step: 48160   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:53:26,767-Speed 3368.82 samples/sec   Loss 7.4429   LearningRate 0.0650   Epoch: 3   Global Step: 48170   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:53:29,794-Speed 3383.56 samples/sec   Loss 7.4392   LearningRate 0.0650   Epoch: 3   Global Step: 48180   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:53:32,836-Speed 3367.52 samples/sec   Loss 7.4037   LearningRate 0.0650   Epoch: 3   Global Step: 48190   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:53:35,924-Speed 3317.45 samples/sec   Loss 7.3755   LearningRate 0.0650   Epoch: 3   Global Step: 48200   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:53:38,949-Speed 3386.06 samples/sec   Loss 7.2926   LearningRate 0.0650   Epoch: 3   Global Step: 48210   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:53:42,050-Speed 3302.65 samples/sec   Loss 7.2513   LearningRate 0.0649   Epoch: 3   Global Step: 48220   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:53:45,059-Speed 3404.01 samples/sec   Loss 7.3887   LearningRate 0.0649   Epoch: 3   Global Step: 48230   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:53:48,058-Speed 3416.10 samples/sec   Loss 7.3108   LearningRate 0.0649   Epoch: 3   Global Step: 48240   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:53:51,134-Speed 3329.64 samples/sec   Loss 7.3798   LearningRate 0.0649   Epoch: 3   Global Step: 48250   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:53:54,168-Speed 3376.42 samples/sec   Loss 7.5587   LearningRate 0.0649   Epoch: 3   Global Step: 48260   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 05:53:57,152-Speed 3433.73 samples/sec   Loss 7.5372   LearningRate 0.0649   Epoch: 3   Global Step: 48270   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:54:00,194-Speed 3366.99 samples/sec   Loss 7.3866   LearningRate 0.0649   Epoch: 3   Global Step: 48280   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:54:03,247-Speed 3355.50 samples/sec   Loss 7.4244   LearningRate 0.0649   Epoch: 3   Global Step: 48290   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:54:06,271-Speed 3386.65 samples/sec   Loss 7.4535   LearningRate 0.0649   Epoch: 3   Global Step: 48300   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:54:09,264-Speed 3423.28 samples/sec   Loss 7.3902   LearningRate 0.0649   Epoch: 3   Global Step: 48310   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:54:12,283-Speed 3392.78 samples/sec   Loss 7.2679   LearningRate 0.0649   Epoch: 3   Global Step: 48320   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:54:15,339-Speed 3352.18 samples/sec   Loss 7.4595   LearningRate 0.0649   Epoch: 3   Global Step: 48330   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:54:18,414-Speed 3330.64 samples/sec   Loss 7.3711   LearningRate 0.0649   Epoch: 3   Global Step: 48340   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:54:21,425-Speed 3402.55 samples/sec   Loss 7.5681   LearningRate 0.0649   Epoch: 3   Global Step: 48350   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:54:24,471-Speed 3362.63 samples/sec   Loss 7.4723   LearningRate 0.0649   Epoch: 3   Global Step: 48360   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:54:27,592-Speed 3281.73 samples/sec   Loss 7.3366   LearningRate 0.0648   Epoch: 3   Global Step: 48370   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:54:30,673-Speed 3324.89 samples/sec   Loss 7.3838   LearningRate 0.0648   Epoch: 3   Global Step: 48380   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:54:33,750-Speed 3328.35 samples/sec   Loss 7.4549   LearningRate 0.0648   Epoch: 3   Global Step: 48390   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:54:36,793-Speed 3366.46 samples/sec   Loss 7.3573   LearningRate 0.0648   Epoch: 3   Global Step: 48400   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:54:39,901-Speed 3296.08 samples/sec   Loss 7.5862   LearningRate 0.0648   Epoch: 3   Global Step: 48410   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:54:42,981-Speed 3325.77 samples/sec   Loss 7.4326   LearningRate 0.0648   Epoch: 3   Global Step: 48420   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:54:46,017-Speed 3373.70 samples/sec   Loss 7.3540   LearningRate 0.0648   Epoch: 3   Global Step: 48430   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:54:49,046-Speed 3382.38 samples/sec   Loss 7.4169   LearningRate 0.0648   Epoch: 3   Global Step: 48440   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:54:52,087-Speed 3367.93 samples/sec   Loss 7.5426   LearningRate 0.0648   Epoch: 3   Global Step: 48450   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:54:55,178-Speed 3314.02 samples/sec   Loss 7.3504   LearningRate 0.0648   Epoch: 3   Global Step: 48460   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:54:58,186-Speed 3405.28 samples/sec   Loss 7.4606   LearningRate 0.0648   Epoch: 3   Global Step: 48470   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:55:01,239-Speed 3354.53 samples/sec   Loss 7.4635   LearningRate 0.0648   Epoch: 3   Global Step: 48480   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:55:04,350-Speed 3291.99 samples/sec   Loss 7.4591   LearningRate 0.0648   Epoch: 3   Global Step: 48490   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:55:07,424-Speed 3332.86 samples/sec   Loss 7.5023   LearningRate 0.0648   Epoch: 3   Global Step: 48500   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:55:10,431-Speed 3406.01 samples/sec   Loss 7.3890   LearningRate 0.0648   Epoch: 3   Global Step: 48510   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:55:13,481-Speed 3359.23 samples/sec   Loss 7.4511   LearningRate 0.0648   Epoch: 3   Global Step: 48520   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:55:16,537-Speed 3351.55 samples/sec   Loss 7.4751   LearningRate 0.0647   Epoch: 3   Global Step: 48530   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:55:19,568-Speed 3379.23 samples/sec   Loss 7.3730   LearningRate 0.0647   Epoch: 3   Global Step: 48540   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:55:22,589-Speed 3391.16 samples/sec   Loss 7.4218   LearningRate 0.0647   Epoch: 3   Global Step: 48550   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:55:25,593-Speed 3410.38 samples/sec   Loss 7.3004   LearningRate 0.0647   Epoch: 3   Global Step: 48560   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:55:28,677-Speed 3320.56 samples/sec   Loss 7.3180   LearningRate 0.0647   Epoch: 3   Global Step: 48570   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:55:31,713-Speed 3374.40 samples/sec   Loss 7.3011   LearningRate 0.0647   Epoch: 3   Global Step: 48580   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:55:34,795-Speed 3323.05 samples/sec   Loss 7.4239   LearningRate 0.0647   Epoch: 3   Global Step: 48590   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:55:37,823-Speed 3382.73 samples/sec   Loss 7.4916   LearningRate 0.0647   Epoch: 3   Global Step: 48600   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:55:40,850-Speed 3384.75 samples/sec   Loss 7.3122   LearningRate 0.0647   Epoch: 3   Global Step: 48610   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:55:43,850-Speed 3414.60 samples/sec   Loss 7.3590   LearningRate 0.0647   Epoch: 3   Global Step: 48620   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:55:46,885-Speed 3374.68 samples/sec   Loss 7.4737   LearningRate 0.0647   Epoch: 3   Global Step: 48630   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:55:49,956-Speed 3335.65 samples/sec   Loss 7.4382   LearningRate 0.0647   Epoch: 3   Global Step: 48640   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:55:52,973-Speed 3395.21 samples/sec   Loss 7.4171   LearningRate 0.0647   Epoch: 3   Global Step: 48650   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:55:55,974-Speed 3413.23 samples/sec   Loss 7.4903   LearningRate 0.0647   Epoch: 3   Global Step: 48660   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:55:58,978-Speed 3410.24 samples/sec   Loss 7.4166   LearningRate 0.0647   Epoch: 3   Global Step: 48670   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:56:02,027-Speed 3359.47 samples/sec   Loss 7.3477   LearningRate 0.0646   Epoch: 3   Global Step: 48680   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:56:05,109-Speed 3322.92 samples/sec   Loss 7.3723   LearningRate 0.0646   Epoch: 3   Global Step: 48690   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:56:08,113-Speed 3410.62 samples/sec   Loss 7.4307   LearningRate 0.0646   Epoch: 3   Global Step: 48700   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:56:11,141-Speed 3382.47 samples/sec   Loss 7.3845   LearningRate 0.0646   Epoch: 3   Global Step: 48710   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:56:14,151-Speed 3402.87 samples/sec   Loss 7.4202   LearningRate 0.0646   Epoch: 3   Global Step: 48720   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:56:17,266-Speed 3288.99 samples/sec   Loss 7.4201   LearningRate 0.0646   Epoch: 3   Global Step: 48730   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:56:20,263-Speed 3418.41 samples/sec   Loss 7.4225   LearningRate 0.0646   Epoch: 3   Global Step: 48740   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:56:23,278-Speed 3396.69 samples/sec   Loss 7.2835   LearningRate 0.0646   Epoch: 3   Global Step: 48750   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:56:26,325-Speed 3362.61 samples/sec   Loss 7.3988   LearningRate 0.0646   Epoch: 3   Global Step: 48760   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:56:29,335-Speed 3402.87 samples/sec   Loss 7.5096   LearningRate 0.0646   Epoch: 3   Global Step: 48770   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:56:32,356-Speed 3391.01 samples/sec   Loss 7.2370   LearningRate 0.0646   Epoch: 3   Global Step: 48780   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:56:35,420-Speed 3342.78 samples/sec   Loss 7.4892   LearningRate 0.0646   Epoch: 3   Global Step: 48790   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:56:38,440-Speed 3392.16 samples/sec   Loss 7.4266   LearningRate 0.0646   Epoch: 3   Global Step: 48800   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:56:41,494-Speed 3354.10 samples/sec   Loss 7.4052   LearningRate 0.0646   Epoch: 3   Global Step: 48810   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:56:44,501-Speed 3406.42 samples/sec   Loss 7.3151   LearningRate 0.0646   Epoch: 3   Global Step: 48820   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:56:47,562-Speed 3346.64 samples/sec   Loss 7.3970   LearningRate 0.0646   Epoch: 3   Global Step: 48830   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:56:50,584-Speed 3390.00 samples/sec   Loss 7.3762   LearningRate 0.0645   Epoch: 3   Global Step: 48840   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:56:53,604-Speed 3391.52 samples/sec   Loss 7.3612   LearningRate 0.0645   Epoch: 3   Global Step: 48850   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:56:56,656-Speed 3356.02 samples/sec   Loss 7.3618   LearningRate 0.0645   Epoch: 3   Global Step: 48860   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:56:59,695-Speed 3370.69 samples/sec   Loss 7.3846   LearningRate 0.0645   Epoch: 3   Global Step: 48870   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:57:02,743-Speed 3360.76 samples/sec   Loss 7.4775   LearningRate 0.0645   Epoch: 3   Global Step: 48880   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:57:05,793-Speed 3357.52 samples/sec   Loss 7.4702   LearningRate 0.0645   Epoch: 3   Global Step: 48890   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:57:08,841-Speed 3361.09 samples/sec   Loss 7.4669   LearningRate 0.0645   Epoch: 3   Global Step: 48900   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:57:11,897-Speed 3351.72 samples/sec   Loss 7.3834   LearningRate 0.0645   Epoch: 3   Global Step: 48910   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:57:14,910-Speed 3400.34 samples/sec   Loss 7.4417   LearningRate 0.0645   Epoch: 3   Global Step: 48920   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:57:17,984-Speed 3332.07 samples/sec   Loss 7.3321   LearningRate 0.0645   Epoch: 3   Global Step: 48930   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:57:21,028-Speed 3364.96 samples/sec   Loss 7.6278   LearningRate 0.0645   Epoch: 3   Global Step: 48940   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:57:24,048-Speed 3392.05 samples/sec   Loss 7.4511   LearningRate 0.0645   Epoch: 3   Global Step: 48950   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:57:27,044-Speed 3419.63 samples/sec   Loss 7.3968   LearningRate 0.0645   Epoch: 3   Global Step: 48960   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:57:30,034-Speed 3426.07 samples/sec   Loss 7.3695   LearningRate 0.0645   Epoch: 3   Global Step: 48970   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:57:33,080-Speed 3362.27 samples/sec   Loss 7.3505   LearningRate 0.0645   Epoch: 3   Global Step: 48980   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:57:36,104-Speed 3387.08 samples/sec   Loss 7.3195   LearningRate 0.0644   Epoch: 3   Global Step: 48990   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:57:39,134-Speed 3380.54 samples/sec   Loss 7.3902   LearningRate 0.0644   Epoch: 3   Global Step: 49000   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:57:42,184-Speed 3358.39 samples/sec   Loss 7.3653   LearningRate 0.0644   Epoch: 3   Global Step: 49010   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:57:45,173-Speed 3427.99 samples/sec   Loss 7.4076   LearningRate 0.0644   Epoch: 3   Global Step: 49020   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:57:48,194-Speed 3390.23 samples/sec   Loss 7.4403   LearningRate 0.0644   Epoch: 3   Global Step: 49030   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:57:51,214-Speed 3391.94 samples/sec   Loss 7.4941   LearningRate 0.0644   Epoch: 3   Global Step: 49040   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:57:54,225-Speed 3401.60 samples/sec   Loss 7.3684   LearningRate 0.0644   Epoch: 3   Global Step: 49050   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:57:57,221-Speed 3419.50 samples/sec   Loss 7.4788   LearningRate 0.0644   Epoch: 3   Global Step: 49060   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:58:00,233-Speed 3399.78 samples/sec   Loss 7.4506   LearningRate 0.0644   Epoch: 3   Global Step: 49070   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:58:03,275-Speed 3367.82 samples/sec   Loss 7.2488   LearningRate 0.0644   Epoch: 3   Global Step: 49080   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:58:06,292-Speed 3395.53 samples/sec   Loss 7.4489   LearningRate 0.0644   Epoch: 3   Global Step: 49090   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:58:09,288-Speed 3419.13 samples/sec   Loss 7.3925   LearningRate 0.0644   Epoch: 3   Global Step: 49100   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:58:12,304-Speed 3395.79 samples/sec   Loss 7.3568   LearningRate 0.0644   Epoch: 3   Global Step: 49110   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:58:15,378-Speed 3332.74 samples/sec   Loss 7.4013   LearningRate 0.0644   Epoch: 3   Global Step: 49120   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:58:18,394-Speed 3395.45 samples/sec   Loss 7.3805   LearningRate 0.0644   Epoch: 3   Global Step: 49130   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:58:21,392-Speed 3416.80 samples/sec   Loss 7.3923   LearningRate 0.0644   Epoch: 3   Global Step: 49140   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:58:24,399-Speed 3407.09 samples/sec   Loss 7.4283   LearningRate 0.0643   Epoch: 3   Global Step: 49150   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:58:27,414-Speed 3397.48 samples/sec   Loss 7.4036   LearningRate 0.0643   Epoch: 3   Global Step: 49160   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:58:30,450-Speed 3373.70 samples/sec   Loss 7.5565   LearningRate 0.0643   Epoch: 3   Global Step: 49170   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:58:33,443-Speed 3422.06 samples/sec   Loss 7.3895   LearningRate 0.0643   Epoch: 3   Global Step: 49180   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:58:36,444-Speed 3413.97 samples/sec   Loss 7.4809   LearningRate 0.0643   Epoch: 3   Global Step: 49190   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:58:39,485-Speed 3368.03 samples/sec   Loss 7.2959   LearningRate 0.0643   Epoch: 3   Global Step: 49200   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:58:42,565-Speed 3325.26 samples/sec   Loss 7.4158   LearningRate 0.0643   Epoch: 3   Global Step: 49210   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:58:45,560-Speed 3420.82 samples/sec   Loss 7.3708   LearningRate 0.0643   Epoch: 3   Global Step: 49220   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:58:48,575-Speed 3397.22 samples/sec   Loss 7.3586   LearningRate 0.0643   Epoch: 3   Global Step: 49230   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:58:51,609-Speed 3376.50 samples/sec   Loss 7.2999   LearningRate 0.0643   Epoch: 3   Global Step: 49240   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:58:54,635-Speed 3385.15 samples/sec   Loss 7.4250   LearningRate 0.0643   Epoch: 3   Global Step: 49250   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 05:58:57,620-Speed 3432.23 samples/sec   Loss 7.5683   LearningRate 0.0643   Epoch: 3   Global Step: 49260   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:59:00,673-Speed 3354.76 samples/sec   Loss 7.3563   LearningRate 0.0643   Epoch: 3   Global Step: 49270   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:59:03,756-Speed 3321.68 samples/sec   Loss 7.4233   LearningRate 0.0643   Epoch: 3   Global Step: 49280   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:59:06,824-Speed 3338.97 samples/sec   Loss 7.3809   LearningRate 0.0643   Epoch: 3   Global Step: 49290   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:59:09,856-Speed 3378.91 samples/sec   Loss 7.3679   LearningRate 0.0642   Epoch: 3   Global Step: 49300   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:59:12,917-Speed 3346.28 samples/sec   Loss 7.4707   LearningRate 0.0642   Epoch: 3   Global Step: 49310   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:59:15,923-Speed 3407.76 samples/sec   Loss 7.3077   LearningRate 0.0642   Epoch: 3   Global Step: 49320   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:59:18,975-Speed 3355.76 samples/sec   Loss 7.4431   LearningRate 0.0642   Epoch: 3   Global Step: 49330   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:59:21,983-Speed 3405.54 samples/sec   Loss 7.3820   LearningRate 0.0642   Epoch: 3   Global Step: 49340   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:59:25,010-Speed 3383.38 samples/sec   Loss 7.4001   LearningRate 0.0642   Epoch: 3   Global Step: 49350   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 05:59:28,049-Speed 3371.06 samples/sec   Loss 7.3114   LearningRate 0.0642   Epoch: 3   Global Step: 49360   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:59:31,079-Speed 3380.93 samples/sec   Loss 7.4571   LearningRate 0.0642   Epoch: 3   Global Step: 49370   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:59:34,119-Speed 3369.36 samples/sec   Loss 7.3457   LearningRate 0.0642   Epoch: 3   Global Step: 49380   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:59:37,177-Speed 3349.02 samples/sec   Loss 7.4850   LearningRate 0.0642   Epoch: 3   Global Step: 49390   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:59:40,303-Speed 3277.05 samples/sec   Loss 7.3271   LearningRate 0.0642   Epoch: 3   Global Step: 49400   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:59:43,357-Speed 3354.32 samples/sec   Loss 7.3842   LearningRate 0.0642   Epoch: 3   Global Step: 49410   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:59:46,421-Speed 3342.88 samples/sec   Loss 7.2858   LearningRate 0.0642   Epoch: 3   Global Step: 49420   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:59:49,532-Speed 3293.61 samples/sec   Loss 7.3586   LearningRate 0.0642   Epoch: 3   Global Step: 49430   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:59:52,589-Speed 3350.92 samples/sec   Loss 7.2788   LearningRate 0.0642   Epoch: 3   Global Step: 49440   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:59:55,647-Speed 3348.90 samples/sec   Loss 7.4288   LearningRate 0.0642   Epoch: 3   Global Step: 49450   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 05:59:58,694-Speed 3362.04 samples/sec   Loss 7.3514   LearningRate 0.0641   Epoch: 3   Global Step: 49460   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:00:01,794-Speed 3304.37 samples/sec   Loss 7.3069   LearningRate 0.0641   Epoch: 3   Global Step: 49470   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:00:04,857-Speed 3344.10 samples/sec   Loss 7.4046   LearningRate 0.0641   Epoch: 3   Global Step: 49480   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:00:07,916-Speed 3349.46 samples/sec   Loss 7.4376   LearningRate 0.0641   Epoch: 3   Global Step: 49490   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:00:10,938-Speed 3389.55 samples/sec   Loss 7.3511   LearningRate 0.0641   Epoch: 3   Global Step: 49500   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:00:14,002-Speed 3343.17 samples/sec   Loss 7.3578   LearningRate 0.0641   Epoch: 3   Global Step: 49510   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:00:17,084-Speed 3323.52 samples/sec   Loss 7.4514   LearningRate 0.0641   Epoch: 3   Global Step: 49520   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:00:20,118-Speed 3376.12 samples/sec   Loss 7.3770   LearningRate 0.0641   Epoch: 3   Global Step: 49530   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:00:23,177-Speed 3348.48 samples/sec   Loss 7.3169   LearningRate 0.0641   Epoch: 3   Global Step: 49540   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:00:26,188-Speed 3401.59 samples/sec   Loss 7.4222   LearningRate 0.0641   Epoch: 3   Global Step: 49550   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:00:29,228-Speed 3369.35 samples/sec   Loss 7.4572   LearningRate 0.0641   Epoch: 3   Global Step: 49560   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:00:32,260-Speed 3378.89 samples/sec   Loss 7.2746   LearningRate 0.0641   Epoch: 3   Global Step: 49570   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:00:35,292-Speed 3378.76 samples/sec   Loss 7.4437   LearningRate 0.0641   Epoch: 3   Global Step: 49580   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:00:38,314-Speed 3389.42 samples/sec   Loss 7.3857   LearningRate 0.0641   Epoch: 3   Global Step: 49590   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:00:41,393-Speed 3326.61 samples/sec   Loss 7.3480   LearningRate 0.0641   Epoch: 3   Global Step: 49600   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:00:44,416-Speed 3388.88 samples/sec   Loss 7.4069   LearningRate 0.0640   Epoch: 3   Global Step: 49610   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:00:47,441-Speed 3386.49 samples/sec   Loss 7.4283   LearningRate 0.0640   Epoch: 3   Global Step: 49620   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:00:50,477-Speed 3373.66 samples/sec   Loss 7.5518   LearningRate 0.0640   Epoch: 3   Global Step: 49630   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:00:53,508-Speed 3380.09 samples/sec   Loss 7.4119   LearningRate 0.0640   Epoch: 3   Global Step: 49640   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:00:56,528-Speed 3391.94 samples/sec   Loss 7.3694   LearningRate 0.0640   Epoch: 3   Global Step: 49650   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:00:59,575-Speed 3361.10 samples/sec   Loss 7.4503   LearningRate 0.0640   Epoch: 3   Global Step: 49660   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:01:02,647-Speed 3334.94 samples/sec   Loss 7.2786   LearningRate 0.0640   Epoch: 3   Global Step: 49670   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:01:05,861-Speed 3186.82 samples/sec   Loss 7.4540   LearningRate 0.0640   Epoch: 3   Global Step: 49680   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:01:37,183-Speed 326.94 samples/sec   Loss 6.4911   LearningRate 0.0640   Epoch: 4   Global Step: 49690   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:01:40,643-Speed 2961.18 samples/sec   Loss 5.8580   LearningRate 0.0640   Epoch: 4   Global Step: 49700   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:01:43,788-Speed 3256.70 samples/sec   Loss 5.7370   LearningRate 0.0640   Epoch: 4   Global Step: 49710   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:01:46,827-Speed 3371.01 samples/sec   Loss 5.7527   LearningRate 0.0640   Epoch: 4   Global Step: 49720   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:01:49,916-Speed 3316.11 samples/sec   Loss 5.7446   LearningRate 0.0640   Epoch: 4   Global Step: 49730   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:01:53,023-Speed 3296.68 samples/sec   Loss 5.7072   LearningRate 0.0640   Epoch: 4   Global Step: 49740   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:01:56,292-Speed 3133.05 samples/sec   Loss 5.7311   LearningRate 0.0640   Epoch: 4   Global Step: 49750   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:01:59,608-Speed 3088.80 samples/sec   Loss 5.6475   LearningRate 0.0640   Epoch: 4   Global Step: 49760   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:02:02,667-Speed 3349.05 samples/sec   Loss 5.8023   LearningRate 0.0639   Epoch: 4   Global Step: 49770   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:02:06,426-Speed 2724.66 samples/sec   Loss 5.8583   LearningRate 0.0639   Epoch: 4   Global Step: 49780   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:02:09,446-Speed 3392.20 samples/sec   Loss 5.7017   LearningRate 0.0639   Epoch: 4   Global Step: 49790   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:02:12,533-Speed 3317.59 samples/sec   Loss 5.7644   LearningRate 0.0639   Epoch: 4   Global Step: 49800   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:02:15,573-Speed 3369.29 samples/sec   Loss 5.8430   LearningRate 0.0639   Epoch: 4   Global Step: 49810   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:02:18,634-Speed 3346.82 samples/sec   Loss 5.7297   LearningRate 0.0639   Epoch: 4   Global Step: 49820   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:02:21,688-Speed 3353.89 samples/sec   Loss 5.8352   LearningRate 0.0639   Epoch: 4   Global Step: 49830   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:02:24,779-Speed 3314.61 samples/sec   Loss 5.8415   LearningRate 0.0639   Epoch: 4   Global Step: 49840   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:02:27,826-Speed 3361.97 samples/sec   Loss 5.7908   LearningRate 0.0639   Epoch: 4   Global Step: 49850   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:02:30,876-Speed 3357.53 samples/sec   Loss 5.7888   LearningRate 0.0639   Epoch: 4   Global Step: 49860   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:02:33,909-Speed 3378.16 samples/sec   Loss 5.8243   LearningRate 0.0639   Epoch: 4   Global Step: 49870   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:02:37,084-Speed 3226.11 samples/sec   Loss 5.9023   LearningRate 0.0639   Epoch: 4   Global Step: 49880   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:02:40,126-Speed 3367.14 samples/sec   Loss 5.8562   LearningRate 0.0639   Epoch: 4   Global Step: 49890   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:02:43,215-Speed 3315.30 samples/sec   Loss 5.8236   LearningRate 0.0639   Epoch: 4   Global Step: 49900   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:02:46,294-Speed 3327.32 samples/sec   Loss 5.7990   LearningRate 0.0639   Epoch: 4   Global Step: 49910   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:02:49,371-Speed 3329.39 samples/sec   Loss 5.7256   LearningRate 0.0638   Epoch: 4   Global Step: 49920   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:02:52,427-Speed 3351.99 samples/sec   Loss 5.7038   LearningRate 0.0638   Epoch: 4   Global Step: 49930   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:02:55,477-Speed 3358.24 samples/sec   Loss 5.8472   LearningRate 0.0638   Epoch: 4   Global Step: 49940   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:02:58,533-Speed 3351.51 samples/sec   Loss 5.6893   LearningRate 0.0638   Epoch: 4   Global Step: 49950   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:03:01,582-Speed 3360.22 samples/sec   Loss 5.7444   LearningRate 0.0638   Epoch: 4   Global Step: 49960   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:03:04,660-Speed 3327.09 samples/sec   Loss 5.8077   LearningRate 0.0638   Epoch: 4   Global Step: 49970   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:03:07,681-Speed 3391.04 samples/sec   Loss 5.8358   LearningRate 0.0638   Epoch: 4   Global Step: 49980   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:03:10,723-Speed 3367.72 samples/sec   Loss 5.9078   LearningRate 0.0638   Epoch: 4   Global Step: 49990   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:03:13,769-Speed 3363.08 samples/sec   Loss 5.8033   LearningRate 0.0638   Epoch: 4   Global Step: 50000   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:03:16,805-Speed 3373.43 samples/sec   Loss 5.7877   LearningRate 0.0638   Epoch: 4   Global Step: 50010   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:03:19,863-Speed 3350.29 samples/sec   Loss 5.9369   LearningRate 0.0638   Epoch: 4   Global Step: 50020   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:03:22,897-Speed 3375.44 samples/sec   Loss 5.8323   LearningRate 0.0638   Epoch: 4   Global Step: 50030   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 06:03:25,907-Speed 3402.93 samples/sec   Loss 5.8858   LearningRate 0.0638   Epoch: 4   Global Step: 50040   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:03:28,945-Speed 3371.83 samples/sec   Loss 6.0556   LearningRate 0.0638   Epoch: 4   Global Step: 50050   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:03:31,976-Speed 3379.39 samples/sec   Loss 5.9424   LearningRate 0.0638   Epoch: 4   Global Step: 50060   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:03:35,007-Speed 3379.05 samples/sec   Loss 5.8885   LearningRate 0.0638   Epoch: 4   Global Step: 50070   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:03:38,069-Speed 3345.90 samples/sec   Loss 5.9255   LearningRate 0.0637   Epoch: 4   Global Step: 50080   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:03:42,967-Speed 2091.05 samples/sec   Loss 5.8794   LearningRate 0.0637   Epoch: 4   Global Step: 50090   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:03:45,987-Speed 3392.05 samples/sec   Loss 5.9570   LearningRate 0.0637   Epoch: 4   Global Step: 50100   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:03:49,027-Speed 3369.06 samples/sec   Loss 5.8983   LearningRate 0.0637   Epoch: 4   Global Step: 50110   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:03:52,081-Speed 3354.44 samples/sec   Loss 5.8665   LearningRate 0.0637   Epoch: 4   Global Step: 50120   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:03:55,106-Speed 3385.68 samples/sec   Loss 5.7419   LearningRate 0.0637   Epoch: 4   Global Step: 50130   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:03:58,134-Speed 3383.72 samples/sec   Loss 5.8973   LearningRate 0.0637   Epoch: 4   Global Step: 50140   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:04:01,211-Speed 3327.74 samples/sec   Loss 6.0351   LearningRate 0.0637   Epoch: 4   Global Step: 50150   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:04:04,245-Speed 3376.98 samples/sec   Loss 5.9115   LearningRate 0.0637   Epoch: 4   Global Step: 50160   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:04:07,300-Speed 3353.88 samples/sec   Loss 5.9294   LearningRate 0.0637   Epoch: 4   Global Step: 50170   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:04:10,308-Speed 3404.51 samples/sec   Loss 5.9407   LearningRate 0.0637   Epoch: 4   Global Step: 50180   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:04:13,360-Speed 3356.77 samples/sec   Loss 6.0514   LearningRate 0.0637   Epoch: 4   Global Step: 50190   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:04:16,381-Speed 3390.37 samples/sec   Loss 5.9900   LearningRate 0.0637   Epoch: 4   Global Step: 50200   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:04:19,419-Speed 3371.52 samples/sec   Loss 5.9036   LearningRate 0.0637   Epoch: 4   Global Step: 50210   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:04:22,448-Speed 3382.23 samples/sec   Loss 5.9126   LearningRate 0.0637   Epoch: 4   Global Step: 50220   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:04:25,478-Speed 3380.45 samples/sec   Loss 6.0146   LearningRate 0.0636   Epoch: 4   Global Step: 50230   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:04:28,532-Speed 3354.43 samples/sec   Loss 6.0266   LearningRate 0.0636   Epoch: 4   Global Step: 50240   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:04:31,562-Speed 3380.52 samples/sec   Loss 5.9863   LearningRate 0.0636   Epoch: 4   Global Step: 50250   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:04:34,567-Speed 3408.93 samples/sec   Loss 6.0619   LearningRate 0.0636   Epoch: 4   Global Step: 50260   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:04:37,610-Speed 3366.24 samples/sec   Loss 5.9595   LearningRate 0.0636   Epoch: 4   Global Step: 50270   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:04:40,684-Speed 3331.59 samples/sec   Loss 5.9951   LearningRate 0.0636   Epoch: 4   Global Step: 50280   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:04:43,719-Speed 3374.99 samples/sec   Loss 6.1210   LearningRate 0.0636   Epoch: 4   Global Step: 50290   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:04:46,749-Speed 3380.41 samples/sec   Loss 5.9592   LearningRate 0.0636   Epoch: 4   Global Step: 50300   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:04:49,784-Speed 3375.95 samples/sec   Loss 5.9981   LearningRate 0.0636   Epoch: 4   Global Step: 50310   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:04:52,824-Speed 3368.50 samples/sec   Loss 6.0093   LearningRate 0.0636   Epoch: 4   Global Step: 50320   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:04:55,862-Speed 3372.04 samples/sec   Loss 6.0605   LearningRate 0.0636   Epoch: 4   Global Step: 50330   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:04:58,863-Speed 3413.13 samples/sec   Loss 6.0509   LearningRate 0.0636   Epoch: 4   Global Step: 50340   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:05:01,944-Speed 3324.46 samples/sec   Loss 6.1337   LearningRate 0.0636   Epoch: 4   Global Step: 50350   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:05:05,025-Speed 3325.09 samples/sec   Loss 5.8619   LearningRate 0.0636   Epoch: 4   Global Step: 50360   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:05:08,086-Speed 3346.12 samples/sec   Loss 6.0491   LearningRate 0.0636   Epoch: 4   Global Step: 50370   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:05:11,105-Speed 3393.03 samples/sec   Loss 6.0307   LearningRate 0.0636   Epoch: 4   Global Step: 50380   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:05:14,117-Speed 3400.65 samples/sec   Loss 6.0242   LearningRate 0.0635   Epoch: 4   Global Step: 50390   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:05:17,195-Speed 3328.30 samples/sec   Loss 6.0469   LearningRate 0.0635   Epoch: 4   Global Step: 50400   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:05:20,240-Speed 3363.63 samples/sec   Loss 5.9578   LearningRate 0.0635   Epoch: 4   Global Step: 50410   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:05:23,277-Speed 3372.91 samples/sec   Loss 6.0104   LearningRate 0.0635   Epoch: 4   Global Step: 50420   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:05:26,321-Speed 3365.12 samples/sec   Loss 6.0126   LearningRate 0.0635   Epoch: 4   Global Step: 50430   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:05:29,390-Speed 3337.50 samples/sec   Loss 5.9742   LearningRate 0.0635   Epoch: 4   Global Step: 50440   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:05:32,412-Speed 3390.45 samples/sec   Loss 6.0262   LearningRate 0.0635   Epoch: 4   Global Step: 50450   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:05:35,446-Speed 3376.67 samples/sec   Loss 6.0799   LearningRate 0.0635   Epoch: 4   Global Step: 50460   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:05:38,463-Speed 3395.14 samples/sec   Loss 6.0607   LearningRate 0.0635   Epoch: 4   Global Step: 50470   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:05:41,517-Speed 3354.37 samples/sec   Loss 6.1063   LearningRate 0.0635   Epoch: 4   Global Step: 50480   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:05:44,577-Speed 3347.11 samples/sec   Loss 6.2387   LearningRate 0.0635   Epoch: 4   Global Step: 50490   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:05:47,617-Speed 3368.99 samples/sec   Loss 6.1418   LearningRate 0.0635   Epoch: 4   Global Step: 50500   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:05:50,649-Speed 3377.87 samples/sec   Loss 6.0129   LearningRate 0.0635   Epoch: 4   Global Step: 50510   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:05:53,696-Speed 3362.51 samples/sec   Loss 6.0128   LearningRate 0.0635   Epoch: 4   Global Step: 50520   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:05:56,705-Speed 3404.23 samples/sec   Loss 6.2077   LearningRate 0.0635   Epoch: 4   Global Step: 50530   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:05:59,732-Speed 3384.49 samples/sec   Loss 6.1466   LearningRate 0.0634   Epoch: 4   Global Step: 50540   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:06:02,783-Speed 3357.11 samples/sec   Loss 6.0423   LearningRate 0.0634   Epoch: 4   Global Step: 50550   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:06:05,800-Speed 3395.21 samples/sec   Loss 6.0318   LearningRate 0.0634   Epoch: 4   Global Step: 50560   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:06:08,818-Speed 3393.74 samples/sec   Loss 6.1336   LearningRate 0.0634   Epoch: 4   Global Step: 50570   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:06:11,837-Speed 3393.53 samples/sec   Loss 6.0878   LearningRate 0.0634   Epoch: 4   Global Step: 50580   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:06:14,919-Speed 3323.39 samples/sec   Loss 6.0995   LearningRate 0.0634   Epoch: 4   Global Step: 50590   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:06:17,970-Speed 3357.12 samples/sec   Loss 6.0726   LearningRate 0.0634   Epoch: 4   Global Step: 50600   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:06:21,000-Speed 3380.44 samples/sec   Loss 6.1050   LearningRate 0.0634   Epoch: 4   Global Step: 50610   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:06:24,066-Speed 3341.33 samples/sec   Loss 6.0826   LearningRate 0.0634   Epoch: 4   Global Step: 50620   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:06:27,108-Speed 3367.26 samples/sec   Loss 6.0434   LearningRate 0.0634   Epoch: 4   Global Step: 50630   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:06:30,153-Speed 3363.83 samples/sec   Loss 6.1412   LearningRate 0.0634   Epoch: 4   Global Step: 50640   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:06:33,177-Speed 3387.32 samples/sec   Loss 6.1133   LearningRate 0.0634   Epoch: 4   Global Step: 50650   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:06:36,302-Speed 3278.05 samples/sec   Loss 6.1419   LearningRate 0.0634   Epoch: 4   Global Step: 50660   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:06:39,381-Speed 3326.54 samples/sec   Loss 6.1661   LearningRate 0.0634   Epoch: 4   Global Step: 50670   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:06:42,437-Speed 3352.21 samples/sec   Loss 6.1844   LearningRate 0.0634   Epoch: 4   Global Step: 50680   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:06:45,476-Speed 3370.37 samples/sec   Loss 6.1510   LearningRate 0.0634   Epoch: 4   Global Step: 50690   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:06:48,502-Speed 3384.54 samples/sec   Loss 6.1400   LearningRate 0.0633   Epoch: 4   Global Step: 50700   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:06:51,526-Speed 3388.04 samples/sec   Loss 6.1695   LearningRate 0.0633   Epoch: 4   Global Step: 50710   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:06:54,582-Speed 3352.31 samples/sec   Loss 6.1160   LearningRate 0.0633   Epoch: 4   Global Step: 50720   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:06:57,593-Speed 3401.20 samples/sec   Loss 6.1574   LearningRate 0.0633   Epoch: 4   Global Step: 50730   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:07:00,680-Speed 3318.52 samples/sec   Loss 6.1836   LearningRate 0.0633   Epoch: 4   Global Step: 50740   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:07:03,742-Speed 3344.45 samples/sec   Loss 6.1269   LearningRate 0.0633   Epoch: 4   Global Step: 50750   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:07:06,791-Speed 3359.47 samples/sec   Loss 6.2116   LearningRate 0.0633   Epoch: 4   Global Step: 50760   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:07:09,811-Speed 3392.11 samples/sec   Loss 6.2507   LearningRate 0.0633   Epoch: 4   Global Step: 50770   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:07:12,861-Speed 3358.54 samples/sec   Loss 6.1259   LearningRate 0.0633   Epoch: 4   Global Step: 50780   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:07:15,924-Speed 3344.54 samples/sec   Loss 6.1708   LearningRate 0.0633   Epoch: 4   Global Step: 50790   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:07:18,998-Speed 3332.13 samples/sec   Loss 6.3231   LearningRate 0.0633   Epoch: 4   Global Step: 50800   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:07:22,061-Speed 3343.77 samples/sec   Loss 6.0979   LearningRate 0.0633   Epoch: 4   Global Step: 50810   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:07:25,070-Speed 3404.55 samples/sec   Loss 6.2398   LearningRate 0.0633   Epoch: 4   Global Step: 50820   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:07:28,158-Speed 3317.70 samples/sec   Loss 6.1852   LearningRate 0.0633   Epoch: 4   Global Step: 50830   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:07:31,266-Speed 3295.87 samples/sec   Loss 6.1817   LearningRate 0.0633   Epoch: 4   Global Step: 50840   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:07:34,369-Speed 3301.22 samples/sec   Loss 6.1519   LearningRate 0.0633   Epoch: 4   Global Step: 50850   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:07:37,487-Speed 3284.63 samples/sec   Loss 6.1136   LearningRate 0.0632   Epoch: 4   Global Step: 50860   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:07:40,596-Speed 3294.89 samples/sec   Loss 6.2571   LearningRate 0.0632   Epoch: 4   Global Step: 50870   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:07:43,623-Speed 3384.37 samples/sec   Loss 6.2671   LearningRate 0.0632   Epoch: 4   Global Step: 50880   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:07:46,647-Speed 3386.95 samples/sec   Loss 6.1101   LearningRate 0.0632   Epoch: 4   Global Step: 50890   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:07:49,694-Speed 3361.66 samples/sec   Loss 6.1542   LearningRate 0.0632   Epoch: 4   Global Step: 50900   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:07:52,754-Speed 3347.56 samples/sec   Loss 6.2145   LearningRate 0.0632   Epoch: 4   Global Step: 50910   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:07:55,765-Speed 3401.36 samples/sec   Loss 6.1343   LearningRate 0.0632   Epoch: 4   Global Step: 50920   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:07:58,791-Speed 3385.03 samples/sec   Loss 6.2091   LearningRate 0.0632   Epoch: 4   Global Step: 50930   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:08:01,838-Speed 3361.88 samples/sec   Loss 6.2584   LearningRate 0.0632   Epoch: 4   Global Step: 50940   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:08:04,897-Speed 3349.20 samples/sec   Loss 6.2616   LearningRate 0.0632   Epoch: 4   Global Step: 50950   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:08:07,932-Speed 3374.63 samples/sec   Loss 6.2888   LearningRate 0.0632   Epoch: 4   Global Step: 50960   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:08:10,969-Speed 3372.33 samples/sec   Loss 6.1620   LearningRate 0.0632   Epoch: 4   Global Step: 50970   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:08:14,010-Speed 3368.70 samples/sec   Loss 6.2789   LearningRate 0.0632   Epoch: 4   Global Step: 50980   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:08:17,105-Speed 3310.08 samples/sec   Loss 6.3284   LearningRate 0.0632   Epoch: 4   Global Step: 50990   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:08:20,162-Speed 3350.44 samples/sec   Loss 6.2204   LearningRate 0.0632   Epoch: 4   Global Step: 51000   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:08:23,266-Speed 3299.92 samples/sec   Loss 6.2074   LearningRate 0.0631   Epoch: 4   Global Step: 51010   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:08:26,376-Speed 3293.90 samples/sec   Loss 6.2491   LearningRate 0.0631   Epoch: 4   Global Step: 51020   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:08:29,428-Speed 3355.77 samples/sec   Loss 6.2740   LearningRate 0.0631   Epoch: 4   Global Step: 51030   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:08:32,487-Speed 3348.81 samples/sec   Loss 6.3253   LearningRate 0.0631   Epoch: 4   Global Step: 51040   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:08:35,532-Speed 3364.15 samples/sec   Loss 6.2658   LearningRate 0.0631   Epoch: 4   Global Step: 51050   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:08:38,546-Speed 3398.51 samples/sec   Loss 6.2937   LearningRate 0.0631   Epoch: 4   Global Step: 51060   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:08:41,694-Speed 3254.07 samples/sec   Loss 6.2920   LearningRate 0.0631   Epoch: 4   Global Step: 51070   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:08:44,756-Speed 3345.15 samples/sec   Loss 6.2969   LearningRate 0.0631   Epoch: 4   Global Step: 51080   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:08:47,814-Speed 3350.37 samples/sec   Loss 6.4103   LearningRate 0.0631   Epoch: 4   Global Step: 51090   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:08:50,848-Speed 3375.89 samples/sec   Loss 6.2967   LearningRate 0.0631   Epoch: 4   Global Step: 51100   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:08:53,915-Speed 3339.17 samples/sec   Loss 6.2460   LearningRate 0.0631   Epoch: 4   Global Step: 51110   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:08:56,917-Speed 3412.41 samples/sec   Loss 6.2709   LearningRate 0.0631   Epoch: 4   Global Step: 51120   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:08:59,979-Speed 3346.13 samples/sec   Loss 6.2938   LearningRate 0.0631   Epoch: 4   Global Step: 51130   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:09:03,103-Speed 3278.05 samples/sec   Loss 6.3529   LearningRate 0.0631   Epoch: 4   Global Step: 51140   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:09:06,125-Speed 3389.74 samples/sec   Loss 6.2585   LearningRate 0.0631   Epoch: 4   Global Step: 51150   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:09:09,119-Speed 3420.76 samples/sec   Loss 6.2987   LearningRate 0.0631   Epoch: 4   Global Step: 51160   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:09:12,177-Speed 3349.71 samples/sec   Loss 6.2658   LearningRate 0.0630   Epoch: 4   Global Step: 51170   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:09:15,191-Speed 3398.62 samples/sec   Loss 6.3588   LearningRate 0.0630   Epoch: 4   Global Step: 51180   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:09:18,257-Speed 3341.10 samples/sec   Loss 6.4071   LearningRate 0.0630   Epoch: 4   Global Step: 51190   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 06:09:21,269-Speed 3400.55 samples/sec   Loss 6.2310   LearningRate 0.0630   Epoch: 4   Global Step: 51200   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 06:09:24,285-Speed 3397.23 samples/sec   Loss 6.3028   LearningRate 0.0630   Epoch: 4   Global Step: 51210   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:09:27,421-Speed 3266.20 samples/sec   Loss 6.2873   LearningRate 0.0630   Epoch: 4   Global Step: 51220   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:09:30,480-Speed 3347.68 samples/sec   Loss 6.3279   LearningRate 0.0630   Epoch: 4   Global Step: 51230   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:09:33,534-Speed 3354.01 samples/sec   Loss 6.3746   LearningRate 0.0630   Epoch: 4   Global Step: 51240   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:09:36,596-Speed 3346.10 samples/sec   Loss 6.3619   LearningRate 0.0630   Epoch: 4   Global Step: 51250   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:09:39,664-Speed 3338.07 samples/sec   Loss 6.2948   LearningRate 0.0630   Epoch: 4   Global Step: 51260   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:09:42,741-Speed 3329.70 samples/sec   Loss 6.4472   LearningRate 0.0630   Epoch: 4   Global Step: 51270   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:09:45,753-Speed 3400.77 samples/sec   Loss 6.2252   LearningRate 0.0630   Epoch: 4   Global Step: 51280   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:09:48,807-Speed 3353.29 samples/sec   Loss 6.2702   LearningRate 0.0630   Epoch: 4   Global Step: 51290   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:09:51,860-Speed 3355.51 samples/sec   Loss 6.3873   LearningRate 0.0630   Epoch: 4   Global Step: 51300   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:09:54,917-Speed 3351.35 samples/sec   Loss 6.2154   LearningRate 0.0630   Epoch: 4   Global Step: 51310   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:09:57,948-Speed 3378.70 samples/sec   Loss 6.3105   LearningRate 0.0630   Epoch: 4   Global Step: 51320   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:10:00,976-Speed 3383.08 samples/sec   Loss 6.4002   LearningRate 0.0629   Epoch: 4   Global Step: 51330   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:10:04,029-Speed 3355.55 samples/sec   Loss 6.1793   LearningRate 0.0629   Epoch: 4   Global Step: 51340   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:10:07,066-Speed 3373.07 samples/sec   Loss 6.3227   LearningRate 0.0629   Epoch: 4   Global Step: 51350   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:10:10,059-Speed 3422.09 samples/sec   Loss 6.4258   LearningRate 0.0629   Epoch: 4   Global Step: 51360   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:10:13,143-Speed 3321.05 samples/sec   Loss 6.4273   LearningRate 0.0629   Epoch: 4   Global Step: 51370   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:10:16,208-Speed 3342.16 samples/sec   Loss 6.3158   LearningRate 0.0629   Epoch: 4   Global Step: 51380   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:10:19,225-Speed 3395.51 samples/sec   Loss 6.2834   LearningRate 0.0629   Epoch: 4   Global Step: 51390   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:10:22,259-Speed 3375.39 samples/sec   Loss 6.3046   LearningRate 0.0629   Epoch: 4   Global Step: 51400   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:10:25,312-Speed 3355.07 samples/sec   Loss 6.3444   LearningRate 0.0629   Epoch: 4   Global Step: 51410   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:10:28,373-Speed 3347.34 samples/sec   Loss 6.4690   LearningRate 0.0629   Epoch: 4   Global Step: 51420   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:10:31,417-Speed 3364.04 samples/sec   Loss 6.3435   LearningRate 0.0629   Epoch: 4   Global Step: 51430   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:10:34,449-Speed 3378.81 samples/sec   Loss 6.2625   LearningRate 0.0629   Epoch: 4   Global Step: 51440   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:10:37,480-Speed 3379.93 samples/sec   Loss 6.4097   LearningRate 0.0629   Epoch: 4   Global Step: 51450   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:10:40,552-Speed 3334.90 samples/sec   Loss 6.3961   LearningRate 0.0629   Epoch: 4   Global Step: 51460   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:10:43,608-Speed 3351.31 samples/sec   Loss 6.4085   LearningRate 0.0629   Epoch: 4   Global Step: 51470   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:10:46,625-Speed 3395.28 samples/sec   Loss 6.5130   LearningRate 0.0628   Epoch: 4   Global Step: 51480   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:10:49,694-Speed 3338.40 samples/sec   Loss 6.3251   LearningRate 0.0628   Epoch: 4   Global Step: 51490   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:10:52,739-Speed 3364.26 samples/sec   Loss 6.4515   LearningRate 0.0628   Epoch: 4   Global Step: 51500   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:10:55,809-Speed 3336.80 samples/sec   Loss 6.3762   LearningRate 0.0628   Epoch: 4   Global Step: 51510   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:10:58,840-Speed 3379.31 samples/sec   Loss 6.3981   LearningRate 0.0628   Epoch: 4   Global Step: 51520   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:11:01,881-Speed 3368.71 samples/sec   Loss 6.4151   LearningRate 0.0628   Epoch: 4   Global Step: 51530   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:11:04,953-Speed 3334.88 samples/sec   Loss 6.3375   LearningRate 0.0628   Epoch: 4   Global Step: 51540   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:11:08,044-Speed 3313.00 samples/sec   Loss 6.4093   LearningRate 0.0628   Epoch: 4   Global Step: 51550   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:11:11,089-Speed 3363.87 samples/sec   Loss 6.4683   LearningRate 0.0628   Epoch: 4   Global Step: 51560   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:11:14,162-Speed 3333.55 samples/sec   Loss 6.4480   LearningRate 0.0628   Epoch: 4   Global Step: 51570   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:11:17,264-Speed 3302.05 samples/sec   Loss 6.4171   LearningRate 0.0628   Epoch: 4   Global Step: 51580   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:11:20,275-Speed 3401.90 samples/sec   Loss 6.4657   LearningRate 0.0628   Epoch: 4   Global Step: 51590   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:11:23,295-Speed 3392.50 samples/sec   Loss 6.3220   LearningRate 0.0628   Epoch: 4   Global Step: 51600   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:11:26,350-Speed 3352.36 samples/sec   Loss 6.3928   LearningRate 0.0628   Epoch: 4   Global Step: 51610   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:11:29,405-Speed 3353.19 samples/sec   Loss 6.4697   LearningRate 0.0628   Epoch: 4   Global Step: 51620   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:11:32,430-Speed 3385.94 samples/sec   Loss 6.4406   LearningRate 0.0628   Epoch: 4   Global Step: 51630   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:11:35,474-Speed 3366.40 samples/sec   Loss 6.5548   LearningRate 0.0627   Epoch: 4   Global Step: 51640   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:11:38,525-Speed 3356.69 samples/sec   Loss 6.4838   LearningRate 0.0627   Epoch: 4   Global Step: 51650   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:11:41,525-Speed 3415.56 samples/sec   Loss 6.5232   LearningRate 0.0627   Epoch: 4   Global Step: 51660   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:11:44,583-Speed 3349.63 samples/sec   Loss 6.4704   LearningRate 0.0627   Epoch: 4   Global Step: 51670   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:11:47,665-Speed 3323.88 samples/sec   Loss 6.4459   LearningRate 0.0627   Epoch: 4   Global Step: 51680   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:11:50,686-Speed 3390.43 samples/sec   Loss 6.3117   LearningRate 0.0627   Epoch: 4   Global Step: 51690   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:11:53,745-Speed 3349.35 samples/sec   Loss 6.4160   LearningRate 0.0627   Epoch: 4   Global Step: 51700   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:11:56,737-Speed 3422.75 samples/sec   Loss 6.3922   LearningRate 0.0627   Epoch: 4   Global Step: 51710   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:11:59,809-Speed 3334.90 samples/sec   Loss 6.4649   LearningRate 0.0627   Epoch: 4   Global Step: 51720   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:12:02,857-Speed 3360.24 samples/sec   Loss 6.5111   LearningRate 0.0627   Epoch: 4   Global Step: 51730   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:12:05,857-Speed 3415.46 samples/sec   Loss 6.4206   LearningRate 0.0627   Epoch: 4   Global Step: 51740   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:12:08,898-Speed 3367.68 samples/sec   Loss 6.4608   LearningRate 0.0627   Epoch: 4   Global Step: 51750   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:12:11,955-Speed 3351.98 samples/sec   Loss 6.5915   LearningRate 0.0627   Epoch: 4   Global Step: 51760   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:12:15,015-Speed 3346.58 samples/sec   Loss 6.5215   LearningRate 0.0627   Epoch: 4   Global Step: 51770   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:12:18,085-Speed 3337.53 samples/sec   Loss 6.3249   LearningRate 0.0627   Epoch: 4   Global Step: 51780   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:12:21,134-Speed 3358.81 samples/sec   Loss 6.5361   LearningRate 0.0627   Epoch: 4   Global Step: 51790   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:12:24,168-Speed 3375.99 samples/sec   Loss 6.5466   LearningRate 0.0626   Epoch: 4   Global Step: 51800   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:12:27,201-Speed 3377.79 samples/sec   Loss 6.4231   LearningRate 0.0626   Epoch: 4   Global Step: 51810   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:12:30,240-Speed 3370.45 samples/sec   Loss 6.5309   LearningRate 0.0626   Epoch: 4   Global Step: 51820   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:12:33,271-Speed 3379.25 samples/sec   Loss 6.5105   LearningRate 0.0626   Epoch: 4   Global Step: 51830   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:12:36,322-Speed 3357.83 samples/sec   Loss 6.4955   LearningRate 0.0626   Epoch: 4   Global Step: 51840   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:12:39,411-Speed 3315.75 samples/sec   Loss 6.4185   LearningRate 0.0626   Epoch: 4   Global Step: 51850   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:12:42,536-Speed 3278.08 samples/sec   Loss 6.4363   LearningRate 0.0626   Epoch: 4   Global Step: 51860   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:12:45,553-Speed 3395.11 samples/sec   Loss 6.5339   LearningRate 0.0626   Epoch: 4   Global Step: 51870   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:12:48,590-Speed 3373.10 samples/sec   Loss 6.4904   LearningRate 0.0626   Epoch: 4   Global Step: 51880   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:12:51,647-Speed 3350.64 samples/sec   Loss 6.5449   LearningRate 0.0626   Epoch: 4   Global Step: 51890   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:12:54,714-Speed 3340.36 samples/sec   Loss 6.5674   LearningRate 0.0626   Epoch: 4   Global Step: 51900   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:12:57,744-Speed 3379.85 samples/sec   Loss 6.4623   LearningRate 0.0626   Epoch: 4   Global Step: 51910   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:13:00,813-Speed 3338.67 samples/sec   Loss 6.6449   LearningRate 0.0626   Epoch: 4   Global Step: 51920   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:13:03,872-Speed 3347.85 samples/sec   Loss 6.3682   LearningRate 0.0626   Epoch: 4   Global Step: 51930   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:13:06,901-Speed 3381.94 samples/sec   Loss 6.6305   LearningRate 0.0626   Epoch: 4   Global Step: 51940   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:13:09,919-Speed 3394.67 samples/sec   Loss 6.5420   LearningRate 0.0625   Epoch: 4   Global Step: 51950   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:13:12,968-Speed 3359.64 samples/sec   Loss 6.3897   LearningRate 0.0625   Epoch: 4   Global Step: 51960   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:13:15,971-Speed 3410.82 samples/sec   Loss 6.5447   LearningRate 0.0625   Epoch: 4   Global Step: 51970   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:13:19,007-Speed 3374.56 samples/sec   Loss 6.4871   LearningRate 0.0625   Epoch: 4   Global Step: 51980   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:13:22,006-Speed 3415.57 samples/sec   Loss 6.5692   LearningRate 0.0625   Epoch: 4   Global Step: 51990   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:13:25,054-Speed 3360.16 samples/sec   Loss 6.6062   LearningRate 0.0625   Epoch: 4   Global Step: 52000   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:13:28,125-Speed 3336.13 samples/sec   Loss 6.5434   LearningRate 0.0625   Epoch: 4   Global Step: 52010   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:13:31,175-Speed 3358.15 samples/sec   Loss 6.5575   LearningRate 0.0625   Epoch: 4   Global Step: 52020   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:13:34,186-Speed 3401.82 samples/sec   Loss 6.3896   LearningRate 0.0625   Epoch: 4   Global Step: 52030   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:13:37,240-Speed 3354.59 samples/sec   Loss 6.5944   LearningRate 0.0625   Epoch: 4   Global Step: 52040   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:13:40,323-Speed 3322.25 samples/sec   Loss 6.5863   LearningRate 0.0625   Epoch: 4   Global Step: 52050   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:13:43,393-Speed 3335.83 samples/sec   Loss 6.3844   LearningRate 0.0625   Epoch: 4   Global Step: 52060   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:13:46,407-Speed 3398.92 samples/sec   Loss 6.4886   LearningRate 0.0625   Epoch: 4   Global Step: 52070   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:13:49,408-Speed 3412.97 samples/sec   Loss 6.6188   LearningRate 0.0625   Epoch: 4   Global Step: 52080   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:13:52,480-Speed 3334.90 samples/sec   Loss 6.4312   LearningRate 0.0625   Epoch: 4   Global Step: 52090   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:13:55,526-Speed 3362.65 samples/sec   Loss 6.5764   LearningRate 0.0625   Epoch: 4   Global Step: 52100   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:13:58,581-Speed 3353.56 samples/sec   Loss 6.5868   LearningRate 0.0624   Epoch: 4   Global Step: 52110   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:14:01,686-Speed 3298.35 samples/sec   Loss 6.5248   LearningRate 0.0624   Epoch: 4   Global Step: 52120   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:14:04,811-Speed 3278.81 samples/sec   Loss 6.5138   LearningRate 0.0624   Epoch: 4   Global Step: 52130   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:14:07,837-Speed 3384.19 samples/sec   Loss 6.5172   LearningRate 0.0624   Epoch: 4   Global Step: 52140   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:14:10,857-Speed 3392.09 samples/sec   Loss 6.5577   LearningRate 0.0624   Epoch: 4   Global Step: 52150   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:14:13,920-Speed 3344.15 samples/sec   Loss 6.4440   LearningRate 0.0624   Epoch: 4   Global Step: 52160   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:14:16,979-Speed 3349.12 samples/sec   Loss 6.6316   LearningRate 0.0624   Epoch: 4   Global Step: 52170   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:14:20,059-Speed 3325.60 samples/sec   Loss 6.4865   LearningRate 0.0624   Epoch: 4   Global Step: 52180   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:14:23,102-Speed 3366.06 samples/sec   Loss 6.5725   LearningRate 0.0624   Epoch: 4   Global Step: 52190   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:14:26,177-Speed 3330.95 samples/sec   Loss 6.5611   LearningRate 0.0624   Epoch: 4   Global Step: 52200   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:14:29,244-Speed 3340.18 samples/sec   Loss 6.5025   LearningRate 0.0624   Epoch: 4   Global Step: 52210   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:14:32,311-Speed 3340.06 samples/sec   Loss 6.5266   LearningRate 0.0624   Epoch: 4   Global Step: 52220   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:14:35,317-Speed 3407.37 samples/sec   Loss 6.5269   LearningRate 0.0624   Epoch: 4   Global Step: 52230   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:14:38,329-Speed 3400.91 samples/sec   Loss 6.5953   LearningRate 0.0624   Epoch: 4   Global Step: 52240   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:14:41,362-Speed 3377.79 samples/sec   Loss 6.4939   LearningRate 0.0624   Epoch: 4   Global Step: 52250   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:14:44,360-Speed 3416.42 samples/sec   Loss 6.4184   LearningRate 0.0624   Epoch: 4   Global Step: 52260   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:14:47,436-Speed 3330.68 samples/sec   Loss 6.6780   LearningRate 0.0623   Epoch: 4   Global Step: 52270   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:14:50,519-Speed 3321.83 samples/sec   Loss 6.3826   LearningRate 0.0623   Epoch: 4   Global Step: 52280   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:14:53,626-Speed 3297.16 samples/sec   Loss 6.5166   LearningRate 0.0623   Epoch: 4   Global Step: 52290   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:14:56,686-Speed 3347.99 samples/sec   Loss 6.5324   LearningRate 0.0623   Epoch: 4   Global Step: 52300   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:14:59,745-Speed 3347.86 samples/sec   Loss 6.5606   LearningRate 0.0623   Epoch: 4   Global Step: 52310   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:15:02,848-Speed 3301.22 samples/sec   Loss 6.6353   LearningRate 0.0623   Epoch: 4   Global Step: 52320   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:15:05,907-Speed 3349.64 samples/sec   Loss 6.6835   LearningRate 0.0623   Epoch: 4   Global Step: 52330   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:15:08,903-Speed 3418.62 samples/sec   Loss 6.6224   LearningRate 0.0623   Epoch: 4   Global Step: 52340   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:15:11,935-Speed 3378.49 samples/sec   Loss 6.4453   LearningRate 0.0623   Epoch: 4   Global Step: 52350   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:15:14,967-Speed 3377.88 samples/sec   Loss 6.5218   LearningRate 0.0623   Epoch: 4   Global Step: 52360   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:15:18,052-Speed 3320.74 samples/sec   Loss 6.5273   LearningRate 0.0623   Epoch: 4   Global Step: 52370   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:15:21,076-Speed 3387.17 samples/sec   Loss 6.5745   LearningRate 0.0623   Epoch: 4   Global Step: 52380   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:15:24,132-Speed 3352.14 samples/sec   Loss 6.6037   LearningRate 0.0623   Epoch: 4   Global Step: 52390   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:15:27,179-Speed 3360.95 samples/sec   Loss 6.6340   LearningRate 0.0623   Epoch: 4   Global Step: 52400   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:15:30,237-Speed 3349.58 samples/sec   Loss 6.5461   LearningRate 0.0623   Epoch: 4   Global Step: 52410   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:15:33,288-Speed 3357.98 samples/sec   Loss 6.6451   LearningRate 0.0622   Epoch: 4   Global Step: 52420   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:15:36,388-Speed 3304.54 samples/sec   Loss 6.6230   LearningRate 0.0622   Epoch: 4   Global Step: 52430   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:15:39,463-Speed 3331.28 samples/sec   Loss 6.6438   LearningRate 0.0622   Epoch: 4   Global Step: 52440   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:15:42,498-Speed 3375.24 samples/sec   Loss 6.5948   LearningRate 0.0622   Epoch: 4   Global Step: 52450   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:15:45,511-Speed 3399.32 samples/sec   Loss 6.6332   LearningRate 0.0622   Epoch: 4   Global Step: 52460   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:15:48,536-Speed 3386.31 samples/sec   Loss 6.5620   LearningRate 0.0622   Epoch: 4   Global Step: 52470   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:15:51,570-Speed 3375.71 samples/sec   Loss 6.4674   LearningRate 0.0622   Epoch: 4   Global Step: 52480   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:15:54,614-Speed 3365.11 samples/sec   Loss 6.4848   LearningRate 0.0622   Epoch: 4   Global Step: 52490   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:15:57,641-Speed 3384.68 samples/sec   Loss 6.5501   LearningRate 0.0622   Epoch: 4   Global Step: 52500   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:16:00,767-Speed 3276.08 samples/sec   Loss 6.6624   LearningRate 0.0622   Epoch: 4   Global Step: 52510   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:16:03,770-Speed 3410.75 samples/sec   Loss 6.5005   LearningRate 0.0622   Epoch: 4   Global Step: 52520   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:16:06,789-Speed 3393.50 samples/sec   Loss 6.7577   LearningRate 0.0622   Epoch: 4   Global Step: 52530   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:16:09,813-Speed 3387.74 samples/sec   Loss 6.6664   LearningRate 0.0622   Epoch: 4   Global Step: 52540   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:16:12,830-Speed 3395.60 samples/sec   Loss 6.7037   LearningRate 0.0622   Epoch: 4   Global Step: 52550   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:16:15,830-Speed 3414.34 samples/sec   Loss 6.6636   LearningRate 0.0622   Epoch: 4   Global Step: 52560   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:16:18,896-Speed 3341.33 samples/sec   Loss 6.6014   LearningRate 0.0622   Epoch: 4   Global Step: 52570   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:16:21,953-Speed 3350.54 samples/sec   Loss 6.6686   LearningRate 0.0621   Epoch: 4   Global Step: 52580   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:16:25,007-Speed 3354.42 samples/sec   Loss 6.6359   LearningRate 0.0621   Epoch: 4   Global Step: 52590   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:16:28,033-Speed 3384.03 samples/sec   Loss 6.5551   LearningRate 0.0621   Epoch: 4   Global Step: 52600   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:16:31,074-Speed 3369.55 samples/sec   Loss 6.6503   LearningRate 0.0621   Epoch: 4   Global Step: 52610   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:16:34,103-Speed 3381.27 samples/sec   Loss 6.5444   LearningRate 0.0621   Epoch: 4   Global Step: 52620   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:16:37,193-Speed 3315.33 samples/sec   Loss 6.5383   LearningRate 0.0621   Epoch: 4   Global Step: 52630   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:16:40,219-Speed 3385.48 samples/sec   Loss 6.6157   LearningRate 0.0621   Epoch: 4   Global Step: 52640   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:16:43,276-Speed 3350.74 samples/sec   Loss 6.5336   LearningRate 0.0621   Epoch: 4   Global Step: 52650   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:16:46,314-Speed 3371.84 samples/sec   Loss 6.6646   LearningRate 0.0621   Epoch: 4   Global Step: 52660   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:16:49,408-Speed 3309.57 samples/sec   Loss 6.5835   LearningRate 0.0621   Epoch: 4   Global Step: 52670   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:16:52,444-Speed 3374.14 samples/sec   Loss 6.6556   LearningRate 0.0621   Epoch: 4   Global Step: 52680   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:16:55,444-Speed 3415.01 samples/sec   Loss 6.6327   LearningRate 0.0621   Epoch: 4   Global Step: 52690   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:16:58,438-Speed 3421.37 samples/sec   Loss 6.4961   LearningRate 0.0621   Epoch: 4   Global Step: 52700   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:17:01,484-Speed 3362.85 samples/sec   Loss 6.6698   LearningRate 0.0621   Epoch: 4   Global Step: 52710   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:17:04,535-Speed 3357.83 samples/sec   Loss 6.5600   LearningRate 0.0621   Epoch: 4   Global Step: 52720   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:17:07,590-Speed 3352.57 samples/sec   Loss 6.5210   LearningRate 0.0621   Epoch: 4   Global Step: 52730   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:17:10,599-Speed 3404.11 samples/sec   Loss 6.6028   LearningRate 0.0620   Epoch: 4   Global Step: 52740   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:17:13,659-Speed 3347.23 samples/sec   Loss 6.7227   LearningRate 0.0620   Epoch: 4   Global Step: 52750   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:17:16,662-Speed 3412.49 samples/sec   Loss 6.6006   LearningRate 0.0620   Epoch: 4   Global Step: 52760   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:17:19,697-Speed 3374.04 samples/sec   Loss 6.7280   LearningRate 0.0620   Epoch: 4   Global Step: 52770   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:17:22,738-Speed 3368.56 samples/sec   Loss 6.6007   LearningRate 0.0620   Epoch: 4   Global Step: 52780   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:17:25,750-Speed 3401.87 samples/sec   Loss 6.5982   LearningRate 0.0620   Epoch: 4   Global Step: 52790   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:17:28,746-Speed 3418.29 samples/sec   Loss 6.5630   LearningRate 0.0620   Epoch: 4   Global Step: 52800   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:17:31,789-Speed 3366.74 samples/sec   Loss 6.5698   LearningRate 0.0620   Epoch: 4   Global Step: 52810   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:17:34,848-Speed 3348.58 samples/sec   Loss 6.6464   LearningRate 0.0620   Epoch: 4   Global Step: 52820   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:17:37,874-Speed 3384.46 samples/sec   Loss 6.6042   LearningRate 0.0620   Epoch: 4   Global Step: 52830   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:17:40,898-Speed 3387.97 samples/sec   Loss 6.6064   LearningRate 0.0620   Epoch: 4   Global Step: 52840   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:17:43,901-Speed 3411.01 samples/sec   Loss 6.7389   LearningRate 0.0620   Epoch: 4   Global Step: 52850   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:17:46,923-Speed 3389.98 samples/sec   Loss 6.7301   LearningRate 0.0620   Epoch: 4   Global Step: 52860   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:17:50,006-Speed 3322.25 samples/sec   Loss 6.6167   LearningRate 0.0620   Epoch: 4   Global Step: 52870   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:17:53,063-Speed 3350.78 samples/sec   Loss 6.6601   LearningRate 0.0620   Epoch: 4   Global Step: 52880   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:17:56,117-Speed 3354.04 samples/sec   Loss 6.6906   LearningRate 0.0620   Epoch: 4   Global Step: 52890   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:17:59,188-Speed 3335.57 samples/sec   Loss 6.6407   LearningRate 0.0619   Epoch: 4   Global Step: 52900   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:18:02,221-Speed 3377.45 samples/sec   Loss 6.6183   LearningRate 0.0619   Epoch: 4   Global Step: 52910   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:18:05,230-Speed 3404.69 samples/sec   Loss 6.6986   LearningRate 0.0619   Epoch: 4   Global Step: 52920   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:18:08,226-Speed 3418.81 samples/sec   Loss 6.6679   LearningRate 0.0619   Epoch: 4   Global Step: 52930   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:18:11,275-Speed 3359.09 samples/sec   Loss 6.6127   LearningRate 0.0619   Epoch: 4   Global Step: 52940   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:18:14,341-Speed 3341.30 samples/sec   Loss 6.6502   LearningRate 0.0619   Epoch: 4   Global Step: 52950   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:18:17,365-Speed 3387.54 samples/sec   Loss 6.6295   LearningRate 0.0619   Epoch: 4   Global Step: 52960   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:18:20,366-Speed 3413.19 samples/sec   Loss 6.7360   LearningRate 0.0619   Epoch: 4   Global Step: 52970   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:18:23,413-Speed 3361.44 samples/sec   Loss 6.7470   LearningRate 0.0619   Epoch: 4   Global Step: 52980   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:18:26,484-Speed 3336.16 samples/sec   Loss 6.6278   LearningRate 0.0619   Epoch: 4   Global Step: 52990   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:18:29,519-Speed 3374.51 samples/sec   Loss 6.7165   LearningRate 0.0619   Epoch: 4   Global Step: 53000   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:18:32,542-Speed 3389.73 samples/sec   Loss 6.6252   LearningRate 0.0619   Epoch: 4   Global Step: 53010   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:18:35,584-Speed 3366.79 samples/sec   Loss 6.6728   LearningRate 0.0619   Epoch: 4   Global Step: 53020   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:18:38,676-Speed 3312.92 samples/sec   Loss 6.6165   LearningRate 0.0619   Epoch: 4   Global Step: 53030   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:18:41,756-Speed 3324.98 samples/sec   Loss 6.6882   LearningRate 0.0619   Epoch: 4   Global Step: 53040   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:18:44,759-Speed 3411.16 samples/sec   Loss 6.6242   LearningRate 0.0619   Epoch: 4   Global Step: 53050   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:18:47,791-Speed 3378.73 samples/sec   Loss 6.6871   LearningRate 0.0618   Epoch: 4   Global Step: 53060   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:18:50,895-Speed 3299.90 samples/sec   Loss 6.6545   LearningRate 0.0618   Epoch: 4   Global Step: 53070   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:18:53,959-Speed 3343.33 samples/sec   Loss 6.7056   LearningRate 0.0618   Epoch: 4   Global Step: 53080   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:18:57,008-Speed 3359.44 samples/sec   Loss 6.5633   LearningRate 0.0618   Epoch: 4   Global Step: 53090   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:19:00,029-Speed 3390.58 samples/sec   Loss 6.7089   LearningRate 0.0618   Epoch: 4   Global Step: 53100   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:19:03,093-Speed 3343.75 samples/sec   Loss 6.6911   LearningRate 0.0618   Epoch: 4   Global Step: 53110   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:19:06,115-Speed 3388.74 samples/sec   Loss 6.6997   LearningRate 0.0618   Epoch: 4   Global Step: 53120   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:19:09,120-Speed 3409.47 samples/sec   Loss 6.6224   LearningRate 0.0618   Epoch: 4   Global Step: 53130   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:19:12,136-Speed 3396.44 samples/sec   Loss 6.6003   LearningRate 0.0618   Epoch: 4   Global Step: 53140   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:19:15,203-Speed 3339.41 samples/sec   Loss 6.6368   LearningRate 0.0618   Epoch: 4   Global Step: 53150   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:19:18,303-Speed 3304.51 samples/sec   Loss 6.7129   LearningRate 0.0618   Epoch: 4   Global Step: 53160   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:19:21,317-Speed 3398.99 samples/sec   Loss 6.7478   LearningRate 0.0618   Epoch: 4   Global Step: 53170   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:19:24,398-Speed 3324.49 samples/sec   Loss 6.7546   LearningRate 0.0618   Epoch: 4   Global Step: 53180   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:19:27,497-Speed 3305.94 samples/sec   Loss 6.7183   LearningRate 0.0618   Epoch: 4   Global Step: 53190   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:19:30,570-Speed 3332.72 samples/sec   Loss 6.6873   LearningRate 0.0618   Epoch: 4   Global Step: 53200   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:19:33,571-Speed 3413.82 samples/sec   Loss 6.7375   LearningRate 0.0617   Epoch: 4   Global Step: 53210   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:19:36,637-Speed 3341.27 samples/sec   Loss 6.6505   LearningRate 0.0617   Epoch: 4   Global Step: 53220   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:19:39,711-Speed 3331.35 samples/sec   Loss 6.6702   LearningRate 0.0617   Epoch: 4   Global Step: 53230   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:19:42,747-Speed 3374.06 samples/sec   Loss 6.8159   LearningRate 0.0617   Epoch: 4   Global Step: 53240   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:19:45,771-Speed 3388.08 samples/sec   Loss 6.6095   LearningRate 0.0617   Epoch: 4   Global Step: 53250   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:19:48,798-Speed 3383.68 samples/sec   Loss 6.5472   LearningRate 0.0617   Epoch: 4   Global Step: 53260   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:19:51,884-Speed 3319.62 samples/sec   Loss 6.7021   LearningRate 0.0617   Epoch: 4   Global Step: 53270   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:19:54,950-Speed 3340.49 samples/sec   Loss 6.7360   LearningRate 0.0617   Epoch: 4   Global Step: 53280   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:19:57,955-Speed 3409.39 samples/sec   Loss 6.7148   LearningRate 0.0617   Epoch: 4   Global Step: 53290   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:20:00,974-Speed 3391.93 samples/sec   Loss 6.6544   LearningRate 0.0617   Epoch: 4   Global Step: 53300   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:20:04,044-Speed 3336.88 samples/sec   Loss 6.7017   LearningRate 0.0617   Epoch: 4   Global Step: 53310   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:20:07,068-Speed 3387.32 samples/sec   Loss 6.6050   LearningRate 0.0617   Epoch: 4   Global Step: 53320   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:20:10,074-Speed 3407.77 samples/sec   Loss 6.6211   LearningRate 0.0617   Epoch: 4   Global Step: 53330   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:20:13,143-Speed 3338.28 samples/sec   Loss 6.7565   LearningRate 0.0617   Epoch: 4   Global Step: 53340   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:20:16,221-Speed 3327.97 samples/sec   Loss 6.7200   LearningRate 0.0617   Epoch: 4   Global Step: 53350   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:20:19,295-Speed 3331.89 samples/sec   Loss 6.7495   LearningRate 0.0617   Epoch: 4   Global Step: 53360   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:20:22,300-Speed 3408.40 samples/sec   Loss 6.6112   LearningRate 0.0616   Epoch: 4   Global Step: 53370   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:20:25,312-Speed 3401.08 samples/sec   Loss 6.7133   LearningRate 0.0616   Epoch: 4   Global Step: 53380   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:20:28,395-Speed 3322.81 samples/sec   Loss 6.8569   LearningRate 0.0616   Epoch: 4   Global Step: 53390   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:20:31,414-Speed 3392.98 samples/sec   Loss 6.6806   LearningRate 0.0616   Epoch: 4   Global Step: 53400   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:20:34,482-Speed 3338.79 samples/sec   Loss 6.6819   LearningRate 0.0616   Epoch: 4   Global Step: 53410   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:20:37,554-Speed 3334.23 samples/sec   Loss 6.7400   LearningRate 0.0616   Epoch: 4   Global Step: 53420   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:20:40,609-Speed 3353.52 samples/sec   Loss 6.7259   LearningRate 0.0616   Epoch: 4   Global Step: 53430   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:20:43,671-Speed 3344.70 samples/sec   Loss 6.7451   LearningRate 0.0616   Epoch: 4   Global Step: 53440   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:20:46,665-Speed 3420.85 samples/sec   Loss 6.6623   LearningRate 0.0616   Epoch: 4   Global Step: 53450   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:20:49,752-Speed 3318.28 samples/sec   Loss 6.8562   LearningRate 0.0616   Epoch: 4   Global Step: 53460   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:20:52,791-Speed 3370.57 samples/sec   Loss 6.5693   LearningRate 0.0616   Epoch: 4   Global Step: 53470   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:20:55,854-Speed 3344.18 samples/sec   Loss 6.6679   LearningRate 0.0616   Epoch: 4   Global Step: 53480   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:20:58,891-Speed 3373.23 samples/sec   Loss 6.6423   LearningRate 0.0616   Epoch: 4   Global Step: 53490   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:21:01,968-Speed 3329.38 samples/sec   Loss 6.7911   LearningRate 0.0616   Epoch: 4   Global Step: 53500   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:21:05,056-Speed 3316.64 samples/sec   Loss 6.6038   LearningRate 0.0616   Epoch: 4   Global Step: 53510   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:21:08,084-Speed 3383.08 samples/sec   Loss 6.7124   LearningRate 0.0616   Epoch: 4   Global Step: 53520   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:21:11,081-Speed 3417.71 samples/sec   Loss 6.7854   LearningRate 0.0615   Epoch: 4   Global Step: 53530   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:21:14,170-Speed 3315.50 samples/sec   Loss 6.7184   LearningRate 0.0615   Epoch: 4   Global Step: 53540   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:21:17,238-Speed 3339.21 samples/sec   Loss 6.6434   LearningRate 0.0615   Epoch: 4   Global Step: 53550   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:21:20,267-Speed 3381.34 samples/sec   Loss 6.7538   LearningRate 0.0615   Epoch: 4   Global Step: 53560   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:21:23,364-Speed 3307.68 samples/sec   Loss 6.7280   LearningRate 0.0615   Epoch: 4   Global Step: 53570   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:21:26,457-Speed 3312.05 samples/sec   Loss 6.8778   LearningRate 0.0615   Epoch: 4   Global Step: 53580   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:21:29,513-Speed 3351.37 samples/sec   Loss 6.7981   LearningRate 0.0615   Epoch: 4   Global Step: 53590   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:21:32,516-Speed 3411.61 samples/sec   Loss 6.7482   LearningRate 0.0615   Epoch: 4   Global Step: 53600   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:21:35,601-Speed 3320.48 samples/sec   Loss 6.6538   LearningRate 0.0615   Epoch: 4   Global Step: 53610   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:21:38,646-Speed 3364.08 samples/sec   Loss 6.7008   LearningRate 0.0615   Epoch: 4   Global Step: 53620   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:21:41,686-Speed 3368.55 samples/sec   Loss 6.7213   LearningRate 0.0615   Epoch: 4   Global Step: 53630   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:21:44,712-Speed 3385.93 samples/sec   Loss 6.7242   LearningRate 0.0615   Epoch: 4   Global Step: 53640   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:21:47,782-Speed 3336.91 samples/sec   Loss 6.8224   LearningRate 0.0615   Epoch: 4   Global Step: 53650   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:21:50,817-Speed 3375.10 samples/sec   Loss 6.6882   LearningRate 0.0615   Epoch: 4   Global Step: 53660   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:21:53,858-Speed 3368.24 samples/sec   Loss 6.8121   LearningRate 0.0615   Epoch: 4   Global Step: 53670   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:21:56,853-Speed 3419.46 samples/sec   Loss 6.6688   LearningRate 0.0615   Epoch: 4   Global Step: 53680   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:21:59,887-Speed 3376.51 samples/sec   Loss 6.7579   LearningRate 0.0614   Epoch: 4   Global Step: 53690   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:22:02,957-Speed 3336.94 samples/sec   Loss 6.6243   LearningRate 0.0614   Epoch: 4   Global Step: 53700   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:22:05,967-Speed 3402.70 samples/sec   Loss 6.6259   LearningRate 0.0614   Epoch: 4   Global Step: 53710   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:22:08,981-Speed 3398.81 samples/sec   Loss 6.6580   LearningRate 0.0614   Epoch: 4   Global Step: 53720   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:22:11,998-Speed 3394.85 samples/sec   Loss 6.7690   LearningRate 0.0614   Epoch: 4   Global Step: 53730   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:22:14,982-Speed 3433.91 samples/sec   Loss 6.7174   LearningRate 0.0614   Epoch: 4   Global Step: 53740   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:22:17,996-Speed 3398.23 samples/sec   Loss 6.7255   LearningRate 0.0614   Epoch: 4   Global Step: 53750   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:22:21,023-Speed 3384.39 samples/sec   Loss 6.7017   LearningRate 0.0614   Epoch: 4   Global Step: 53760   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:22:24,043-Speed 3391.31 samples/sec   Loss 6.8105   LearningRate 0.0614   Epoch: 4   Global Step: 53770   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:22:27,091-Speed 3361.09 samples/sec   Loss 6.7868   LearningRate 0.0614   Epoch: 4   Global Step: 53780   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:22:30,111-Speed 3391.52 samples/sec   Loss 6.8100   LearningRate 0.0614   Epoch: 4   Global Step: 53790   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:22:33,152-Speed 3368.45 samples/sec   Loss 6.7448   LearningRate 0.0614   Epoch: 4   Global Step: 53800   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:22:36,218-Speed 3340.89 samples/sec   Loss 6.7787   LearningRate 0.0614   Epoch: 4   Global Step: 53810   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:22:39,260-Speed 3367.12 samples/sec   Loss 6.7858   LearningRate 0.0614   Epoch: 4   Global Step: 53820   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:22:42,273-Speed 3399.70 samples/sec   Loss 6.7507   LearningRate 0.0614   Epoch: 4   Global Step: 53830   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:22:45,307-Speed 3376.45 samples/sec   Loss 6.5959   LearningRate 0.0614   Epoch: 4   Global Step: 53840   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:22:48,364-Speed 3351.06 samples/sec   Loss 6.7770   LearningRate 0.0613   Epoch: 4   Global Step: 53850   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:22:51,453-Speed 3316.19 samples/sec   Loss 6.6932   LearningRate 0.0613   Epoch: 4   Global Step: 53860   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:22:54,486-Speed 3377.47 samples/sec   Loss 6.8256   LearningRate 0.0613   Epoch: 4   Global Step: 53870   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:22:57,500-Speed 3398.77 samples/sec   Loss 6.7208   LearningRate 0.0613   Epoch: 4   Global Step: 53880   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:23:00,553-Speed 3354.80 samples/sec   Loss 6.7550   LearningRate 0.0613   Epoch: 4   Global Step: 53890   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:23:03,585-Speed 3378.76 samples/sec   Loss 6.8410   LearningRate 0.0613   Epoch: 4   Global Step: 53900   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:23:06,592-Speed 3406.46 samples/sec   Loss 6.7931   LearningRate 0.0613   Epoch: 4   Global Step: 53910   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:23:09,583-Speed 3424.68 samples/sec   Loss 6.7551   LearningRate 0.0613   Epoch: 4   Global Step: 53920   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:23:12,650-Speed 3339.18 samples/sec   Loss 6.6210   LearningRate 0.0613   Epoch: 4   Global Step: 53930   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:23:15,718-Speed 3339.06 samples/sec   Loss 6.7153   LearningRate 0.0613   Epoch: 4   Global Step: 53940   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:23:18,780-Speed 3345.44 samples/sec   Loss 6.7619   LearningRate 0.0613   Epoch: 4   Global Step: 53950   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:23:21,783-Speed 3410.54 samples/sec   Loss 6.7466   LearningRate 0.0613   Epoch: 4   Global Step: 53960   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:23:24,789-Speed 3408.29 samples/sec   Loss 6.8397   LearningRate 0.0613   Epoch: 4   Global Step: 53970   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:23:27,816-Speed 3383.54 samples/sec   Loss 6.8322   LearningRate 0.0613   Epoch: 4   Global Step: 53980   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:23:30,836-Speed 3392.39 samples/sec   Loss 6.8203   LearningRate 0.0613   Epoch: 4   Global Step: 53990   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:23:33,881-Speed 3364.14 samples/sec   Loss 6.8876   LearningRate 0.0613   Epoch: 4   Global Step: 54000   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:23:36,974-Speed 3311.82 samples/sec   Loss 6.8210   LearningRate 0.0612   Epoch: 4   Global Step: 54010   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:23:40,017-Speed 3365.81 samples/sec   Loss 6.8049   LearningRate 0.0612   Epoch: 4   Global Step: 54020   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:23:43,155-Speed 3263.74 samples/sec   Loss 6.8333   LearningRate 0.0612   Epoch: 4   Global Step: 54030   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:23:46,175-Speed 3392.78 samples/sec   Loss 6.7921   LearningRate 0.0612   Epoch: 4   Global Step: 54040   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:23:49,266-Speed 3313.35 samples/sec   Loss 6.8464   LearningRate 0.0612   Epoch: 4   Global Step: 54050   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:23:52,330-Speed 3343.08 samples/sec   Loss 6.7626   LearningRate 0.0612   Epoch: 4   Global Step: 54060   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:23:55,364-Speed 3376.48 samples/sec   Loss 6.7767   LearningRate 0.0612   Epoch: 4   Global Step: 54070   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:23:58,429-Speed 3341.42 samples/sec   Loss 6.7596   LearningRate 0.0612   Epoch: 4   Global Step: 54080   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:24:01,495-Speed 3341.78 samples/sec   Loss 6.8625   LearningRate 0.0612   Epoch: 4   Global Step: 54090   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:24:04,545-Speed 3358.13 samples/sec   Loss 6.8132   LearningRate 0.0612   Epoch: 4   Global Step: 54100   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:24:07,618-Speed 3332.92 samples/sec   Loss 6.8850   LearningRate 0.0612   Epoch: 4   Global Step: 54110   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:24:10,665-Speed 3361.93 samples/sec   Loss 6.7419   LearningRate 0.0612   Epoch: 4   Global Step: 54120   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:24:13,697-Speed 3378.65 samples/sec   Loss 6.7911   LearningRate 0.0612   Epoch: 4   Global Step: 54130   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:24:16,743-Speed 3362.26 samples/sec   Loss 6.8285   LearningRate 0.0612   Epoch: 4   Global Step: 54140   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:24:19,763-Speed 3391.50 samples/sec   Loss 6.8302   LearningRate 0.0612   Epoch: 4   Global Step: 54150   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:24:22,772-Speed 3404.63 samples/sec   Loss 6.8241   LearningRate 0.0611   Epoch: 4   Global Step: 54160   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:24:25,766-Speed 3421.49 samples/sec   Loss 6.8684   LearningRate 0.0611   Epoch: 4   Global Step: 54170   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:24:28,824-Speed 3349.32 samples/sec   Loss 6.8416   LearningRate 0.0611   Epoch: 4   Global Step: 54180   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:24:31,927-Speed 3301.88 samples/sec   Loss 6.8553   LearningRate 0.0611   Epoch: 4   Global Step: 54190   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:24:34,941-Speed 3397.69 samples/sec   Loss 6.7763   LearningRate 0.0611   Epoch: 4   Global Step: 54200   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:24:38,039-Speed 3306.75 samples/sec   Loss 6.9252   LearningRate 0.0611   Epoch: 4   Global Step: 54210   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:24:41,122-Speed 3322.89 samples/sec   Loss 6.8336   LearningRate 0.0611   Epoch: 4   Global Step: 54220   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:24:44,140-Speed 3393.49 samples/sec   Loss 6.8921   LearningRate 0.0611   Epoch: 4   Global Step: 54230   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:24:47,165-Speed 3386.62 samples/sec   Loss 6.7608   LearningRate 0.0611   Epoch: 4   Global Step: 54240   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:24:50,245-Speed 3325.46 samples/sec   Loss 6.8681   LearningRate 0.0611   Epoch: 4   Global Step: 54250   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:24:53,336-Speed 3313.79 samples/sec   Loss 6.7591   LearningRate 0.0611   Epoch: 4   Global Step: 54260   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:24:56,344-Speed 3405.97 samples/sec   Loss 6.7915   LearningRate 0.0611   Epoch: 4   Global Step: 54270   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:24:59,353-Speed 3403.57 samples/sec   Loss 6.8847   LearningRate 0.0611   Epoch: 4   Global Step: 54280   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:25:02,404-Speed 3358.14 samples/sec   Loss 6.8980   LearningRate 0.0611   Epoch: 4   Global Step: 54290   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:25:05,433-Speed 3380.90 samples/sec   Loss 6.8601   LearningRate 0.0611   Epoch: 4   Global Step: 54300   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:25:08,467-Speed 3375.99 samples/sec   Loss 6.8559   LearningRate 0.0611   Epoch: 4   Global Step: 54310   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:25:11,484-Speed 3395.37 samples/sec   Loss 6.7877   LearningRate 0.0610   Epoch: 4   Global Step: 54320   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:25:14,534-Speed 3358.62 samples/sec   Loss 6.8065   LearningRate 0.0610   Epoch: 4   Global Step: 54330   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:25:17,574-Speed 3369.52 samples/sec   Loss 6.7694   LearningRate 0.0610   Epoch: 4   Global Step: 54340   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:25:20,587-Speed 3399.99 samples/sec   Loss 6.8129   LearningRate 0.0610   Epoch: 4   Global Step: 54350   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:25:23,585-Speed 3416.08 samples/sec   Loss 6.7973   LearningRate 0.0610   Epoch: 4   Global Step: 54360   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:25:26,632-Speed 3361.83 samples/sec   Loss 6.7639   LearningRate 0.0610   Epoch: 4   Global Step: 54370   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:25:29,641-Speed 3404.19 samples/sec   Loss 6.6843   LearningRate 0.0610   Epoch: 4   Global Step: 54380   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:25:32,673-Speed 3378.58 samples/sec   Loss 6.8349   LearningRate 0.0610   Epoch: 4   Global Step: 54390   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:25:35,685-Speed 3400.73 samples/sec   Loss 6.7463   LearningRate 0.0610   Epoch: 4   Global Step: 54400   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:25:38,721-Speed 3374.33 samples/sec   Loss 6.9029   LearningRate 0.0610   Epoch: 4   Global Step: 54410   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:25:41,805-Speed 3321.44 samples/sec   Loss 6.6719   LearningRate 0.0610   Epoch: 4   Global Step: 54420   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:25:44,821-Speed 3395.97 samples/sec   Loss 6.8509   LearningRate 0.0610   Epoch: 4   Global Step: 54430   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:25:47,852-Speed 3379.80 samples/sec   Loss 6.9617   LearningRate 0.0610   Epoch: 4   Global Step: 54440   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:25:50,882-Speed 3380.38 samples/sec   Loss 6.6932   LearningRate 0.0610   Epoch: 4   Global Step: 54450   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:25:53,967-Speed 3320.44 samples/sec   Loss 6.8162   LearningRate 0.0610   Epoch: 4   Global Step: 54460   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:25:57,002-Speed 3375.10 samples/sec   Loss 6.8308   LearningRate 0.0610   Epoch: 4   Global Step: 54470   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:26:00,013-Speed 3402.31 samples/sec   Loss 6.8478   LearningRate 0.0609   Epoch: 4   Global Step: 54480   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:26:03,079-Speed 3340.75 samples/sec   Loss 6.8113   LearningRate 0.0609   Epoch: 4   Global Step: 54490   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:26:06,101-Speed 3388.83 samples/sec   Loss 6.8763   LearningRate 0.0609   Epoch: 4   Global Step: 54500   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:26:09,112-Speed 3402.72 samples/sec   Loss 6.7882   LearningRate 0.0609   Epoch: 4   Global Step: 54510   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:26:12,146-Speed 3375.84 samples/sec   Loss 6.8869   LearningRate 0.0609   Epoch: 4   Global Step: 54520   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:26:15,198-Speed 3356.03 samples/sec   Loss 6.7613   LearningRate 0.0609   Epoch: 4   Global Step: 54530   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:26:18,290-Speed 3312.76 samples/sec   Loss 6.8042   LearningRate 0.0609   Epoch: 4   Global Step: 54540   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:26:21,285-Speed 3420.61 samples/sec   Loss 6.8150   LearningRate 0.0609   Epoch: 4   Global Step: 54550   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:26:24,290-Speed 3408.52 samples/sec   Loss 6.8870   LearningRate 0.0609   Epoch: 4   Global Step: 54560   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:26:27,289-Speed 3415.71 samples/sec   Loss 6.7079   LearningRate 0.0609   Epoch: 4   Global Step: 54570   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:26:30,329-Speed 3369.98 samples/sec   Loss 6.7501   LearningRate 0.0609   Epoch: 4   Global Step: 54580   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:26:33,345-Speed 3396.39 samples/sec   Loss 6.7403   LearningRate 0.0609   Epoch: 4   Global Step: 54590   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:26:36,380-Speed 3375.18 samples/sec   Loss 6.8993   LearningRate 0.0609   Epoch: 4   Global Step: 54600   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:26:39,383-Speed 3410.64 samples/sec   Loss 6.6977   LearningRate 0.0609   Epoch: 4   Global Step: 54610   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:26:42,402-Speed 3393.33 samples/sec   Loss 6.6458   LearningRate 0.0609   Epoch: 4   Global Step: 54620   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:26:45,408-Speed 3407.24 samples/sec   Loss 6.6515   LearningRate 0.0609   Epoch: 4   Global Step: 54630   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:26:48,429-Speed 3390.98 samples/sec   Loss 6.9202   LearningRate 0.0608   Epoch: 4   Global Step: 54640   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:26:51,459-Speed 3380.86 samples/sec   Loss 6.8175   LearningRate 0.0608   Epoch: 4   Global Step: 54650   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:26:54,504-Speed 3363.59 samples/sec   Loss 6.9099   LearningRate 0.0608   Epoch: 4   Global Step: 54660   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:26:57,503-Speed 3416.08 samples/sec   Loss 6.9320   LearningRate 0.0608   Epoch: 4   Global Step: 54670   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:27:00,552-Speed 3359.20 samples/sec   Loss 6.8069   LearningRate 0.0608   Epoch: 4   Global Step: 54680   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:27:03,589-Speed 3373.42 samples/sec   Loss 6.8481   LearningRate 0.0608   Epoch: 4   Global Step: 54690   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:27:06,667-Speed 3327.74 samples/sec   Loss 6.7409   LearningRate 0.0608   Epoch: 4   Global Step: 54700   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:27:09,675-Speed 3405.35 samples/sec   Loss 6.7882   LearningRate 0.0608   Epoch: 4   Global Step: 54710   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:27:12,737-Speed 3345.04 samples/sec   Loss 6.8015   LearningRate 0.0608   Epoch: 4   Global Step: 54720   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:27:15,875-Speed 3264.21 samples/sec   Loss 6.7850   LearningRate 0.0608   Epoch: 4   Global Step: 54730   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:27:18,995-Speed 3283.11 samples/sec   Loss 6.9141   LearningRate 0.0608   Epoch: 4   Global Step: 54740   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:27:22,010-Speed 3397.48 samples/sec   Loss 6.8934   LearningRate 0.0608   Epoch: 4   Global Step: 54750   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:27:25,123-Speed 3290.49 samples/sec   Loss 6.8373   LearningRate 0.0608   Epoch: 4   Global Step: 54760   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:27:28,207-Speed 3321.69 samples/sec   Loss 6.8481   LearningRate 0.0608   Epoch: 4   Global Step: 54770   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:27:31,310-Speed 3301.23 samples/sec   Loss 6.8484   LearningRate 0.0608   Epoch: 4   Global Step: 54780   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:27:34,352-Speed 3367.29 samples/sec   Loss 6.9815   LearningRate 0.0608   Epoch: 4   Global Step: 54790   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:27:37,470-Speed 3285.43 samples/sec   Loss 6.9470   LearningRate 0.0607   Epoch: 4   Global Step: 54800   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:27:40,524-Speed 3354.34 samples/sec   Loss 6.8486   LearningRate 0.0607   Epoch: 4   Global Step: 54810   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:27:43,648-Speed 3278.34 samples/sec   Loss 6.8334   LearningRate 0.0607   Epoch: 4   Global Step: 54820   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:27:46,664-Speed 3396.67 samples/sec   Loss 6.8410   LearningRate 0.0607   Epoch: 4   Global Step: 54830   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:27:49,704-Speed 3369.47 samples/sec   Loss 6.7820   LearningRate 0.0607   Epoch: 4   Global Step: 54840   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:27:52,852-Speed 3253.48 samples/sec   Loss 6.9031   LearningRate 0.0607   Epoch: 4   Global Step: 54850   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:27:55,864-Speed 3401.36 samples/sec   Loss 6.7258   LearningRate 0.0607   Epoch: 4   Global Step: 54860   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:27:58,906-Speed 3366.52 samples/sec   Loss 6.8123   LearningRate 0.0607   Epoch: 4   Global Step: 54870   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:28:01,968-Speed 3345.74 samples/sec   Loss 6.8087   LearningRate 0.0607   Epoch: 4   Global Step: 54880   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:28:05,054-Speed 3319.10 samples/sec   Loss 6.8071   LearningRate 0.0607   Epoch: 4   Global Step: 54890   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 06:28:08,086-Speed 3378.91 samples/sec   Loss 6.8148   LearningRate 0.0607   Epoch: 4   Global Step: 54900   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-04-27 06:28:11,120-Speed 3376.01 samples/sec   Loss 6.9082   LearningRate 0.0607   Epoch: 4   Global Step: 54910   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:28:14,203-Speed 3322.32 samples/sec   Loss 6.7856   LearningRate 0.0607   Epoch: 4   Global Step: 54920   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:28:17,265-Speed 3344.84 samples/sec   Loss 6.7327   LearningRate 0.0607   Epoch: 4   Global Step: 54930   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:28:20,268-Speed 3411.55 samples/sec   Loss 6.9449   LearningRate 0.0607   Epoch: 4   Global Step: 54940   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:28:23,332-Speed 3343.40 samples/sec   Loss 6.9213   LearningRate 0.0607   Epoch: 4   Global Step: 54950   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:28:26,355-Speed 3388.73 samples/sec   Loss 6.8140   LearningRate 0.0606   Epoch: 4   Global Step: 54960   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:28:29,448-Speed 3311.00 samples/sec   Loss 6.8562   LearningRate 0.0606   Epoch: 4   Global Step: 54970   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:28:32,522-Speed 3332.75 samples/sec   Loss 6.7971   LearningRate 0.0606   Epoch: 4   Global Step: 54980   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:28:35,571-Speed 3359.34 samples/sec   Loss 6.8553   LearningRate 0.0606   Epoch: 4   Global Step: 54990   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:28:38,695-Speed 3278.51 samples/sec   Loss 6.8688   LearningRate 0.0606   Epoch: 4   Global Step: 55000   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:28:41,811-Speed 3287.67 samples/sec   Loss 6.7569   LearningRate 0.0606   Epoch: 4   Global Step: 55010   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:28:44,871-Speed 3346.75 samples/sec   Loss 6.8997   LearningRate 0.0606   Epoch: 4   Global Step: 55020   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:28:47,947-Speed 3330.17 samples/sec   Loss 6.8070   LearningRate 0.0606   Epoch: 4   Global Step: 55030   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-04-27 06:28:50,977-Speed 3380.63 samples/sec   Loss 6.9622   LearningRate 0.0606   Epoch: 4   Global Step: 55040   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:28:54,048-Speed 3335.15 samples/sec   Loss 6.8301   LearningRate 0.0606   Epoch: 4   Global Step: 55050   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:28:57,050-Speed 3412.78 samples/sec   Loss 6.8390   LearningRate 0.0606   Epoch: 4   Global Step: 55060   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:29:00,042-Speed 3423.31 samples/sec   Loss 6.8647   LearningRate 0.0606   Epoch: 4   Global Step: 55070   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:29:03,139-Speed 3307.96 samples/sec   Loss 6.9625   LearningRate 0.0606   Epoch: 4   Global Step: 55080   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:29:06,225-Speed 3318.35 samples/sec   Loss 6.7607   LearningRate 0.0606   Epoch: 4   Global Step: 55090   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:29:09,239-Speed 3399.47 samples/sec   Loss 6.7323   LearningRate 0.0606   Epoch: 4   Global Step: 55100   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:29:12,330-Speed 3313.48 samples/sec   Loss 6.8832   LearningRate 0.0606   Epoch: 4   Global Step: 55110   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:29:15,401-Speed 3336.12 samples/sec   Loss 6.9589   LearningRate 0.0605   Epoch: 4   Global Step: 55120   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:29:18,450-Speed 3358.56 samples/sec   Loss 6.8875   LearningRate 0.0605   Epoch: 4   Global Step: 55130   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-04-27 06:29:21,478-Speed 3383.01 samples/sec   Loss 6.8928   LearningRate 0.0605   Epoch: 4   Global Step: 55140   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:29:24,538-Speed 3347.44 samples/sec   Loss 6.7525   LearningRate 0.0605   Epoch: 4   Global Step: 55150   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-04-27 06:29:27,629-Speed 3314.34 samples/sec   Loss 6.9212   LearningRate 0.0605   Epoch: 4   Global Step: 55160   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:29:30,672-Speed 3365.70 samples/sec   Loss 6.8991   LearningRate 0.0605   Epoch: 4   Global Step: 55170   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:29:33,761-Speed 3316.50 samples/sec   Loss 6.9627   LearningRate 0.0605   Epoch: 4   Global Step: 55180   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:29:36,834-Speed 3333.85 samples/sec   Loss 6.8404   LearningRate 0.0605   Epoch: 4   Global Step: 55190   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:29:39,906-Speed 3333.40 samples/sec   Loss 6.8001   LearningRate 0.0605   Epoch: 4   Global Step: 55200   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:29:42,968-Speed 3345.91 samples/sec   Loss 6.8053   LearningRate 0.0605   Epoch: 4   Global Step: 55210   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:29:45,972-Speed 3410.40 samples/sec   Loss 6.9141   LearningRate 0.0605   Epoch: 4   Global Step: 55220   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:29:49,013-Speed 3367.93 samples/sec   Loss 6.8452   LearningRate 0.0605   Epoch: 4   Global Step: 55230   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:29:52,122-Speed 3295.30 samples/sec   Loss 6.7308   LearningRate 0.0605   Epoch: 4   Global Step: 55240   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:29:55,161-Speed 3370.78 samples/sec   Loss 6.8417   LearningRate 0.0605   Epoch: 4   Global Step: 55250   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:29:58,204-Speed 3365.36 samples/sec   Loss 6.8964   LearningRate 0.0605   Epoch: 4   Global Step: 55260   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:30:01,244-Speed 3369.54 samples/sec   Loss 6.8247   LearningRate 0.0605   Epoch: 4   Global Step: 55270   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:30:04,271-Speed 3384.67 samples/sec   Loss 6.8380   LearningRate 0.0604   Epoch: 4   Global Step: 55280   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:30:07,342-Speed 3335.67 samples/sec   Loss 6.9271   LearningRate 0.0604   Epoch: 4   Global Step: 55290   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:30:10,350-Speed 3405.63 samples/sec   Loss 6.8625   LearningRate 0.0604   Epoch: 4   Global Step: 55300   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:30:13,399-Speed 3359.49 samples/sec   Loss 6.8619   LearningRate 0.0604   Epoch: 4   Global Step: 55310   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:30:16,479-Speed 3325.96 samples/sec   Loss 6.8798   LearningRate 0.0604   Epoch: 4   Global Step: 55320   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:30:19,528-Speed 3359.27 samples/sec   Loss 6.7474   LearningRate 0.0604   Epoch: 4   Global Step: 55330   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:30:22,549-Speed 3390.43 samples/sec   Loss 6.8970   LearningRate 0.0604   Epoch: 4   Global Step: 55340   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:30:25,610-Speed 3346.07 samples/sec   Loss 6.8912   LearningRate 0.0604   Epoch: 4   Global Step: 55350   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:30:28,658-Speed 3360.63 samples/sec   Loss 6.8852   LearningRate 0.0604   Epoch: 4   Global Step: 55360   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:30:31,691-Speed 3377.68 samples/sec   Loss 6.7683   LearningRate 0.0604   Epoch: 4   Global Step: 55370   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:30:34,718-Speed 3383.90 samples/sec   Loss 6.6415   LearningRate 0.0604   Epoch: 4   Global Step: 55380   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:30:37,774-Speed 3352.82 samples/sec   Loss 6.9350   LearningRate 0.0604   Epoch: 4   Global Step: 55390   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:30:40,816-Speed 3367.18 samples/sec   Loss 6.8546   LearningRate 0.0604   Epoch: 4   Global Step: 55400   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:30:43,859-Speed 3365.07 samples/sec   Loss 6.8692   LearningRate 0.0604   Epoch: 4   Global Step: 55410   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:30:46,922-Speed 3345.01 samples/sec   Loss 6.8319   LearningRate 0.0604   Epoch: 4   Global Step: 55420   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:30:50,068-Speed 3255.59 samples/sec   Loss 6.8684   LearningRate 0.0604   Epoch: 4   Global Step: 55430   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:30:53,108-Speed 3369.34 samples/sec   Loss 6.9289   LearningRate 0.0603   Epoch: 4   Global Step: 55440   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:30:56,142-Speed 3375.66 samples/sec   Loss 6.8925   LearningRate 0.0603   Epoch: 4   Global Step: 55450   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:30:59,152-Speed 3404.31 samples/sec   Loss 6.8443   LearningRate 0.0603   Epoch: 4   Global Step: 55460   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:31:02,210-Speed 3350.14 samples/sec   Loss 6.8663   LearningRate 0.0603   Epoch: 4   Global Step: 55470   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:31:05,310-Speed 3303.88 samples/sec   Loss 6.8326   LearningRate 0.0603   Epoch: 4   Global Step: 55480   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:31:08,336-Speed 3385.21 samples/sec   Loss 6.9410   LearningRate 0.0603   Epoch: 4   Global Step: 55490   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:31:11,380-Speed 3365.22 samples/sec   Loss 6.8358   LearningRate 0.0603   Epoch: 4   Global Step: 55500   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:31:14,428-Speed 3360.12 samples/sec   Loss 6.7013   LearningRate 0.0603   Epoch: 4   Global Step: 55510   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:31:17,473-Speed 3364.54 samples/sec   Loss 6.9503   LearningRate 0.0603   Epoch: 4   Global Step: 55520   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:31:20,492-Speed 3392.26 samples/sec   Loss 6.8862   LearningRate 0.0603   Epoch: 4   Global Step: 55530   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:31:23,523-Speed 3379.64 samples/sec   Loss 6.8157   LearningRate 0.0603   Epoch: 4   Global Step: 55540   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:31:26,549-Speed 3385.83 samples/sec   Loss 6.7684   LearningRate 0.0603   Epoch: 4   Global Step: 55550   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:31:29,645-Speed 3307.97 samples/sec   Loss 6.8686   LearningRate 0.0603   Epoch: 4   Global Step: 55560   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:31:32,661-Speed 3395.76 samples/sec   Loss 6.8279   LearningRate 0.0603   Epoch: 4   Global Step: 55570   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:31:35,721-Speed 3348.22 samples/sec   Loss 6.9316   LearningRate 0.0603   Epoch: 4   Global Step: 55580   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:31:38,818-Speed 3307.62 samples/sec   Loss 6.7737   LearningRate 0.0603   Epoch: 4   Global Step: 55590   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:31:41,897-Speed 3326.29 samples/sec   Loss 6.8395   LearningRate 0.0602   Epoch: 4   Global Step: 55600   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:31:44,937-Speed 3369.36 samples/sec   Loss 6.9260   LearningRate 0.0602   Epoch: 4   Global Step: 55610   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:31:47,947-Speed 3403.85 samples/sec   Loss 6.7795   LearningRate 0.0602   Epoch: 4   Global Step: 55620   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:31:51,022-Speed 3330.06 samples/sec   Loss 6.9177   LearningRate 0.0602   Epoch: 4   Global Step: 55630   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:31:54,119-Speed 3308.41 samples/sec   Loss 6.7946   LearningRate 0.0602   Epoch: 4   Global Step: 55640   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:31:57,181-Speed 3344.68 samples/sec   Loss 6.7777   LearningRate 0.0602   Epoch: 4   Global Step: 55650   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:32:00,239-Speed 3350.33 samples/sec   Loss 6.8495   LearningRate 0.0602   Epoch: 4   Global Step: 55660   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:32:03,323-Speed 3320.52 samples/sec   Loss 6.8856   LearningRate 0.0602   Epoch: 4   Global Step: 55670   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:32:06,340-Speed 3395.87 samples/sec   Loss 6.9335   LearningRate 0.0602   Epoch: 4   Global Step: 55680   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:32:09,359-Speed 3392.36 samples/sec   Loss 6.7794   LearningRate 0.0602   Epoch: 4   Global Step: 55690   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:32:12,403-Speed 3366.30 samples/sec   Loss 6.9090   LearningRate 0.0602   Epoch: 4   Global Step: 55700   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:32:15,432-Speed 3381.89 samples/sec   Loss 6.7232   LearningRate 0.0602   Epoch: 4   Global Step: 55710   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:32:18,472-Speed 3368.97 samples/sec   Loss 6.9114   LearningRate 0.0602   Epoch: 4   Global Step: 55720   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:32:21,483-Speed 3402.11 samples/sec   Loss 6.9574   LearningRate 0.0602   Epoch: 4   Global Step: 55730   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:32:24,513-Speed 3380.95 samples/sec   Loss 6.7792   LearningRate 0.0602   Epoch: 4   Global Step: 55740   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:32:27,571-Speed 3349.10 samples/sec   Loss 6.8384   LearningRate 0.0602   Epoch: 4   Global Step: 55750   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:32:30,616-Speed 3363.91 samples/sec   Loss 6.7983   LearningRate 0.0601   Epoch: 4   Global Step: 55760   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:32:33,628-Speed 3400.50 samples/sec   Loss 6.8304   LearningRate 0.0601   Epoch: 4   Global Step: 55770   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:32:36,696-Speed 3339.33 samples/sec   Loss 6.7561   LearningRate 0.0601   Epoch: 4   Global Step: 55780   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:32:39,750-Speed 3353.89 samples/sec   Loss 6.8576   LearningRate 0.0601   Epoch: 4   Global Step: 55790   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:32:42,795-Speed 3363.80 samples/sec   Loss 6.9402   LearningRate 0.0601   Epoch: 4   Global Step: 55800   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:32:45,827-Speed 3378.87 samples/sec   Loss 6.8693   LearningRate 0.0601   Epoch: 4   Global Step: 55810   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:32:48,917-Speed 3315.22 samples/sec   Loss 6.9055   LearningRate 0.0601   Epoch: 4   Global Step: 55820   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:32:52,007-Speed 3314.70 samples/sec   Loss 6.8822   LearningRate 0.0601   Epoch: 4   Global Step: 55830   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:32:55,025-Speed 3394.39 samples/sec   Loss 6.7994   LearningRate 0.0601   Epoch: 4   Global Step: 55840   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:32:58,043-Speed 3393.51 samples/sec   Loss 6.8663   LearningRate 0.0601   Epoch: 4   Global Step: 55850   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:33:01,079-Speed 3374.80 samples/sec   Loss 6.8251   LearningRate 0.0601   Epoch: 4   Global Step: 55860   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:33:04,157-Speed 3327.05 samples/sec   Loss 6.8797   LearningRate 0.0601   Epoch: 4   Global Step: 55870   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:33:07,216-Speed 3349.21 samples/sec   Loss 6.9622   LearningRate 0.0601   Epoch: 4   Global Step: 55880   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:33:10,205-Speed 3425.98 samples/sec   Loss 6.7625   LearningRate 0.0601   Epoch: 4   Global Step: 55890   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:33:13,226-Speed 3391.23 samples/sec   Loss 6.8308   LearningRate 0.0601   Epoch: 4   Global Step: 55900   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:33:16,289-Speed 3343.99 samples/sec   Loss 6.8104   LearningRate 0.0601   Epoch: 4   Global Step: 55910   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:33:19,409-Speed 3283.78 samples/sec   Loss 6.8693   LearningRate 0.0600   Epoch: 4   Global Step: 55920   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:33:22,469-Speed 3347.94 samples/sec   Loss 6.8224   LearningRate 0.0600   Epoch: 4   Global Step: 55930   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:33:25,622-Speed 3248.24 samples/sec   Loss 6.9010   LearningRate 0.0600   Epoch: 4   Global Step: 55940   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:33:28,710-Speed 3317.24 samples/sec   Loss 6.8719   LearningRate 0.0600   Epoch: 4   Global Step: 55950   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:33:31,729-Speed 3393.17 samples/sec   Loss 6.7447   LearningRate 0.0600   Epoch: 4   Global Step: 55960   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:33:34,755-Speed 3384.37 samples/sec   Loss 6.9583   LearningRate 0.0600   Epoch: 4   Global Step: 55970   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:33:37,845-Speed 3314.84 samples/sec   Loss 6.9575   LearningRate 0.0600   Epoch: 4   Global Step: 55980   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:33:40,922-Speed 3329.33 samples/sec   Loss 6.7777   LearningRate 0.0600   Epoch: 4   Global Step: 55990   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:33:43,982-Speed 3346.96 samples/sec   Loss 6.8724   LearningRate 0.0600   Epoch: 4   Global Step: 56000   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:33:47,026-Speed 3365.86 samples/sec   Loss 6.8693   LearningRate 0.0600   Epoch: 4   Global Step: 56010   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:33:50,117-Speed 3314.09 samples/sec   Loss 6.8637   LearningRate 0.0600   Epoch: 4   Global Step: 56020   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:33:53,140-Speed 3388.73 samples/sec   Loss 6.7948   LearningRate 0.0600   Epoch: 4   Global Step: 56030   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:33:56,192-Speed 3355.60 samples/sec   Loss 6.7401   LearningRate 0.0600   Epoch: 4   Global Step: 56040   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:33:59,202-Speed 3403.30 samples/sec   Loss 6.9986   LearningRate 0.0600   Epoch: 4   Global Step: 56050   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:34:02,300-Speed 3306.77 samples/sec   Loss 6.8838   LearningRate 0.0600   Epoch: 4   Global Step: 56060   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:34:05,381-Speed 3324.20 samples/sec   Loss 6.9940   LearningRate 0.0600   Epoch: 4   Global Step: 56070   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:34:08,434-Speed 3355.04 samples/sec   Loss 7.0090   LearningRate 0.0599   Epoch: 4   Global Step: 56080   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:34:11,496-Speed 3345.83 samples/sec   Loss 6.9033   LearningRate 0.0599   Epoch: 4   Global Step: 56090   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:34:14,621-Speed 3277.24 samples/sec   Loss 6.9203   LearningRate 0.0599   Epoch: 4   Global Step: 56100   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:34:17,651-Speed 3381.42 samples/sec   Loss 6.9473   LearningRate 0.0599   Epoch: 4   Global Step: 56110   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:34:20,694-Speed 3365.84 samples/sec   Loss 6.9254   LearningRate 0.0599   Epoch: 4   Global Step: 56120   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:34:23,789-Speed 3308.85 samples/sec   Loss 6.9042   LearningRate 0.0599   Epoch: 4   Global Step: 56130   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:34:26,849-Speed 3348.25 samples/sec   Loss 6.9536   LearningRate 0.0599   Epoch: 4   Global Step: 56140   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:34:29,873-Speed 3386.97 samples/sec   Loss 6.8644   LearningRate 0.0599   Epoch: 4   Global Step: 56150   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:34:32,926-Speed 3355.58 samples/sec   Loss 6.9083   LearningRate 0.0599   Epoch: 4   Global Step: 56160   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:34:35,946-Speed 3391.91 samples/sec   Loss 6.9082   LearningRate 0.0599   Epoch: 4   Global Step: 56170   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:34:38,994-Speed 3360.70 samples/sec   Loss 6.7213   LearningRate 0.0599   Epoch: 4   Global Step: 56180   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:34:42,060-Speed 3340.91 samples/sec   Loss 6.8043   LearningRate 0.0599   Epoch: 4   Global Step: 56190   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:34:45,097-Speed 3372.36 samples/sec   Loss 6.9065   LearningRate 0.0599   Epoch: 4   Global Step: 56200   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:34:48,154-Speed 3350.89 samples/sec   Loss 6.7853   LearningRate 0.0599   Epoch: 4   Global Step: 56210   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:34:51,218-Speed 3343.76 samples/sec   Loss 6.9160   LearningRate 0.0599   Epoch: 4   Global Step: 56220   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:34:54,264-Speed 3362.19 samples/sec   Loss 6.9789   LearningRate 0.0599   Epoch: 4   Global Step: 56230   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:34:57,300-Speed 3374.01 samples/sec   Loss 6.9549   LearningRate 0.0598   Epoch: 4   Global Step: 56240   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:35:00,358-Speed 3349.86 samples/sec   Loss 6.8831   LearningRate 0.0598   Epoch: 4   Global Step: 56250   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:35:03,408-Speed 3358.11 samples/sec   Loss 6.7880   LearningRate 0.0598   Epoch: 4   Global Step: 56260   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:35:06,436-Speed 3383.22 samples/sec   Loss 6.7779   LearningRate 0.0598   Epoch: 4   Global Step: 56270   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:35:09,455-Speed 3392.62 samples/sec   Loss 6.9586   LearningRate 0.0598   Epoch: 4   Global Step: 56280   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:35:12,509-Speed 3354.37 samples/sec   Loss 6.9367   LearningRate 0.0598   Epoch: 4   Global Step: 56290   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:35:15,551-Speed 3367.60 samples/sec   Loss 6.8211   LearningRate 0.0598   Epoch: 4   Global Step: 56300   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:35:18,602-Speed 3357.59 samples/sec   Loss 6.9260   LearningRate 0.0598   Epoch: 4   Global Step: 56310   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:35:21,603-Speed 3412.90 samples/sec   Loss 6.8549   LearningRate 0.0598   Epoch: 4   Global Step: 56320   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:35:24,675-Speed 3334.61 samples/sec   Loss 6.7991   LearningRate 0.0598   Epoch: 4   Global Step: 56330   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:35:27,699-Speed 3386.29 samples/sec   Loss 6.8801   LearningRate 0.0598   Epoch: 4   Global Step: 56340   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:35:30,796-Speed 3308.30 samples/sec   Loss 6.9206   LearningRate 0.0598   Epoch: 4   Global Step: 56350   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:35:33,808-Speed 3399.88 samples/sec   Loss 6.8905   LearningRate 0.0598   Epoch: 4   Global Step: 56360   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:35:36,900-Speed 3313.54 samples/sec   Loss 6.8635   LearningRate 0.0598   Epoch: 4   Global Step: 56370   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:35:40,017-Speed 3285.79 samples/sec   Loss 6.8160   LearningRate 0.0598   Epoch: 4   Global Step: 56380   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:35:43,080-Speed 3344.64 samples/sec   Loss 6.8544   LearningRate 0.0598   Epoch: 4   Global Step: 56390   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:35:46,143-Speed 3344.84 samples/sec   Loss 6.9341   LearningRate 0.0597   Epoch: 4   Global Step: 56400   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:35:49,178-Speed 3375.14 samples/sec   Loss 6.8144   LearningRate 0.0597   Epoch: 4   Global Step: 56410   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:35:52,250-Speed 3333.81 samples/sec   Loss 6.8996   LearningRate 0.0597   Epoch: 4   Global Step: 56420   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:35:55,322-Speed 3334.74 samples/sec   Loss 6.8197   LearningRate 0.0597   Epoch: 4   Global Step: 56430   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:35:58,342-Speed 3391.53 samples/sec   Loss 6.7500   LearningRate 0.0597   Epoch: 4   Global Step: 56440   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:36:01,378-Speed 3374.60 samples/sec   Loss 6.8444   LearningRate 0.0597   Epoch: 4   Global Step: 56450   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:36:04,419-Speed 3368.04 samples/sec   Loss 6.9248   LearningRate 0.0597   Epoch: 4   Global Step: 56460   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:36:07,456-Speed 3372.68 samples/sec   Loss 6.8427   LearningRate 0.0597   Epoch: 4   Global Step: 56470   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:36:10,598-Speed 3259.70 samples/sec   Loss 6.8348   LearningRate 0.0597   Epoch: 4   Global Step: 56480   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:36:13,642-Speed 3365.42 samples/sec   Loss 6.8356   LearningRate 0.0597   Epoch: 4   Global Step: 56490   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:36:16,691-Speed 3359.66 samples/sec   Loss 6.8717   LearningRate 0.0597   Epoch: 4   Global Step: 56500   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:36:19,732-Speed 3367.35 samples/sec   Loss 6.9760   LearningRate 0.0597   Epoch: 4   Global Step: 56510   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:36:22,788-Speed 3352.42 samples/sec   Loss 6.7945   LearningRate 0.0597   Epoch: 4   Global Step: 56520   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:36:25,820-Speed 3378.20 samples/sec   Loss 6.9099   LearningRate 0.0597   Epoch: 4   Global Step: 56530   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:36:28,830-Speed 3403.52 samples/sec   Loss 6.8928   LearningRate 0.0597   Epoch: 4   Global Step: 56540   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:36:31,854-Speed 3386.73 samples/sec   Loss 6.9169   LearningRate 0.0597   Epoch: 4   Global Step: 56550   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:36:34,906-Speed 3356.61 samples/sec   Loss 6.8044   LearningRate 0.0596   Epoch: 4   Global Step: 56560   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:36:37,944-Speed 3371.24 samples/sec   Loss 6.9329   LearningRate 0.0596   Epoch: 4   Global Step: 56570   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:36:41,007-Speed 3344.48 samples/sec   Loss 6.8740   LearningRate 0.0596   Epoch: 4   Global Step: 56580   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:36:44,007-Speed 3414.63 samples/sec   Loss 6.7588   LearningRate 0.0596   Epoch: 4   Global Step: 56590   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:36:47,085-Speed 3328.06 samples/sec   Loss 6.8257   LearningRate 0.0596   Epoch: 4   Global Step: 56600   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:36:50,158-Speed 3333.05 samples/sec   Loss 6.9156   LearningRate 0.0596   Epoch: 4   Global Step: 56610   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:36:53,257-Speed 3305.26 samples/sec   Loss 6.8031   LearningRate 0.0596   Epoch: 4   Global Step: 56620   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:36:56,314-Speed 3351.20 samples/sec   Loss 6.7380   LearningRate 0.0596   Epoch: 4   Global Step: 56630   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:36:59,332-Speed 3393.31 samples/sec   Loss 6.8816   LearningRate 0.0596   Epoch: 4   Global Step: 56640   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:37:02,349-Speed 3395.18 samples/sec   Loss 6.8021   LearningRate 0.0596   Epoch: 4   Global Step: 56650   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:37:05,398-Speed 3360.23 samples/sec   Loss 6.9154   LearningRate 0.0596   Epoch: 4   Global Step: 56660   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:37:08,426-Speed 3382.01 samples/sec   Loss 6.9752   LearningRate 0.0596   Epoch: 4   Global Step: 56670   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:37:11,466-Speed 3369.85 samples/sec   Loss 6.8030   LearningRate 0.0596   Epoch: 4   Global Step: 56680   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:37:14,498-Speed 3379.16 samples/sec   Loss 6.8639   LearningRate 0.0596   Epoch: 4   Global Step: 56690   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:37:17,589-Speed 3313.31 samples/sec   Loss 6.8495   LearningRate 0.0596   Epoch: 4   Global Step: 56700   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:37:20,633-Speed 3364.92 samples/sec   Loss 6.8736   LearningRate 0.0596   Epoch: 4   Global Step: 56710   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:37:23,640-Speed 3406.39 samples/sec   Loss 6.8665   LearningRate 0.0595   Epoch: 4   Global Step: 56720   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:37:26,657-Speed 3395.19 samples/sec   Loss 6.8550   LearningRate 0.0595   Epoch: 4   Global Step: 56730   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:37:29,683-Speed 3385.08 samples/sec   Loss 6.9461   LearningRate 0.0595   Epoch: 4   Global Step: 56740   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:37:32,678-Speed 3419.89 samples/sec   Loss 6.8103   LearningRate 0.0595   Epoch: 4   Global Step: 56750   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:37:35,687-Speed 3404.15 samples/sec   Loss 6.7819   LearningRate 0.0595   Epoch: 4   Global Step: 56760   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:37:38,792-Speed 3298.95 samples/sec   Loss 6.9466   LearningRate 0.0595   Epoch: 4   Global Step: 56770   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:37:41,858-Speed 3341.26 samples/sec   Loss 6.8486   LearningRate 0.0595   Epoch: 4   Global Step: 56780   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:37:44,875-Speed 3394.81 samples/sec   Loss 6.9466   LearningRate 0.0595   Epoch: 4   Global Step: 56790   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:37:47,913-Speed 3371.56 samples/sec   Loss 6.8361   LearningRate 0.0595   Epoch: 4   Global Step: 56800   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:37:50,953-Speed 3370.35 samples/sec   Loss 7.0021   LearningRate 0.0595   Epoch: 4   Global Step: 56810   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:37:53,992-Speed 3370.09 samples/sec   Loss 6.9548   LearningRate 0.0595   Epoch: 4   Global Step: 56820   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:37:57,013-Speed 3390.64 samples/sec   Loss 6.9252   LearningRate 0.0595   Epoch: 4   Global Step: 56830   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:38:00,062-Speed 3360.15 samples/sec   Loss 6.8626   LearningRate 0.0595   Epoch: 4   Global Step: 56840   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:38:03,122-Speed 3347.35 samples/sec   Loss 6.9823   LearningRate 0.0595   Epoch: 4   Global Step: 56850   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:38:06,144-Speed 3389.38 samples/sec   Loss 6.7828   LearningRate 0.0595   Epoch: 4   Global Step: 56860   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:38:09,141-Speed 3417.19 samples/sec   Loss 6.8360   LearningRate 0.0595   Epoch: 4   Global Step: 56870   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:38:12,165-Speed 3387.52 samples/sec   Loss 6.9465   LearningRate 0.0594   Epoch: 4   Global Step: 56880   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:38:15,189-Speed 3387.49 samples/sec   Loss 7.0022   LearningRate 0.0594   Epoch: 4   Global Step: 56890   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:38:18,299-Speed 3293.44 samples/sec   Loss 6.7278   LearningRate 0.0594   Epoch: 4   Global Step: 56900   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:38:21,300-Speed 3414.30 samples/sec   Loss 6.9336   LearningRate 0.0594   Epoch: 4   Global Step: 56910   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:38:24,324-Speed 3387.23 samples/sec   Loss 6.8985   LearningRate 0.0594   Epoch: 4   Global Step: 56920   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:38:27,332-Speed 3405.44 samples/sec   Loss 6.8500   LearningRate 0.0594   Epoch: 4   Global Step: 56930   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:38:30,374-Speed 3366.93 samples/sec   Loss 6.9627   LearningRate 0.0594   Epoch: 4   Global Step: 56940   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:38:33,393-Speed 3393.47 samples/sec   Loss 6.8556   LearningRate 0.0594   Epoch: 4   Global Step: 56950   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:38:36,400-Speed 3405.69 samples/sec   Loss 6.9445   LearningRate 0.0594   Epoch: 4   Global Step: 56960   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:38:39,477-Speed 3328.39 samples/sec   Loss 6.8507   LearningRate 0.0594   Epoch: 4   Global Step: 56970   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:38:42,505-Speed 3383.62 samples/sec   Loss 6.8479   LearningRate 0.0594   Epoch: 4   Global Step: 56980   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:38:45,509-Speed 3410.11 samples/sec   Loss 6.8370   LearningRate 0.0594   Epoch: 4   Global Step: 56990   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:38:48,538-Speed 3382.01 samples/sec   Loss 6.8558   LearningRate 0.0594   Epoch: 4   Global Step: 57000   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:38:51,582-Speed 3364.78 samples/sec   Loss 6.9309   LearningRate 0.0594   Epoch: 4   Global Step: 57010   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:38:54,636-Speed 3354.93 samples/sec   Loss 6.9882   LearningRate 0.0594   Epoch: 4   Global Step: 57020   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:38:57,637-Speed 3413.45 samples/sec   Loss 7.0360   LearningRate 0.0594   Epoch: 4   Global Step: 57030   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:39:00,662-Speed 3385.77 samples/sec   Loss 6.9467   LearningRate 0.0593   Epoch: 4   Global Step: 57040   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:39:03,726-Speed 3343.40 samples/sec   Loss 6.9056   LearningRate 0.0593   Epoch: 4   Global Step: 57050   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:39:06,779-Speed 3354.68 samples/sec   Loss 6.8194   LearningRate 0.0593   Epoch: 4   Global Step: 57060   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:39:09,779-Speed 3415.26 samples/sec   Loss 6.9165   LearningRate 0.0593   Epoch: 4   Global Step: 57070   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:39:12,813-Speed 3375.10 samples/sec   Loss 6.8669   LearningRate 0.0593   Epoch: 4   Global Step: 57080   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:39:15,900-Speed 3318.13 samples/sec   Loss 6.9303   LearningRate 0.0593   Epoch: 4   Global Step: 57090   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:39:18,963-Speed 3344.41 samples/sec   Loss 6.9495   LearningRate 0.0593   Epoch: 4   Global Step: 57100   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:39:21,982-Speed 3392.96 samples/sec   Loss 6.7895   LearningRate 0.0593   Epoch: 4   Global Step: 57110   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:39:25,016-Speed 3376.23 samples/sec   Loss 7.0131   LearningRate 0.0593   Epoch: 4   Global Step: 57120   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:39:28,088-Speed 3335.13 samples/sec   Loss 6.9065   LearningRate 0.0593   Epoch: 4   Global Step: 57130   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:39:31,170-Speed 3323.04 samples/sec   Loss 6.8802   LearningRate 0.0593   Epoch: 4   Global Step: 57140   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:39:34,190-Speed 3391.80 samples/sec   Loss 7.0122   LearningRate 0.0593   Epoch: 4   Global Step: 57150   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:39:37,264-Speed 3332.02 samples/sec   Loss 6.9166   LearningRate 0.0593   Epoch: 4   Global Step: 57160   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:39:40,314-Speed 3358.61 samples/sec   Loss 6.9173   LearningRate 0.0593   Epoch: 4   Global Step: 57170   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:39:43,338-Speed 3387.65 samples/sec   Loss 7.0148   LearningRate 0.0593   Epoch: 4   Global Step: 57180   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:39:46,348-Speed 3402.15 samples/sec   Loss 6.9372   LearningRate 0.0593   Epoch: 4   Global Step: 57190   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:39:49,376-Speed 3383.06 samples/sec   Loss 6.9183   LearningRate 0.0593   Epoch: 4   Global Step: 57200   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:39:52,464-Speed 3317.53 samples/sec   Loss 6.8626   LearningRate 0.0592   Epoch: 4   Global Step: 57210   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:39:55,576-Speed 3291.56 samples/sec   Loss 6.8967   LearningRate 0.0592   Epoch: 4   Global Step: 57220   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:39:58,599-Speed 3388.59 samples/sec   Loss 6.9552   LearningRate 0.0592   Epoch: 4   Global Step: 57230   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:40:01,705-Speed 3297.64 samples/sec   Loss 6.8808   LearningRate 0.0592   Epoch: 4   Global Step: 57240   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:40:04,757-Speed 3356.81 samples/sec   Loss 6.9182   LearningRate 0.0592   Epoch: 4   Global Step: 57250   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:40:07,777-Speed 3391.49 samples/sec   Loss 6.9337   LearningRate 0.0592   Epoch: 4   Global Step: 57260   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:40:10,799-Speed 3390.07 samples/sec   Loss 7.0449   LearningRate 0.0592   Epoch: 4   Global Step: 57270   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:40:13,801-Speed 3412.15 samples/sec   Loss 6.8060   LearningRate 0.0592   Epoch: 4   Global Step: 57280   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:40:16,868-Speed 3338.70 samples/sec   Loss 6.9837   LearningRate 0.0592   Epoch: 4   Global Step: 57290   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:40:19,938-Speed 3336.61 samples/sec   Loss 6.9635   LearningRate 0.0592   Epoch: 4   Global Step: 57300   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:40:22,929-Speed 3425.22 samples/sec   Loss 6.7937   LearningRate 0.0592   Epoch: 4   Global Step: 57310   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:40:25,956-Speed 3384.11 samples/sec   Loss 6.8213   LearningRate 0.0592   Epoch: 4   Global Step: 57320   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:40:29,005-Speed 3360.13 samples/sec   Loss 6.8366   LearningRate 0.0592   Epoch: 4   Global Step: 57330   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:40:32,034-Speed 3382.03 samples/sec   Loss 6.9293   LearningRate 0.0592   Epoch: 4   Global Step: 57340   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:40:35,067-Speed 3376.64 samples/sec   Loss 6.8616   LearningRate 0.0592   Epoch: 4   Global Step: 57350   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:40:38,098-Speed 3379.90 samples/sec   Loss 6.8543   LearningRate 0.0592   Epoch: 4   Global Step: 57360   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:40:41,189-Speed 3313.72 samples/sec   Loss 6.7445   LearningRate 0.0591   Epoch: 4   Global Step: 57370   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:40:44,228-Speed 3370.08 samples/sec   Loss 6.8440   LearningRate 0.0591   Epoch: 4   Global Step: 57380   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:40:47,265-Speed 3373.16 samples/sec   Loss 6.9705   LearningRate 0.0591   Epoch: 4   Global Step: 57390   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:40:50,336-Speed 3335.13 samples/sec   Loss 6.8303   LearningRate 0.0591   Epoch: 4   Global Step: 57400   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:40:53,393-Speed 3351.56 samples/sec   Loss 6.7961   LearningRate 0.0591   Epoch: 4   Global Step: 57410   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:40:56,412-Speed 3392.87 samples/sec   Loss 6.8745   LearningRate 0.0591   Epoch: 4   Global Step: 57420   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:40:59,473-Speed 3346.00 samples/sec   Loss 6.8051   LearningRate 0.0591   Epoch: 4   Global Step: 57430   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:41:02,524-Speed 3357.07 samples/sec   Loss 6.8703   LearningRate 0.0591   Epoch: 4   Global Step: 57440   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:41:05,567-Speed 3366.87 samples/sec   Loss 6.8938   LearningRate 0.0591   Epoch: 4   Global Step: 57450   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:41:08,615-Speed 3360.20 samples/sec   Loss 6.8587   LearningRate 0.0591   Epoch: 4   Global Step: 57460   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:41:11,718-Speed 3301.22 samples/sec   Loss 6.8307   LearningRate 0.0591   Epoch: 4   Global Step: 57470   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:41:14,833-Speed 3288.76 samples/sec   Loss 6.8493   LearningRate 0.0591   Epoch: 4   Global Step: 57480   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:41:17,866-Speed 3376.27 samples/sec   Loss 6.8251   LearningRate 0.0591   Epoch: 4   Global Step: 57490   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:41:20,903-Speed 3372.99 samples/sec   Loss 6.8614   LearningRate 0.0591   Epoch: 4   Global Step: 57500   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:41:23,960-Speed 3351.56 samples/sec   Loss 6.8651   LearningRate 0.0591   Epoch: 4   Global Step: 57510   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:41:27,029-Speed 3337.42 samples/sec   Loss 7.0051   LearningRate 0.0591   Epoch: 4   Global Step: 57520   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:41:30,077-Speed 3360.39 samples/sec   Loss 6.8821   LearningRate 0.0590   Epoch: 4   Global Step: 57530   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:41:33,094-Speed 3394.99 samples/sec   Loss 6.9618   LearningRate 0.0590   Epoch: 4   Global Step: 57540   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:41:36,115-Speed 3391.50 samples/sec   Loss 7.0310   LearningRate 0.0590   Epoch: 4   Global Step: 57550   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:41:39,135-Speed 3391.38 samples/sec   Loss 6.8843   LearningRate 0.0590   Epoch: 4   Global Step: 57560   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:41:42,142-Speed 3407.17 samples/sec   Loss 6.8917   LearningRate 0.0590   Epoch: 4   Global Step: 57570   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:41:45,154-Speed 3400.76 samples/sec   Loss 6.9679   LearningRate 0.0590   Epoch: 4   Global Step: 57580   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:41:48,266-Speed 3291.91 samples/sec   Loss 6.9818   LearningRate 0.0590   Epoch: 4   Global Step: 57590   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:41:51,328-Speed 3344.42 samples/sec   Loss 6.9053   LearningRate 0.0590   Epoch: 4   Global Step: 57600   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:41:54,356-Speed 3383.29 samples/sec   Loss 6.9377   LearningRate 0.0590   Epoch: 4   Global Step: 57610   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:41:57,393-Speed 3372.60 samples/sec   Loss 6.9953   LearningRate 0.0590   Epoch: 4   Global Step: 57620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:42:00,424-Speed 3379.98 samples/sec   Loss 6.9316   LearningRate 0.0590   Epoch: 4   Global Step: 57630   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:42:03,524-Speed 3303.88 samples/sec   Loss 6.8980   LearningRate 0.0590   Epoch: 4   Global Step: 57640   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:42:06,605-Speed 3324.60 samples/sec   Loss 6.8541   LearningRate 0.0590   Epoch: 4   Global Step: 57650   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:42:09,621-Speed 3396.52 samples/sec   Loss 6.8734   LearningRate 0.0590   Epoch: 4   Global Step: 57660   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:42:12,627-Speed 3407.65 samples/sec   Loss 6.8648   LearningRate 0.0590   Epoch: 4   Global Step: 57670   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:42:15,646-Speed 3393.25 samples/sec   Loss 6.8789   LearningRate 0.0590   Epoch: 4   Global Step: 57680   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:42:18,670-Speed 3386.87 samples/sec   Loss 6.8303   LearningRate 0.0589   Epoch: 4   Global Step: 57690   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:42:21,691-Speed 3391.04 samples/sec   Loss 6.8687   LearningRate 0.0589   Epoch: 4   Global Step: 57700   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:42:24,777-Speed 3319.05 samples/sec   Loss 6.9011   LearningRate 0.0589   Epoch: 4   Global Step: 57710   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:42:27,812-Speed 3375.46 samples/sec   Loss 7.0172   LearningRate 0.0589   Epoch: 4   Global Step: 57720   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:42:30,822-Speed 3403.16 samples/sec   Loss 6.8700   LearningRate 0.0589   Epoch: 4   Global Step: 57730   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:42:33,869-Speed 3361.79 samples/sec   Loss 6.8388   LearningRate 0.0589   Epoch: 4   Global Step: 57740   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:42:36,933-Speed 3343.81 samples/sec   Loss 6.9067   LearningRate 0.0589   Epoch: 4   Global Step: 57750   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:42:39,962-Speed 3381.41 samples/sec   Loss 6.8337   LearningRate 0.0589   Epoch: 4   Global Step: 57760   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:42:43,034-Speed 3334.24 samples/sec   Loss 6.8626   LearningRate 0.0589   Epoch: 4   Global Step: 57770   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:42:46,049-Speed 3397.51 samples/sec   Loss 6.9326   LearningRate 0.0589   Epoch: 4   Global Step: 57780   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:42:49,070-Speed 3390.34 samples/sec   Loss 6.9153   LearningRate 0.0589   Epoch: 4   Global Step: 57790   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:42:52,188-Speed 3285.49 samples/sec   Loss 6.9339   LearningRate 0.0589   Epoch: 4   Global Step: 57800   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:42:55,242-Speed 3353.99 samples/sec   Loss 6.8277   LearningRate 0.0589   Epoch: 4   Global Step: 57810   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:42:58,254-Speed 3401.51 samples/sec   Loss 6.8734   LearningRate 0.0589   Epoch: 4   Global Step: 57820   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:43:01,256-Speed 3412.06 samples/sec   Loss 6.9221   LearningRate 0.0589   Epoch: 4   Global Step: 57830   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:43:04,339-Speed 3321.90 samples/sec   Loss 6.8575   LearningRate 0.0589   Epoch: 4   Global Step: 57840   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:43:07,349-Speed 3403.66 samples/sec   Loss 6.8733   LearningRate 0.0588   Epoch: 4   Global Step: 57850   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:43:10,362-Speed 3399.50 samples/sec   Loss 7.0689   LearningRate 0.0588   Epoch: 4   Global Step: 57860   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:43:13,420-Speed 3349.66 samples/sec   Loss 6.9681   LearningRate 0.0588   Epoch: 4   Global Step: 57870   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:43:16,527-Speed 3296.90 samples/sec   Loss 6.9002   LearningRate 0.0588   Epoch: 4   Global Step: 57880   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:43:19,612-Speed 3319.88 samples/sec   Loss 6.9249   LearningRate 0.0588   Epoch: 4   Global Step: 57890   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:43:22,625-Speed 3399.24 samples/sec   Loss 6.9478   LearningRate 0.0588   Epoch: 4   Global Step: 57900   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:43:25,669-Speed 3365.39 samples/sec   Loss 6.9117   LearningRate 0.0588   Epoch: 4   Global Step: 57910   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:43:28,748-Speed 3327.58 samples/sec   Loss 6.8018   LearningRate 0.0588   Epoch: 4   Global Step: 57920   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:43:31,766-Speed 3393.57 samples/sec   Loss 6.9311   LearningRate 0.0588   Epoch: 4   Global Step: 57930   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:43:34,825-Speed 3348.92 samples/sec   Loss 6.7437   LearningRate 0.0588   Epoch: 4   Global Step: 57940   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:43:37,902-Speed 3328.94 samples/sec   Loss 6.9300   LearningRate 0.0588   Epoch: 4   Global Step: 57950   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:43:40,973-Speed 3335.53 samples/sec   Loss 6.8969   LearningRate 0.0588   Epoch: 4   Global Step: 57960   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:43:43,980-Speed 3406.27 samples/sec   Loss 6.9192   LearningRate 0.0588   Epoch: 4   Global Step: 57970   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:43:47,024-Speed 3365.58 samples/sec   Loss 7.0629   LearningRate 0.0588   Epoch: 4   Global Step: 57980   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:43:50,136-Speed 3290.83 samples/sec   Loss 7.0359   LearningRate 0.0588   Epoch: 4   Global Step: 57990   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:43:53,182-Speed 3363.80 samples/sec   Loss 6.9456   LearningRate 0.0588   Epoch: 4   Global Step: 58000   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:43:56,190-Speed 3405.40 samples/sec   Loss 6.9095   LearningRate 0.0587   Epoch: 4   Global Step: 58010   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:43:59,224-Speed 3376.31 samples/sec   Loss 6.8406   LearningRate 0.0587   Epoch: 4   Global Step: 58020   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:44:02,302-Speed 3327.89 samples/sec   Loss 6.9076   LearningRate 0.0587   Epoch: 4   Global Step: 58030   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:44:05,366-Speed 3342.78 samples/sec   Loss 6.9638   LearningRate 0.0587   Epoch: 4   Global Step: 58040   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:44:08,434-Speed 3337.82 samples/sec   Loss 7.0027   LearningRate 0.0587   Epoch: 4   Global Step: 58050   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:44:11,487-Speed 3356.13 samples/sec   Loss 6.9123   LearningRate 0.0587   Epoch: 4   Global Step: 58060   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:44:14,538-Speed 3357.02 samples/sec   Loss 6.7888   LearningRate 0.0587   Epoch: 4   Global Step: 58070   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:44:17,579-Speed 3368.71 samples/sec   Loss 6.9489   LearningRate 0.0587   Epoch: 4   Global Step: 58080   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:44:20,590-Speed 3402.49 samples/sec   Loss 6.9075   LearningRate 0.0587   Epoch: 4   Global Step: 58090   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:44:23,594-Speed 3410.07 samples/sec   Loss 6.8639   LearningRate 0.0587   Epoch: 4   Global Step: 58100   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:44:26,648-Speed 3353.94 samples/sec   Loss 6.7663   LearningRate 0.0587   Epoch: 4   Global Step: 58110   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:44:29,658-Speed 3403.14 samples/sec   Loss 6.9077   LearningRate 0.0587   Epoch: 4   Global Step: 58120   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:44:32,726-Speed 3339.09 samples/sec   Loss 6.8626   LearningRate 0.0587   Epoch: 4   Global Step: 58130   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:44:35,752-Speed 3384.92 samples/sec   Loss 6.9164   LearningRate 0.0587   Epoch: 4   Global Step: 58140   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:44:38,803-Speed 3357.65 samples/sec   Loss 6.8805   LearningRate 0.0587   Epoch: 4   Global Step: 58150   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:44:41,913-Speed 3293.09 samples/sec   Loss 6.8231   LearningRate 0.0587   Epoch: 4   Global Step: 58160   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:44:44,994-Speed 3324.65 samples/sec   Loss 6.9436   LearningRate 0.0587   Epoch: 4   Global Step: 58170   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:44:48,059-Speed 3342.43 samples/sec   Loss 6.8388   LearningRate 0.0586   Epoch: 4   Global Step: 58180   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:44:51,067-Speed 3405.85 samples/sec   Loss 6.9063   LearningRate 0.0586   Epoch: 4   Global Step: 58190   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:44:54,144-Speed 3329.13 samples/sec   Loss 6.8784   LearningRate 0.0586   Epoch: 4   Global Step: 58200   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:44:57,142-Speed 3416.11 samples/sec   Loss 6.9389   LearningRate 0.0586   Epoch: 4   Global Step: 58210   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:45:00,190-Speed 3360.73 samples/sec   Loss 6.8482   LearningRate 0.0586   Epoch: 4   Global Step: 58220   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:45:03,263-Speed 3334.08 samples/sec   Loss 6.8117   LearningRate 0.0586   Epoch: 4   Global Step: 58230   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:45:06,294-Speed 3379.62 samples/sec   Loss 6.8646   LearningRate 0.0586   Epoch: 4   Global Step: 58240   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:45:09,321-Speed 3383.13 samples/sec   Loss 6.8642   LearningRate 0.0586   Epoch: 4   Global Step: 58250   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:45:12,396-Speed 3331.89 samples/sec   Loss 6.8912   LearningRate 0.0586   Epoch: 4   Global Step: 58260   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:45:15,434-Speed 3371.88 samples/sec   Loss 6.8293   LearningRate 0.0586   Epoch: 4   Global Step: 58270   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:45:18,456-Speed 3388.50 samples/sec   Loss 6.8599   LearningRate 0.0586   Epoch: 4   Global Step: 58280   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:45:21,484-Speed 3383.64 samples/sec   Loss 6.9423   LearningRate 0.0586   Epoch: 4   Global Step: 58290   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:45:24,504-Speed 3391.80 samples/sec   Loss 6.7795   LearningRate 0.0586   Epoch: 4   Global Step: 58300   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:45:27,513-Speed 3403.98 samples/sec   Loss 6.8491   LearningRate 0.0586   Epoch: 4   Global Step: 58310   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:45:30,624-Speed 3292.52 samples/sec   Loss 6.8058   LearningRate 0.0586   Epoch: 4   Global Step: 58320   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:45:33,661-Speed 3373.22 samples/sec   Loss 6.9566   LearningRate 0.0586   Epoch: 4   Global Step: 58330   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:45:36,728-Speed 3339.51 samples/sec   Loss 6.7720   LearningRate 0.0585   Epoch: 4   Global Step: 58340   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:45:39,834-Speed 3297.76 samples/sec   Loss 6.8309   LearningRate 0.0585   Epoch: 4   Global Step: 58350   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:45:42,916-Speed 3323.21 samples/sec   Loss 6.7664   LearningRate 0.0585   Epoch: 4   Global Step: 58360   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:45:45,914-Speed 3416.52 samples/sec   Loss 6.9046   LearningRate 0.0585   Epoch: 4   Global Step: 58370   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:45:48,990-Speed 3330.53 samples/sec   Loss 6.9057   LearningRate 0.0585   Epoch: 4   Global Step: 58380   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:45:52,006-Speed 3396.18 samples/sec   Loss 6.9372   LearningRate 0.0585   Epoch: 4   Global Step: 58390   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:45:55,048-Speed 3367.81 samples/sec   Loss 6.9416   LearningRate 0.0585   Epoch: 4   Global Step: 58400   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:45:58,061-Speed 3399.62 samples/sec   Loss 6.8209   LearningRate 0.0585   Epoch: 4   Global Step: 58410   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:46:01,088-Speed 3384.69 samples/sec   Loss 6.7575   LearningRate 0.0585   Epoch: 4   Global Step: 58420   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:46:04,193-Speed 3297.96 samples/sec   Loss 6.8543   LearningRate 0.0585   Epoch: 4   Global Step: 58430   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:46:07,288-Speed 3309.90 samples/sec   Loss 6.9249   LearningRate 0.0585   Epoch: 4   Global Step: 58440   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:46:10,310-Speed 3390.03 samples/sec   Loss 6.8382   LearningRate 0.0585   Epoch: 4   Global Step: 58450   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:46:13,429-Speed 3283.38 samples/sec   Loss 6.9891   LearningRate 0.0585   Epoch: 4   Global Step: 58460   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:46:16,539-Speed 3293.99 samples/sec   Loss 6.9140   LearningRate 0.0585   Epoch: 4   Global Step: 58470   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:46:19,568-Speed 3382.13 samples/sec   Loss 6.8529   LearningRate 0.0585   Epoch: 4   Global Step: 58480   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:46:22,586-Speed 3393.17 samples/sec   Loss 6.7759   LearningRate 0.0585   Epoch: 4   Global Step: 58490   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:46:25,671-Speed 3320.75 samples/sec   Loss 6.7626   LearningRate 0.0584   Epoch: 4   Global Step: 58500   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:46:28,739-Speed 3339.40 samples/sec   Loss 6.8882   LearningRate 0.0584   Epoch: 4   Global Step: 58510   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:46:31,787-Speed 3359.81 samples/sec   Loss 7.0252   LearningRate 0.0584   Epoch: 4   Global Step: 58520   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:46:34,843-Speed 3352.35 samples/sec   Loss 6.7675   LearningRate 0.0584   Epoch: 4   Global Step: 58530   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:46:37,880-Speed 3372.76 samples/sec   Loss 6.9054   LearningRate 0.0584   Epoch: 4   Global Step: 58540   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:46:40,900-Speed 3391.86 samples/sec   Loss 6.8411   LearningRate 0.0584   Epoch: 4   Global Step: 58550   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:46:43,941-Speed 3368.02 samples/sec   Loss 6.8144   LearningRate 0.0584   Epoch: 4   Global Step: 58560   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:46:46,992-Speed 3357.82 samples/sec   Loss 6.9970   LearningRate 0.0584   Epoch: 4   Global Step: 58570   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:46:50,069-Speed 3328.60 samples/sec   Loss 6.8481   LearningRate 0.0584   Epoch: 4   Global Step: 58580   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:46:53,151-Speed 3323.69 samples/sec   Loss 6.8006   LearningRate 0.0584   Epoch: 4   Global Step: 58590   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:46:56,252-Speed 3303.94 samples/sec   Loss 6.9491   LearningRate 0.0584   Epoch: 4   Global Step: 58600   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:46:59,303-Speed 3357.15 samples/sec   Loss 6.8560   LearningRate 0.0584   Epoch: 4   Global Step: 58610   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:47:02,367-Speed 3342.25 samples/sec   Loss 6.9405   LearningRate 0.0584   Epoch: 4   Global Step: 58620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:47:05,432-Speed 3343.27 samples/sec   Loss 6.9244   LearningRate 0.0584   Epoch: 4   Global Step: 58630   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:47:08,465-Speed 3376.80 samples/sec   Loss 6.8990   LearningRate 0.0584   Epoch: 4   Global Step: 58640   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:47:11,495-Speed 3380.25 samples/sec   Loss 6.7174   LearningRate 0.0584   Epoch: 4   Global Step: 58650   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:47:14,577-Speed 3324.16 samples/sec   Loss 6.8047   LearningRate 0.0583   Epoch: 4   Global Step: 58660   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:47:17,636-Speed 3347.77 samples/sec   Loss 6.8142   LearningRate 0.0583   Epoch: 4   Global Step: 58670   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:47:20,669-Speed 3377.37 samples/sec   Loss 6.9884   LearningRate 0.0583   Epoch: 4   Global Step: 58680   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:47:23,698-Speed 3381.96 samples/sec   Loss 6.8452   LearningRate 0.0583   Epoch: 4   Global Step: 58690   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:47:26,767-Speed 3337.98 samples/sec   Loss 6.8291   LearningRate 0.0583   Epoch: 4   Global Step: 58700   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:47:29,842-Speed 3331.24 samples/sec   Loss 6.8652   LearningRate 0.0583   Epoch: 4   Global Step: 58710   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:47:32,888-Speed 3362.30 samples/sec   Loss 6.8208   LearningRate 0.0583   Epoch: 4   Global Step: 58720   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:47:35,939-Speed 3357.62 samples/sec   Loss 6.7956   LearningRate 0.0583   Epoch: 4   Global Step: 58730   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:47:38,989-Speed 3359.44 samples/sec   Loss 6.7771   LearningRate 0.0583   Epoch: 4   Global Step: 58740   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:47:42,079-Speed 3314.62 samples/sec   Loss 6.8061   LearningRate 0.0583   Epoch: 4   Global Step: 58750   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:47:45,121-Speed 3366.69 samples/sec   Loss 6.8241   LearningRate 0.0583   Epoch: 4   Global Step: 58760   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:47:48,187-Speed 3341.33 samples/sec   Loss 6.8172   LearningRate 0.0583   Epoch: 4   Global Step: 58770   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:47:51,215-Speed 3382.75 samples/sec   Loss 6.8542   LearningRate 0.0583   Epoch: 4   Global Step: 58780   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:47:54,248-Speed 3376.95 samples/sec   Loss 7.0127   LearningRate 0.0583   Epoch: 4   Global Step: 58790   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:47:57,295-Speed 3363.00 samples/sec   Loss 6.7668   LearningRate 0.0583   Epoch: 4   Global Step: 58800   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:48:00,398-Speed 3300.67 samples/sec   Loss 6.8686   LearningRate 0.0583   Epoch: 4   Global Step: 58810   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:48:03,436-Speed 3371.00 samples/sec   Loss 6.9717   LearningRate 0.0583   Epoch: 4   Global Step: 58820   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:48:06,489-Speed 3355.69 samples/sec   Loss 6.9044   LearningRate 0.0582   Epoch: 4   Global Step: 58830   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:48:09,526-Speed 3373.01 samples/sec   Loss 7.1049   LearningRate 0.0582   Epoch: 4   Global Step: 58840   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:48:12,566-Speed 3368.71 samples/sec   Loss 6.7485   LearningRate 0.0582   Epoch: 4   Global Step: 58850   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:48:15,619-Speed 3356.27 samples/sec   Loss 6.7009   LearningRate 0.0582   Epoch: 4   Global Step: 58860   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:48:18,688-Speed 3337.56 samples/sec   Loss 6.8327   LearningRate 0.0582   Epoch: 4   Global Step: 58870   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:48:21,688-Speed 3413.53 samples/sec   Loss 6.8425   LearningRate 0.0582   Epoch: 4   Global Step: 58880   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:48:24,706-Speed 3395.21 samples/sec   Loss 6.8608   LearningRate 0.0582   Epoch: 4   Global Step: 58890   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:48:27,729-Speed 3387.98 samples/sec   Loss 6.8790   LearningRate 0.0582   Epoch: 4   Global Step: 58900   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:48:30,767-Speed 3371.52 samples/sec   Loss 6.7943   LearningRate 0.0582   Epoch: 4   Global Step: 58910   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:48:33,803-Speed 3374.16 samples/sec   Loss 6.8115   LearningRate 0.0582   Epoch: 4   Global Step: 58920   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:48:36,821-Speed 3394.15 samples/sec   Loss 6.9341   LearningRate 0.0582   Epoch: 4   Global Step: 58930   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:48:39,867-Speed 3363.16 samples/sec   Loss 6.9132   LearningRate 0.0582   Epoch: 4   Global Step: 58940   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:48:42,979-Speed 3291.61 samples/sec   Loss 6.8710   LearningRate 0.0582   Epoch: 4   Global Step: 58950   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:48:46,016-Speed 3372.74 samples/sec   Loss 6.9318   LearningRate 0.0582   Epoch: 4   Global Step: 58960   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:48:49,068-Speed 3355.52 samples/sec   Loss 6.8760   LearningRate 0.0582   Epoch: 4   Global Step: 58970   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:48:52,143-Speed 3331.41 samples/sec   Loss 6.8281   LearningRate 0.0582   Epoch: 4   Global Step: 58980   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:48:55,181-Speed 3372.00 samples/sec   Loss 6.8749   LearningRate 0.0581   Epoch: 4   Global Step: 58990   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:48:58,217-Speed 3374.30 samples/sec   Loss 6.8892   LearningRate 0.0581   Epoch: 4   Global Step: 59000   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:49:01,283-Speed 3339.74 samples/sec   Loss 6.8914   LearningRate 0.0581   Epoch: 4   Global Step: 59010   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:49:04,337-Speed 3354.14 samples/sec   Loss 6.7380   LearningRate 0.0581   Epoch: 4   Global Step: 59020   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:49:07,352-Speed 3398.18 samples/sec   Loss 6.8460   LearningRate 0.0581   Epoch: 4   Global Step: 59030   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:49:10,374-Speed 3390.01 samples/sec   Loss 6.8912   LearningRate 0.0581   Epoch: 4   Global Step: 59040   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:49:13,485-Speed 3292.41 samples/sec   Loss 6.8332   LearningRate 0.0581   Epoch: 4   Global Step: 59050   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:49:16,521-Speed 3373.67 samples/sec   Loss 6.8169   LearningRate 0.0581   Epoch: 4   Global Step: 59060   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:49:19,549-Speed 3383.06 samples/sec   Loss 6.9359   LearningRate 0.0581   Epoch: 4   Global Step: 59070   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:49:22,567-Speed 3394.41 samples/sec   Loss 6.8918   LearningRate 0.0581   Epoch: 4   Global Step: 59080   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:49:25,599-Speed 3378.15 samples/sec   Loss 6.9524   LearningRate 0.0581   Epoch: 4   Global Step: 59090   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:49:28,679-Speed 3325.76 samples/sec   Loss 6.9131   LearningRate 0.0581   Epoch: 4   Global Step: 59100   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:49:31,761-Speed 3323.38 samples/sec   Loss 6.8469   LearningRate 0.0581   Epoch: 4   Global Step: 59110   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:49:34,821-Speed 3347.98 samples/sec   Loss 6.7803   LearningRate 0.0581   Epoch: 4   Global Step: 59120   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:49:37,871-Speed 3358.60 samples/sec   Loss 6.8157   LearningRate 0.0581   Epoch: 4   Global Step: 59130   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:49:40,925-Speed 3354.26 samples/sec   Loss 6.8206   LearningRate 0.0581   Epoch: 4   Global Step: 59140   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:49:43,958-Speed 3377.71 samples/sec   Loss 6.7974   LearningRate 0.0580   Epoch: 4   Global Step: 59150   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:49:46,962-Speed 3409.47 samples/sec   Loss 6.9391   LearningRate 0.0580   Epoch: 4   Global Step: 59160   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:49:49,971-Speed 3404.44 samples/sec   Loss 6.9036   LearningRate 0.0580   Epoch: 4   Global Step: 59170   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:49:53,040-Speed 3337.31 samples/sec   Loss 6.8421   LearningRate 0.0580   Epoch: 4   Global Step: 59180   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:49:56,085-Speed 3363.95 samples/sec   Loss 6.8341   LearningRate 0.0580   Epoch: 4   Global Step: 59190   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:49:59,097-Speed 3400.80 samples/sec   Loss 6.9249   LearningRate 0.0580   Epoch: 4   Global Step: 59200   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:50:02,157-Speed 3347.40 samples/sec   Loss 7.0139   LearningRate 0.0580   Epoch: 4   Global Step: 59210   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:50:05,221-Speed 3343.18 samples/sec   Loss 6.7666   LearningRate 0.0580   Epoch: 4   Global Step: 59220   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:50:08,291-Speed 3336.65 samples/sec   Loss 6.9209   LearningRate 0.0580   Epoch: 4   Global Step: 59230   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:50:11,359-Speed 3338.65 samples/sec   Loss 6.9376   LearningRate 0.0580   Epoch: 4   Global Step: 59240   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:50:14,405-Speed 3363.57 samples/sec   Loss 6.7687   LearningRate 0.0580   Epoch: 4   Global Step: 59250   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:50:17,483-Speed 3327.15 samples/sec   Loss 6.8917   LearningRate 0.0580   Epoch: 4   Global Step: 59260   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:50:20,503-Speed 3391.79 samples/sec   Loss 6.6992   LearningRate 0.0580   Epoch: 4   Global Step: 59270   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:50:23,550-Speed 3362.15 samples/sec   Loss 6.9091   LearningRate 0.0580   Epoch: 4   Global Step: 59280   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:50:26,629-Speed 3326.99 samples/sec   Loss 6.9106   LearningRate 0.0580   Epoch: 4   Global Step: 59290   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:50:29,688-Speed 3347.64 samples/sec   Loss 6.9071   LearningRate 0.0580   Epoch: 4   Global Step: 59300   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:50:32,725-Speed 3372.98 samples/sec   Loss 6.8392   LearningRate 0.0580   Epoch: 4   Global Step: 59310   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:50:35,764-Speed 3371.10 samples/sec   Loss 6.8121   LearningRate 0.0579   Epoch: 4   Global Step: 59320   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:50:38,822-Speed 3349.37 samples/sec   Loss 6.8036   LearningRate 0.0579   Epoch: 4   Global Step: 59330   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:50:41,869-Speed 3361.71 samples/sec   Loss 6.8684   LearningRate 0.0579   Epoch: 4   Global Step: 59340   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:50:44,870-Speed 3413.30 samples/sec   Loss 6.7599   LearningRate 0.0579   Epoch: 4   Global Step: 59350   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:50:47,896-Speed 3385.82 samples/sec   Loss 6.9887   LearningRate 0.0579   Epoch: 4   Global Step: 59360   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:50:50,971-Speed 3330.93 samples/sec   Loss 6.8492   LearningRate 0.0579   Epoch: 4   Global Step: 59370   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:50:54,002-Speed 3379.59 samples/sec   Loss 6.8652   LearningRate 0.0579   Epoch: 4   Global Step: 59380   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:50:57,056-Speed 3354.89 samples/sec   Loss 6.8601   LearningRate 0.0579   Epoch: 4   Global Step: 59390   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:51:00,092-Speed 3373.47 samples/sec   Loss 6.9178   LearningRate 0.0579   Epoch: 4   Global Step: 59400   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:51:03,181-Speed 3315.62 samples/sec   Loss 6.8452   LearningRate 0.0579   Epoch: 4   Global Step: 59410   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:51:06,245-Speed 3343.67 samples/sec   Loss 6.6904   LearningRate 0.0579   Epoch: 4   Global Step: 59420   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:51:09,247-Speed 3411.60 samples/sec   Loss 6.9014   LearningRate 0.0579   Epoch: 4   Global Step: 59430   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:51:12,327-Speed 3326.36 samples/sec   Loss 6.8769   LearningRate 0.0579   Epoch: 4   Global Step: 59440   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:51:15,337-Speed 3403.69 samples/sec   Loss 6.9153   LearningRate 0.0579   Epoch: 4   Global Step: 59450   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:51:18,355-Speed 3393.41 samples/sec   Loss 6.8318   LearningRate 0.0579   Epoch: 4   Global Step: 59460   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:51:21,373-Speed 3393.47 samples/sec   Loss 6.9567   LearningRate 0.0579   Epoch: 4   Global Step: 59470   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:51:24,420-Speed 3362.78 samples/sec   Loss 6.7377   LearningRate 0.0578   Epoch: 4   Global Step: 59480   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:51:27,518-Speed 3306.22 samples/sec   Loss 6.8182   LearningRate 0.0578   Epoch: 4   Global Step: 59490   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:51:30,651-Speed 3269.37 samples/sec   Loss 6.9027   LearningRate 0.0578   Epoch: 4   Global Step: 59500   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:51:33,670-Speed 3392.61 samples/sec   Loss 6.8859   LearningRate 0.0578   Epoch: 4   Global Step: 59510   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:51:36,741-Speed 3335.91 samples/sec   Loss 6.8879   LearningRate 0.0578   Epoch: 4   Global Step: 59520   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:51:39,845-Speed 3299.63 samples/sec   Loss 6.7708   LearningRate 0.0578   Epoch: 4   Global Step: 59530   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:51:42,894-Speed 3359.90 samples/sec   Loss 6.8392   LearningRate 0.0578   Epoch: 4   Global Step: 59540   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:51:45,941-Speed 3362.04 samples/sec   Loss 6.8645   LearningRate 0.0578   Epoch: 4   Global Step: 59550   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:51:49,084-Speed 3258.33 samples/sec   Loss 6.8177   LearningRate 0.0578   Epoch: 4   Global Step: 59560   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:51:52,220-Speed 3266.67 samples/sec   Loss 6.9525   LearningRate 0.0578   Epoch: 4   Global Step: 59570   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:51:55,297-Speed 3329.56 samples/sec   Loss 6.8972   LearningRate 0.0578   Epoch: 4   Global Step: 59580   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:51:58,304-Speed 3406.01 samples/sec   Loss 6.8710   LearningRate 0.0578   Epoch: 4   Global Step: 59590   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:52:01,328-Speed 3387.67 samples/sec   Loss 6.8373   LearningRate 0.0578   Epoch: 4   Global Step: 59600   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:52:04,347-Speed 3392.49 samples/sec   Loss 6.7760   LearningRate 0.0578   Epoch: 4   Global Step: 59610   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:52:07,444-Speed 3307.57 samples/sec   Loss 6.8407   LearningRate 0.0578   Epoch: 4   Global Step: 59620   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:52:10,465-Speed 3390.48 samples/sec   Loss 6.8014   LearningRate 0.0578   Epoch: 4   Global Step: 59630   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:52:13,512-Speed 3362.04 samples/sec   Loss 6.8122   LearningRate 0.0577   Epoch: 4   Global Step: 59640   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:52:16,552-Speed 3370.34 samples/sec   Loss 6.9889   LearningRate 0.0577   Epoch: 4   Global Step: 59650   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:52:19,592-Speed 3368.91 samples/sec   Loss 7.0050   LearningRate 0.0577   Epoch: 4   Global Step: 59660   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:52:22,602-Speed 3402.86 samples/sec   Loss 6.9689   LearningRate 0.0577   Epoch: 4   Global Step: 59670   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:52:25,624-Speed 3390.06 samples/sec   Loss 6.8334   LearningRate 0.0577   Epoch: 4   Global Step: 59680   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:52:28,629-Speed 3409.02 samples/sec   Loss 6.7783   LearningRate 0.0577   Epoch: 4   Global Step: 59690   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:52:31,634-Speed 3407.85 samples/sec   Loss 6.8621   LearningRate 0.0577   Epoch: 4   Global Step: 59700   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:52:34,660-Speed 3386.05 samples/sec   Loss 6.8378   LearningRate 0.0577   Epoch: 4   Global Step: 59710   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:52:37,696-Speed 3373.95 samples/sec   Loss 6.9185   LearningRate 0.0577   Epoch: 4   Global Step: 59720   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:52:40,802-Speed 3297.45 samples/sec   Loss 6.8172   LearningRate 0.0577   Epoch: 4   Global Step: 59730   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:52:43,822-Speed 3391.66 samples/sec   Loss 6.8690   LearningRate 0.0577   Epoch: 4   Global Step: 59740   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:52:46,905-Speed 3322.25 samples/sec   Loss 6.8828   LearningRate 0.0577   Epoch: 4   Global Step: 59750   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:52:49,986-Speed 3324.67 samples/sec   Loss 6.9097   LearningRate 0.0577   Epoch: 4   Global Step: 59760   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:52:53,102-Speed 3287.86 samples/sec   Loss 6.9856   LearningRate 0.0577   Epoch: 4   Global Step: 59770   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:52:56,127-Speed 3385.43 samples/sec   Loss 6.8432   LearningRate 0.0577   Epoch: 4   Global Step: 59780   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 06:52:59,173-Speed 3363.56 samples/sec   Loss 6.9011   LearningRate 0.0577   Epoch: 4   Global Step: 59790   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 06:53:02,205-Speed 3378.66 samples/sec   Loss 6.8600   LearningRate 0.0577   Epoch: 4   Global Step: 59800   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:53:05,239-Speed 3376.11 samples/sec   Loss 7.0104   LearningRate 0.0576   Epoch: 4   Global Step: 59810   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:53:08,257-Speed 3394.56 samples/sec   Loss 6.8879   LearningRate 0.0576   Epoch: 4   Global Step: 59820   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:53:11,309-Speed 3355.19 samples/sec   Loss 6.8585   LearningRate 0.0576   Epoch: 4   Global Step: 59830   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:53:14,344-Speed 3375.78 samples/sec   Loss 6.7660   LearningRate 0.0576   Epoch: 4   Global Step: 59840   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:53:17,342-Speed 3417.08 samples/sec   Loss 6.8231   LearningRate 0.0576   Epoch: 4   Global Step: 59850   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:53:20,410-Speed 3338.74 samples/sec   Loss 6.8316   LearningRate 0.0576   Epoch: 4   Global Step: 59860   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:53:23,466-Speed 3352.13 samples/sec   Loss 6.8923   LearningRate 0.0576   Epoch: 4   Global Step: 59870   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:53:26,549-Speed 3322.22 samples/sec   Loss 6.8362   LearningRate 0.0576   Epoch: 4   Global Step: 59880   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:53:29,683-Speed 3267.95 samples/sec   Loss 6.9188   LearningRate 0.0576   Epoch: 4   Global Step: 59890   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:53:32,711-Speed 3383.18 samples/sec   Loss 6.8603   LearningRate 0.0576   Epoch: 4   Global Step: 59900   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:53:35,819-Speed 3295.85 samples/sec   Loss 6.8855   LearningRate 0.0576   Epoch: 4   Global Step: 59910   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:53:38,858-Speed 3370.84 samples/sec   Loss 6.8129   LearningRate 0.0576   Epoch: 4   Global Step: 59920   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:53:41,930-Speed 3334.20 samples/sec   Loss 6.8276   LearningRate 0.0576   Epoch: 4   Global Step: 59930   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:53:45,012-Speed 3323.77 samples/sec   Loss 6.8649   LearningRate 0.0576   Epoch: 4   Global Step: 59940   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:53:48,066-Speed 3354.48 samples/sec   Loss 6.8882   LearningRate 0.0576   Epoch: 4   Global Step: 59950   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:53:51,170-Speed 3298.89 samples/sec   Loss 6.8433   LearningRate 0.0576   Epoch: 4   Global Step: 59960   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:53:54,212-Speed 3367.20 samples/sec   Loss 6.9045   LearningRate 0.0575   Epoch: 4   Global Step: 59970   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:53:57,238-Speed 3385.34 samples/sec   Loss 6.8157   LearningRate 0.0575   Epoch: 4   Global Step: 59980   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:54:00,275-Speed 3373.76 samples/sec   Loss 6.8342   LearningRate 0.0575   Epoch: 4   Global Step: 59990   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:54:03,400-Speed 3277.28 samples/sec   Loss 6.8400   LearningRate 0.0575   Epoch: 4   Global Step: 60000   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:54:06,498-Speed 3306.66 samples/sec   Loss 6.7646   LearningRate 0.0575   Epoch: 4   Global Step: 60010   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:54:09,533-Speed 3374.85 samples/sec   Loss 6.8878   LearningRate 0.0575   Epoch: 4   Global Step: 60020   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:54:12,581-Speed 3360.17 samples/sec   Loss 6.9313   LearningRate 0.0575   Epoch: 4   Global Step: 60030   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:54:15,663-Speed 3324.53 samples/sec   Loss 6.7549   LearningRate 0.0575   Epoch: 4   Global Step: 60040   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:54:18,711-Speed 3359.81 samples/sec   Loss 6.8571   LearningRate 0.0575   Epoch: 4   Global Step: 60050   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:54:21,709-Speed 3416.99 samples/sec   Loss 6.8446   LearningRate 0.0575   Epoch: 4   Global Step: 60060   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:54:24,761-Speed 3357.19 samples/sec   Loss 6.9370   LearningRate 0.0575   Epoch: 4   Global Step: 60070   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:54:27,852-Speed 3313.41 samples/sec   Loss 6.8591   LearningRate 0.0575   Epoch: 4   Global Step: 60080   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:54:30,910-Speed 3349.73 samples/sec   Loss 6.8674   LearningRate 0.0575   Epoch: 4   Global Step: 60090   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:54:33,934-Speed 3386.58 samples/sec   Loss 6.7762   LearningRate 0.0575   Epoch: 4   Global Step: 60100   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:54:36,983-Speed 3360.67 samples/sec   Loss 6.8779   LearningRate 0.0575   Epoch: 4   Global Step: 60110   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:54:39,987-Speed 3409.75 samples/sec   Loss 6.8417   LearningRate 0.0575   Epoch: 4   Global Step: 60120   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:54:43,065-Speed 3327.18 samples/sec   Loss 6.9235   LearningRate 0.0574   Epoch: 4   Global Step: 60130   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:54:46,116-Speed 3356.94 samples/sec   Loss 6.9361   LearningRate 0.0574   Epoch: 4   Global Step: 60140   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:54:49,126-Speed 3404.14 samples/sec   Loss 6.8278   LearningRate 0.0574   Epoch: 4   Global Step: 60150   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:54:52,159-Speed 3377.33 samples/sec   Loss 6.8612   LearningRate 0.0574   Epoch: 4   Global Step: 60160   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:54:55,219-Speed 3347.60 samples/sec   Loss 6.7933   LearningRate 0.0574   Epoch: 4   Global Step: 60170   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:54:58,235-Speed 3396.20 samples/sec   Loss 6.4992   LearningRate 0.0574   Epoch: 4   Global Step: 60180   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:55:01,250-Speed 3397.93 samples/sec   Loss 6.8014   LearningRate 0.0574   Epoch: 4   Global Step: 60190   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:55:04,262-Speed 3400.58 samples/sec   Loss 6.8765   LearningRate 0.0574   Epoch: 4   Global Step: 60200   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:55:07,294-Speed 3378.23 samples/sec   Loss 6.8522   LearningRate 0.0574   Epoch: 4   Global Step: 60210   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:55:10,315-Speed 3390.92 samples/sec   Loss 6.8597   LearningRate 0.0574   Epoch: 4   Global Step: 60220   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:55:13,412-Speed 3307.44 samples/sec   Loss 6.9089   LearningRate 0.0574   Epoch: 4   Global Step: 60230   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:55:16,495-Speed 3322.73 samples/sec   Loss 6.9087   LearningRate 0.0574   Epoch: 4   Global Step: 60240   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:55:19,596-Speed 3302.93 samples/sec   Loss 6.8724   LearningRate 0.0574   Epoch: 4   Global Step: 60250   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:55:22,599-Speed 3410.78 samples/sec   Loss 6.9834   LearningRate 0.0574   Epoch: 4   Global Step: 60260   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:55:25,661-Speed 3346.23 samples/sec   Loss 6.7929   LearningRate 0.0574   Epoch: 4   Global Step: 60270   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:55:28,751-Speed 3314.89 samples/sec   Loss 6.7549   LearningRate 0.0574   Epoch: 4   Global Step: 60280   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:55:31,879-Speed 3273.96 samples/sec   Loss 6.9173   LearningRate 0.0574   Epoch: 4   Global Step: 60290   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:55:34,944-Speed 3342.89 samples/sec   Loss 6.8133   LearningRate 0.0573   Epoch: 4   Global Step: 60300   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:55:37,980-Speed 3373.51 samples/sec   Loss 6.9829   LearningRate 0.0573   Epoch: 4   Global Step: 60310   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:55:41,043-Speed 3343.59 samples/sec   Loss 6.8337   LearningRate 0.0573   Epoch: 4   Global Step: 60320   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:55:44,057-Speed 3399.16 samples/sec   Loss 6.6742   LearningRate 0.0573   Epoch: 4   Global Step: 60330   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:55:47,102-Speed 3363.18 samples/sec   Loss 6.8204   LearningRate 0.0573   Epoch: 4   Global Step: 60340   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:55:50,145-Speed 3366.61 samples/sec   Loss 6.8198   LearningRate 0.0573   Epoch: 4   Global Step: 60350   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:55:53,190-Speed 3364.65 samples/sec   Loss 6.8867   LearningRate 0.0573   Epoch: 4   Global Step: 60360   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:55:56,273-Speed 3322.02 samples/sec   Loss 6.8249   LearningRate 0.0573   Epoch: 4   Global Step: 60370   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:55:59,341-Speed 3338.79 samples/sec   Loss 6.8311   LearningRate 0.0573   Epoch: 4   Global Step: 60380   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:56:02,471-Speed 3272.84 samples/sec   Loss 6.7728   LearningRate 0.0573   Epoch: 4   Global Step: 60390   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:56:05,524-Speed 3355.32 samples/sec   Loss 6.7645   LearningRate 0.0573   Epoch: 4   Global Step: 60400   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:56:08,530-Speed 3407.11 samples/sec   Loss 6.6529   LearningRate 0.0573   Epoch: 4   Global Step: 60410   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:56:11,572-Speed 3367.37 samples/sec   Loss 6.6542   LearningRate 0.0573   Epoch: 4   Global Step: 60420   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:56:14,606-Speed 3376.36 samples/sec   Loss 6.8398   LearningRate 0.0573   Epoch: 4   Global Step: 60430   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:56:17,671-Speed 3342.06 samples/sec   Loss 6.7240   LearningRate 0.0573   Epoch: 4   Global Step: 60440   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:56:20,741-Speed 3336.37 samples/sec   Loss 6.8990   LearningRate 0.0573   Epoch: 4   Global Step: 60450   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:56:23,789-Speed 3360.89 samples/sec   Loss 6.9240   LearningRate 0.0572   Epoch: 4   Global Step: 60460   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:56:26,829-Speed 3369.06 samples/sec   Loss 6.9851   LearningRate 0.0572   Epoch: 4   Global Step: 60470   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:56:29,911-Speed 3323.62 samples/sec   Loss 6.9582   LearningRate 0.0572   Epoch: 4   Global Step: 60480   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:56:32,986-Speed 3330.98 samples/sec   Loss 6.9067   LearningRate 0.0572   Epoch: 4   Global Step: 60490   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:56:36,111-Speed 3277.72 samples/sec   Loss 6.8215   LearningRate 0.0572   Epoch: 4   Global Step: 60500   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:56:39,210-Speed 3305.58 samples/sec   Loss 6.8487   LearningRate 0.0572   Epoch: 4   Global Step: 60510   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:56:42,343-Speed 3269.52 samples/sec   Loss 6.9316   LearningRate 0.0572   Epoch: 4   Global Step: 60520   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:56:45,361-Speed 3393.88 samples/sec   Loss 6.8086   LearningRate 0.0572   Epoch: 4   Global Step: 60530   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:56:48,410-Speed 3360.33 samples/sec   Loss 6.8678   LearningRate 0.0572   Epoch: 4   Global Step: 60540   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:56:51,499-Speed 3316.27 samples/sec   Loss 6.9454   LearningRate 0.0572   Epoch: 4   Global Step: 60550   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:56:54,559-Speed 3347.75 samples/sec   Loss 6.8537   LearningRate 0.0572   Epoch: 4   Global Step: 60560   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:56:57,573-Speed 3398.06 samples/sec   Loss 6.8302   LearningRate 0.0572   Epoch: 4   Global Step: 60570   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:57:00,618-Speed 3363.74 samples/sec   Loss 6.8074   LearningRate 0.0572   Epoch: 4   Global Step: 60580   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:57:03,740-Speed 3281.05 samples/sec   Loss 6.9219   LearningRate 0.0572   Epoch: 4   Global Step: 60590   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:57:06,784-Speed 3366.22 samples/sec   Loss 6.9690   LearningRate 0.0572   Epoch: 4   Global Step: 60600   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:57:09,812-Speed 3382.43 samples/sec   Loss 6.9402   LearningRate 0.0572   Epoch: 4   Global Step: 60610   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:57:12,879-Speed 3339.78 samples/sec   Loss 6.7963   LearningRate 0.0572   Epoch: 4   Global Step: 60620   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:57:15,930-Speed 3357.72 samples/sec   Loss 6.8431   LearningRate 0.0571   Epoch: 4   Global Step: 60630   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:57:19,021-Speed 3313.51 samples/sec   Loss 6.7979   LearningRate 0.0571   Epoch: 4   Global Step: 60640   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:57:22,034-Speed 3399.42 samples/sec   Loss 6.8972   LearningRate 0.0571   Epoch: 4   Global Step: 60650   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:57:25,069-Speed 3375.70 samples/sec   Loss 6.7820   LearningRate 0.0571   Epoch: 4   Global Step: 60660   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 06:57:28,138-Speed 3338.42 samples/sec   Loss 6.7289   LearningRate 0.0571   Epoch: 4   Global Step: 60670   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:57:31,196-Speed 3349.00 samples/sec   Loss 6.9107   LearningRate 0.0571   Epoch: 4   Global Step: 60680   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:57:34,249-Speed 3355.71 samples/sec   Loss 6.8043   LearningRate 0.0571   Epoch: 4   Global Step: 60690   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:57:37,317-Speed 3338.35 samples/sec   Loss 6.9150   LearningRate 0.0571   Epoch: 4   Global Step: 60700   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:57:40,390-Speed 3333.04 samples/sec   Loss 6.8748   LearningRate 0.0571   Epoch: 4   Global Step: 60710   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:57:43,449-Speed 3348.30 samples/sec   Loss 6.8695   LearningRate 0.0571   Epoch: 4   Global Step: 60720   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:57:46,502-Speed 3355.45 samples/sec   Loss 6.9112   LearningRate 0.0571   Epoch: 4   Global Step: 60730   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:57:49,585-Speed 3322.46 samples/sec   Loss 6.7513   LearningRate 0.0571   Epoch: 4   Global Step: 60740   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:57:52,654-Speed 3337.79 samples/sec   Loss 6.8147   LearningRate 0.0571   Epoch: 4   Global Step: 60750   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:57:55,729-Speed 3330.92 samples/sec   Loss 6.7607   LearningRate 0.0571   Epoch: 4   Global Step: 60760   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:57:58,759-Speed 3381.15 samples/sec   Loss 6.8483   LearningRate 0.0571   Epoch: 4   Global Step: 60770   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:58:01,856-Speed 3307.58 samples/sec   Loss 6.7286   LearningRate 0.0571   Epoch: 4   Global Step: 60780   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:58:04,921-Speed 3341.78 samples/sec   Loss 6.8410   LearningRate 0.0570   Epoch: 4   Global Step: 60790   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:58:07,940-Speed 3392.38 samples/sec   Loss 6.8240   LearningRate 0.0570   Epoch: 4   Global Step: 60800   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:58:10,966-Speed 3385.84 samples/sec   Loss 6.8828   LearningRate 0.0570   Epoch: 4   Global Step: 60810   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:58:14,041-Speed 3331.11 samples/sec   Loss 6.8401   LearningRate 0.0570   Epoch: 4   Global Step: 60820   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:58:17,136-Speed 3309.73 samples/sec   Loss 6.7976   LearningRate 0.0570   Epoch: 4   Global Step: 60830   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:58:20,159-Speed 3387.89 samples/sec   Loss 6.7710   LearningRate 0.0570   Epoch: 4   Global Step: 60840   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:58:23,243-Speed 3321.95 samples/sec   Loss 6.7916   LearningRate 0.0570   Epoch: 4   Global Step: 60850   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:58:26,299-Speed 3352.22 samples/sec   Loss 6.8052   LearningRate 0.0570   Epoch: 4   Global Step: 60860   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:58:29,407-Speed 3295.82 samples/sec   Loss 6.7463   LearningRate 0.0570   Epoch: 4   Global Step: 60870   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:58:32,534-Speed 3275.70 samples/sec   Loss 6.8954   LearningRate 0.0570   Epoch: 4   Global Step: 60880   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:58:35,562-Speed 3383.97 samples/sec   Loss 6.8420   LearningRate 0.0570   Epoch: 4   Global Step: 60890   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:58:38,617-Speed 3352.92 samples/sec   Loss 6.8232   LearningRate 0.0570   Epoch: 4   Global Step: 60900   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:58:41,715-Speed 3306.73 samples/sec   Loss 6.8141   LearningRate 0.0570   Epoch: 4   Global Step: 60910   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:58:44,782-Speed 3338.97 samples/sec   Loss 6.6827   LearningRate 0.0570   Epoch: 4   Global Step: 60920   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:58:47,833-Speed 3357.69 samples/sec   Loss 6.8562   LearningRate 0.0570   Epoch: 4   Global Step: 60930   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:58:50,884-Speed 3357.72 samples/sec   Loss 6.8457   LearningRate 0.0570   Epoch: 4   Global Step: 60940   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:58:53,912-Speed 3382.96 samples/sec   Loss 6.8287   LearningRate 0.0569   Epoch: 4   Global Step: 60950   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:58:56,943-Speed 3379.70 samples/sec   Loss 6.7331   LearningRate 0.0569   Epoch: 4   Global Step: 60960   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:58:59,968-Speed 3386.30 samples/sec   Loss 6.8255   LearningRate 0.0569   Epoch: 4   Global Step: 60970   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:59:03,018-Speed 3357.25 samples/sec   Loss 6.8049   LearningRate 0.0569   Epoch: 4   Global Step: 60980   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:59:06,072-Speed 3354.47 samples/sec   Loss 6.8959   LearningRate 0.0569   Epoch: 4   Global Step: 60990   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:59:09,113-Speed 3368.12 samples/sec   Loss 6.8197   LearningRate 0.0569   Epoch: 4   Global Step: 61000   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:59:12,150-Speed 3372.82 samples/sec   Loss 6.7478   LearningRate 0.0569   Epoch: 4   Global Step: 61010   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:59:15,226-Speed 3330.48 samples/sec   Loss 6.7780   LearningRate 0.0569   Epoch: 4   Global Step: 61020   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 06:59:18,251-Speed 3386.45 samples/sec   Loss 6.7319   LearningRate 0.0569   Epoch: 4   Global Step: 61030   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:59:21,268-Speed 3395.09 samples/sec   Loss 6.8983   LearningRate 0.0569   Epoch: 4   Global Step: 61040   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:59:24,308-Speed 3368.99 samples/sec   Loss 6.8113   LearningRate 0.0569   Epoch: 4   Global Step: 61050   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:59:27,352-Speed 3365.41 samples/sec   Loss 6.7691   LearningRate 0.0569   Epoch: 4   Global Step: 61060   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:59:30,407-Speed 3353.44 samples/sec   Loss 6.8697   LearningRate 0.0569   Epoch: 4   Global Step: 61070   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:59:33,437-Speed 3380.62 samples/sec   Loss 6.8574   LearningRate 0.0569   Epoch: 4   Global Step: 61080   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:59:36,494-Speed 3351.01 samples/sec   Loss 6.8880   LearningRate 0.0569   Epoch: 4   Global Step: 61090   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:59:39,508-Speed 3398.49 samples/sec   Loss 6.8258   LearningRate 0.0569   Epoch: 4   Global Step: 61100   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:59:42,589-Speed 3324.14 samples/sec   Loss 6.7343   LearningRate 0.0569   Epoch: 4   Global Step: 61110   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:59:45,628-Speed 3370.55 samples/sec   Loss 6.8224   LearningRate 0.0568   Epoch: 4   Global Step: 61120   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:59:48,665-Speed 3373.13 samples/sec   Loss 6.8257   LearningRate 0.0568   Epoch: 4   Global Step: 61130   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 06:59:51,703-Speed 3372.29 samples/sec   Loss 6.9748   LearningRate 0.0568   Epoch: 4   Global Step: 61140   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:59:54,779-Speed 3330.24 samples/sec   Loss 6.9924   LearningRate 0.0568   Epoch: 4   Global Step: 61150   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 06:59:57,805-Speed 3384.81 samples/sec   Loss 6.8732   LearningRate 0.0568   Epoch: 4   Global Step: 61160   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:00:00,940-Speed 3267.75 samples/sec   Loss 6.6608   LearningRate 0.0568   Epoch: 4   Global Step: 61170   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:00:04,026-Speed 3319.04 samples/sec   Loss 6.7044   LearningRate 0.0568   Epoch: 4   Global Step: 61180   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:00:07,054-Speed 3383.17 samples/sec   Loss 6.8848   LearningRate 0.0568   Epoch: 4   Global Step: 61190   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:00:10,064-Speed 3403.27 samples/sec   Loss 6.7709   LearningRate 0.0568   Epoch: 4   Global Step: 61200   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:00:13,105-Speed 3367.59 samples/sec   Loss 6.7733   LearningRate 0.0568   Epoch: 4   Global Step: 61210   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:00:16,207-Speed 3302.43 samples/sec   Loss 6.9067   LearningRate 0.0568   Epoch: 4   Global Step: 61220   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:00:19,326-Speed 3284.08 samples/sec   Loss 6.7800   LearningRate 0.0568   Epoch: 4   Global Step: 61230   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:00:22,361-Speed 3375.72 samples/sec   Loss 6.7494   LearningRate 0.0568   Epoch: 4   Global Step: 61240   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:00:25,396-Speed 3374.99 samples/sec   Loss 6.7335   LearningRate 0.0568   Epoch: 4   Global Step: 61250   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:00:28,412-Speed 3396.63 samples/sec   Loss 6.8528   LearningRate 0.0568   Epoch: 4   Global Step: 61260   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:00:31,464-Speed 3355.42 samples/sec   Loss 6.7088   LearningRate 0.0568   Epoch: 4   Global Step: 61270   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:00:34,497-Speed 3378.47 samples/sec   Loss 6.8585   LearningRate 0.0567   Epoch: 4   Global Step: 61280   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:00:37,592-Speed 3309.17 samples/sec   Loss 6.8434   LearningRate 0.0567   Epoch: 4   Global Step: 61290   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:00:40,647-Speed 3352.97 samples/sec   Loss 6.9098   LearningRate 0.0567   Epoch: 4   Global Step: 61300   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:00:43,738-Speed 3314.63 samples/sec   Loss 6.9157   LearningRate 0.0567   Epoch: 4   Global Step: 61310   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:00:46,748-Speed 3403.00 samples/sec   Loss 6.7621   LearningRate 0.0567   Epoch: 4   Global Step: 61320   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:00:49,757-Speed 3403.38 samples/sec   Loss 6.8690   LearningRate 0.0567   Epoch: 4   Global Step: 61330   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:00:52,833-Speed 3330.62 samples/sec   Loss 6.8955   LearningRate 0.0567   Epoch: 4   Global Step: 61340   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:00:55,910-Speed 3328.62 samples/sec   Loss 6.7355   LearningRate 0.0567   Epoch: 4   Global Step: 61350   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:00:58,955-Speed 3364.52 samples/sec   Loss 6.8588   LearningRate 0.0567   Epoch: 4   Global Step: 61360   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:01:02,072-Speed 3286.23 samples/sec   Loss 6.8191   LearningRate 0.0567   Epoch: 4   Global Step: 61370   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:01:05,174-Speed 3301.96 samples/sec   Loss 6.8064   LearningRate 0.0567   Epoch: 4   Global Step: 61380   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:01:08,213-Speed 3370.76 samples/sec   Loss 6.7187   LearningRate 0.0567   Epoch: 4   Global Step: 61390   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:01:11,259-Speed 3362.47 samples/sec   Loss 6.7671   LearningRate 0.0567   Epoch: 4   Global Step: 61400   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:01:14,310-Speed 3357.53 samples/sec   Loss 6.8131   LearningRate 0.0567   Epoch: 4   Global Step: 61410   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:01:17,388-Speed 3327.69 samples/sec   Loss 6.8237   LearningRate 0.0567   Epoch: 4   Global Step: 61420   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:01:20,409-Speed 3391.54 samples/sec   Loss 6.8251   LearningRate 0.0567   Epoch: 4   Global Step: 61430   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:01:23,509-Speed 3303.63 samples/sec   Loss 6.6873   LearningRate 0.0567   Epoch: 4   Global Step: 61440   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:01:27,334-Speed 2678.12 samples/sec   Loss 6.6634   LearningRate 0.0566   Epoch: 4   Global Step: 61450   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:01:30,348-Speed 3398.07 samples/sec   Loss 6.7510   LearningRate 0.0566   Epoch: 4   Global Step: 61460   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:01:33,396-Speed 3360.89 samples/sec   Loss 6.7336   LearningRate 0.0566   Epoch: 4   Global Step: 61470   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:01:36,520-Speed 3278.94 samples/sec   Loss 6.8510   LearningRate 0.0566   Epoch: 4   Global Step: 61480   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:01:39,577-Speed 3350.68 samples/sec   Loss 6.7528   LearningRate 0.0566   Epoch: 4   Global Step: 61490   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:01:42,632-Speed 3352.94 samples/sec   Loss 6.7425   LearningRate 0.0566   Epoch: 4   Global Step: 61500   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:01:45,674-Speed 3367.70 samples/sec   Loss 6.9249   LearningRate 0.0566   Epoch: 4   Global Step: 61510   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:01:48,690-Speed 3396.06 samples/sec   Loss 6.8799   LearningRate 0.0566   Epoch: 4   Global Step: 61520   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:01:51,841-Speed 3250.42 samples/sec   Loss 6.7741   LearningRate 0.0566   Epoch: 4   Global Step: 61530   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:01:54,895-Speed 3354.20 samples/sec   Loss 6.6912   LearningRate 0.0566   Epoch: 4   Global Step: 61540   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:01:57,913-Speed 3393.57 samples/sec   Loss 6.8261   LearningRate 0.0566   Epoch: 4   Global Step: 61550   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:02:00,930-Speed 3396.14 samples/sec   Loss 6.7433   LearningRate 0.0566   Epoch: 4   Global Step: 61560   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:02:03,959-Speed 3380.99 samples/sec   Loss 6.7895   LearningRate 0.0566   Epoch: 4   Global Step: 61570   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:02:07,031-Speed 3335.56 samples/sec   Loss 6.8208   LearningRate 0.0566   Epoch: 4   Global Step: 61580   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:02:10,056-Speed 3385.72 samples/sec   Loss 6.8262   LearningRate 0.0566   Epoch: 4   Global Step: 61590   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:02:13,119-Speed 3343.44 samples/sec   Loss 6.9204   LearningRate 0.0566   Epoch: 4   Global Step: 61600   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:02:16,137-Speed 3394.62 samples/sec   Loss 6.8686   LearningRate 0.0565   Epoch: 4   Global Step: 61610   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:02:19,219-Speed 3324.02 samples/sec   Loss 6.8011   LearningRate 0.0565   Epoch: 4   Global Step: 61620   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:02:22,224-Speed 3408.77 samples/sec   Loss 6.6293   LearningRate 0.0565   Epoch: 4   Global Step: 61630   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:02:25,247-Speed 3388.24 samples/sec   Loss 6.7425   LearningRate 0.0565   Epoch: 4   Global Step: 61640   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:02:28,332-Speed 3320.17 samples/sec   Loss 6.8467   LearningRate 0.0565   Epoch: 4   Global Step: 61650   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:02:31,362-Speed 3380.32 samples/sec   Loss 6.7981   LearningRate 0.0565   Epoch: 4   Global Step: 61660   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:02:34,375-Speed 3400.15 samples/sec   Loss 6.6967   LearningRate 0.0565   Epoch: 4   Global Step: 61670   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:02:37,457-Speed 3324.10 samples/sec   Loss 6.7405   LearningRate 0.0565   Epoch: 4   Global Step: 61680   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:02:40,489-Speed 3377.53 samples/sec   Loss 6.7483   LearningRate 0.0565   Epoch: 4   Global Step: 61690   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:02:43,492-Speed 3411.39 samples/sec   Loss 6.6710   LearningRate 0.0565   Epoch: 4   Global Step: 61700   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:02:46,502-Speed 3403.50 samples/sec   Loss 6.7780   LearningRate 0.0565   Epoch: 4   Global Step: 61710   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:02:49,610-Speed 3295.33 samples/sec   Loss 6.9827   LearningRate 0.0565   Epoch: 4   Global Step: 61720   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:02:52,754-Speed 3257.58 samples/sec   Loss 6.7739   LearningRate 0.0565   Epoch: 4   Global Step: 61730   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:02:55,767-Speed 3400.07 samples/sec   Loss 6.7764   LearningRate 0.0565   Epoch: 4   Global Step: 61740   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:02:58,783-Speed 3396.09 samples/sec   Loss 6.7875   LearningRate 0.0565   Epoch: 4   Global Step: 61750   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:03:01,872-Speed 3316.65 samples/sec   Loss 6.7793   LearningRate 0.0565   Epoch: 4   Global Step: 61760   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:03:04,935-Speed 3343.78 samples/sec   Loss 6.8029   LearningRate 0.0565   Epoch: 4   Global Step: 61770   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:03:08,013-Speed 3328.21 samples/sec   Loss 6.7428   LearningRate 0.0564   Epoch: 4   Global Step: 61780   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:03:11,031-Speed 3394.11 samples/sec   Loss 6.7861   LearningRate 0.0564   Epoch: 4   Global Step: 61790   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:03:14,045-Speed 3397.74 samples/sec   Loss 6.8255   LearningRate 0.0564   Epoch: 4   Global Step: 61800   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:03:17,096-Speed 3357.85 samples/sec   Loss 6.6893   LearningRate 0.0564   Epoch: 4   Global Step: 61810   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:03:20,089-Speed 3422.87 samples/sec   Loss 6.7240   LearningRate 0.0564   Epoch: 4   Global Step: 61820   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:03:23,138-Speed 3358.85 samples/sec   Loss 6.8166   LearningRate 0.0564   Epoch: 4   Global Step: 61830   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:03:26,185-Speed 3361.91 samples/sec   Loss 6.7764   LearningRate 0.0564   Epoch: 4   Global Step: 61840   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:03:29,246-Speed 3346.14 samples/sec   Loss 6.8430   LearningRate 0.0564   Epoch: 4   Global Step: 61850   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:03:32,259-Speed 3399.57 samples/sec   Loss 6.7936   LearningRate 0.0564   Epoch: 4   Global Step: 61860   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:03:35,297-Speed 3372.34 samples/sec   Loss 6.7564   LearningRate 0.0564   Epoch: 4   Global Step: 61870   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:03:38,370-Speed 3333.49 samples/sec   Loss 6.9014   LearningRate 0.0564   Epoch: 4   Global Step: 61880   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:03:41,441-Speed 3334.91 samples/sec   Loss 6.7527   LearningRate 0.0564   Epoch: 4   Global Step: 61890   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:03:44,485-Speed 3365.27 samples/sec   Loss 6.7518   LearningRate 0.0564   Epoch: 4   Global Step: 61900   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:03:47,571-Speed 3319.01 samples/sec   Loss 6.7304   LearningRate 0.0564   Epoch: 4   Global Step: 61910   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:03:50,642-Speed 3335.46 samples/sec   Loss 6.9454   LearningRate 0.0564   Epoch: 4   Global Step: 61920   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:03:53,681-Speed 3371.14 samples/sec   Loss 6.8760   LearningRate 0.0564   Epoch: 4   Global Step: 61930   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:03:56,705-Speed 3386.56 samples/sec   Loss 6.8197   LearningRate 0.0563   Epoch: 4   Global Step: 61940   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:03:59,708-Speed 3411.40 samples/sec   Loss 6.7837   LearningRate 0.0563   Epoch: 4   Global Step: 61950   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:04:02,748-Speed 3369.06 samples/sec   Loss 6.8131   LearningRate 0.0563   Epoch: 4   Global Step: 61960   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:04:05,812-Speed 3343.78 samples/sec   Loss 6.7037   LearningRate 0.0563   Epoch: 4   Global Step: 61970   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:04:08,860-Speed 3360.52 samples/sec   Loss 6.8405   LearningRate 0.0563   Epoch: 4   Global Step: 61980   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:04:11,934-Speed 3332.02 samples/sec   Loss 6.7973   LearningRate 0.0563   Epoch: 4   Global Step: 61990   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:04:15,084-Speed 3251.75 samples/sec   Loss 6.7759   LearningRate 0.0563   Epoch: 4   Global Step: 62000   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:04:18,122-Speed 3371.32 samples/sec   Loss 6.7493   LearningRate 0.0563   Epoch: 4   Global Step: 62010   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:04:21,177-Speed 3353.54 samples/sec   Loss 6.7872   LearningRate 0.0563   Epoch: 4   Global Step: 62020   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:04:24,266-Speed 3316.11 samples/sec   Loss 6.8434   LearningRate 0.0563   Epoch: 4   Global Step: 62030   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:04:27,376-Speed 3293.48 samples/sec   Loss 6.7706   LearningRate 0.0563   Epoch: 4   Global Step: 62040   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:04:30,439-Speed 3344.42 samples/sec   Loss 6.7871   LearningRate 0.0563   Epoch: 4   Global Step: 62050   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:04:33,463-Speed 3386.81 samples/sec   Loss 6.7668   LearningRate 0.0563   Epoch: 4   Global Step: 62060   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:04:36,537-Speed 3333.03 samples/sec   Loss 6.8030   LearningRate 0.0563   Epoch: 4   Global Step: 62070   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:04:39,554-Speed 3393.91 samples/sec   Loss 6.8044   LearningRate 0.0563   Epoch: 4   Global Step: 62080   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:04:42,596-Speed 3368.13 samples/sec   Loss 6.7423   LearningRate 0.0563   Epoch: 4   Global Step: 62090   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:04:45,876-Speed 3123.05 samples/sec   Loss 6.7985   LearningRate 0.0563   Epoch: 4   Global Step: 62100   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:05:17,101-Speed 327.95 samples/sec   Loss 6.0436   LearningRate 0.0562   Epoch: 5   Global Step: 62110   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:05:20,239-Speed 3264.78 samples/sec   Loss 5.2854   LearningRate 0.0562   Epoch: 5   Global Step: 62120   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:05:23,309-Speed 3335.98 samples/sec   Loss 5.2553   LearningRate 0.0562   Epoch: 5   Global Step: 62130   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:05:26,371-Speed 3346.31 samples/sec   Loss 5.2035   LearningRate 0.0562   Epoch: 5   Global Step: 62140   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:05:29,494-Speed 3279.56 samples/sec   Loss 5.1410   LearningRate 0.0562   Epoch: 5   Global Step: 62150   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:05:32,521-Speed 3383.76 samples/sec   Loss 5.1855   LearningRate 0.0562   Epoch: 5   Global Step: 62160   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-04-27 07:05:35,564-Speed 3366.47 samples/sec   Loss 5.1190   LearningRate 0.0562   Epoch: 5   Global Step: 62170   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:05:38,596-Speed 3378.89 samples/sec   Loss 5.2076   LearningRate 0.0562   Epoch: 5   Global Step: 62180   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:05:41,654-Speed 3349.19 samples/sec   Loss 5.2136   LearningRate 0.0562   Epoch: 5   Global Step: 62190   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:05:44,675-Speed 3390.94 samples/sec   Loss 5.1406   LearningRate 0.0562   Epoch: 5   Global Step: 62200   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:05:47,711-Speed 3373.91 samples/sec   Loss 5.1811   LearningRate 0.0562   Epoch: 5   Global Step: 62210   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:05:50,732-Speed 3390.29 samples/sec   Loss 5.1106   LearningRate 0.0562   Epoch: 5   Global Step: 62220   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:05:53,820-Speed 3317.98 samples/sec   Loss 5.1114   LearningRate 0.0562   Epoch: 5   Global Step: 62230   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:05:56,984-Speed 3236.40 samples/sec   Loss 5.2361   LearningRate 0.0562   Epoch: 5   Global Step: 62240   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:05:59,990-Speed 3408.51 samples/sec   Loss 5.2502   LearningRate 0.0562   Epoch: 5   Global Step: 62250   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:06:03,050-Speed 3347.62 samples/sec   Loss 5.2655   LearningRate 0.0562   Epoch: 5   Global Step: 62260   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:06:06,052-Speed 3412.04 samples/sec   Loss 5.2122   LearningRate 0.0562   Epoch: 5   Global Step: 62270   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:06:09,056-Speed 3409.30 samples/sec   Loss 5.1602   LearningRate 0.0561   Epoch: 5   Global Step: 62280   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:06:12,075-Speed 3393.49 samples/sec   Loss 5.2266   LearningRate 0.0561   Epoch: 5   Global Step: 62290   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:06:15,090-Speed 3397.95 samples/sec   Loss 5.2117   LearningRate 0.0561   Epoch: 5   Global Step: 62300   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:06:18,139-Speed 3358.70 samples/sec   Loss 5.2068   LearningRate 0.0561   Epoch: 5   Global Step: 62310   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:06:21,176-Speed 3373.11 samples/sec   Loss 5.2045   LearningRate 0.0561   Epoch: 5   Global Step: 62320   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:06:24,267-Speed 3314.64 samples/sec   Loss 5.2228   LearningRate 0.0561   Epoch: 5   Global Step: 62330   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:06:27,325-Speed 3348.48 samples/sec   Loss 5.3077   LearningRate 0.0561   Epoch: 5   Global Step: 62340   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:06:30,430-Speed 3299.68 samples/sec   Loss 5.2368   LearningRate 0.0561   Epoch: 5   Global Step: 62350   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:06:33,438-Speed 3405.33 samples/sec   Loss 5.3677   LearningRate 0.0561   Epoch: 5   Global Step: 62360   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:06:36,495-Speed 3350.27 samples/sec   Loss 5.2370   LearningRate 0.0561   Epoch: 5   Global Step: 62370   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:06:39,611-Speed 3287.01 samples/sec   Loss 5.3264   LearningRate 0.0561   Epoch: 5   Global Step: 62380   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:06:42,661-Speed 3359.12 samples/sec   Loss 5.2407   LearningRate 0.0561   Epoch: 5   Global Step: 62390   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:06:45,665-Speed 3409.87 samples/sec   Loss 5.3958   LearningRate 0.0561   Epoch: 5   Global Step: 62400   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:06:48,715-Speed 3358.23 samples/sec   Loss 5.3562   LearningRate 0.0561   Epoch: 5   Global Step: 62410   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:06:51,787-Speed 3334.71 samples/sec   Loss 5.3924   LearningRate 0.0561   Epoch: 5   Global Step: 62420   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:06:54,794-Speed 3406.70 samples/sec   Loss 5.3661   LearningRate 0.0561   Epoch: 5   Global Step: 62430   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:06:57,830-Speed 3373.60 samples/sec   Loss 5.3451   LearningRate 0.0560   Epoch: 5   Global Step: 62440   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:07:00,853-Speed 3388.87 samples/sec   Loss 5.3339   LearningRate 0.0560   Epoch: 5   Global Step: 62450   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:07:03,946-Speed 3311.83 samples/sec   Loss 5.2643   LearningRate 0.0560   Epoch: 5   Global Step: 62460   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:07:07,042-Speed 3307.81 samples/sec   Loss 5.3466   LearningRate 0.0560   Epoch: 5   Global Step: 62470   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:07:10,046-Speed 3409.81 samples/sec   Loss 5.2855   LearningRate 0.0560   Epoch: 5   Global Step: 62480   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:07:13,068-Speed 3389.17 samples/sec   Loss 5.4047   LearningRate 0.0560   Epoch: 5   Global Step: 62490   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:07:16,086-Speed 3395.25 samples/sec   Loss 5.2634   LearningRate 0.0560   Epoch: 5   Global Step: 62500   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:07:19,115-Speed 3381.08 samples/sec   Loss 5.2822   LearningRate 0.0560   Epoch: 5   Global Step: 62510   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:07:22,120-Speed 3409.14 samples/sec   Loss 5.3074   LearningRate 0.0560   Epoch: 5   Global Step: 62520   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:07:25,120-Speed 3414.52 samples/sec   Loss 5.3642   LearningRate 0.0560   Epoch: 5   Global Step: 62530   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:07:28,159-Speed 3370.31 samples/sec   Loss 5.3322   LearningRate 0.0560   Epoch: 5   Global Step: 62540   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:07:31,148-Speed 3427.51 samples/sec   Loss 5.4012   LearningRate 0.0560   Epoch: 5   Global Step: 62550   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:07:34,173-Speed 3386.41 samples/sec   Loss 5.3952   LearningRate 0.0560   Epoch: 5   Global Step: 62560   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:07:37,190-Speed 3394.36 samples/sec   Loss 5.3804   LearningRate 0.0560   Epoch: 5   Global Step: 62570   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:07:40,240-Speed 3359.03 samples/sec   Loss 5.2570   LearningRate 0.0560   Epoch: 5   Global Step: 62580   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:07:43,337-Speed 3307.73 samples/sec   Loss 5.3430   LearningRate 0.0560   Epoch: 5   Global Step: 62590   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:07:46,365-Speed 3383.13 samples/sec   Loss 5.3266   LearningRate 0.0560   Epoch: 5   Global Step: 62600   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:07:49,379-Speed 3398.06 samples/sec   Loss 5.3727   LearningRate 0.0559   Epoch: 5   Global Step: 62610   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:07:52,418-Speed 3371.08 samples/sec   Loss 5.3958   LearningRate 0.0559   Epoch: 5   Global Step: 62620   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:07:55,435-Speed 3395.55 samples/sec   Loss 5.4697   LearningRate 0.0559   Epoch: 5   Global Step: 62630   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:07:58,451-Speed 3396.55 samples/sec   Loss 5.4711   LearningRate 0.0559   Epoch: 5   Global Step: 62640   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:08:01,505-Speed 3353.60 samples/sec   Loss 5.3254   LearningRate 0.0559   Epoch: 5   Global Step: 62650   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:08:04,583-Speed 3328.14 samples/sec   Loss 5.4678   LearningRate 0.0559   Epoch: 5   Global Step: 62660   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:08:07,608-Speed 3386.50 samples/sec   Loss 5.3988   LearningRate 0.0559   Epoch: 5   Global Step: 62670   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:08:10,604-Speed 3418.37 samples/sec   Loss 5.4226   LearningRate 0.0559   Epoch: 5   Global Step: 62680   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:08:13,646-Speed 3366.63 samples/sec   Loss 5.4583   LearningRate 0.0559   Epoch: 5   Global Step: 62690   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:08:16,652-Speed 3407.51 samples/sec   Loss 5.3968   LearningRate 0.0559   Epoch: 5   Global Step: 62700   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:08:19,680-Speed 3383.83 samples/sec   Loss 5.5079   LearningRate 0.0559   Epoch: 5   Global Step: 62710   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:08:22,718-Speed 3371.98 samples/sec   Loss 5.4619   LearningRate 0.0559   Epoch: 5   Global Step: 62720   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:08:25,724-Speed 3406.65 samples/sec   Loss 5.3682   LearningRate 0.0559   Epoch: 5   Global Step: 62730   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:08:28,722-Speed 3416.67 samples/sec   Loss 5.3895   LearningRate 0.0559   Epoch: 5   Global Step: 62740   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:08:31,749-Speed 3384.97 samples/sec   Loss 5.4998   LearningRate 0.0559   Epoch: 5   Global Step: 62750   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:08:34,791-Speed 3366.27 samples/sec   Loss 5.3729   LearningRate 0.0559   Epoch: 5   Global Step: 62760   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:08:37,826-Speed 3376.15 samples/sec   Loss 5.4575   LearningRate 0.0558   Epoch: 5   Global Step: 62770   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:08:40,857-Speed 3378.82 samples/sec   Loss 5.4577   LearningRate 0.0558   Epoch: 5   Global Step: 62780   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:08:43,938-Speed 3325.07 samples/sec   Loss 5.3501   LearningRate 0.0558   Epoch: 5   Global Step: 62790   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:08:46,976-Speed 3371.81 samples/sec   Loss 5.3795   LearningRate 0.0558   Epoch: 5   Global Step: 62800   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:08:49,976-Speed 3413.82 samples/sec   Loss 5.4623   LearningRate 0.0558   Epoch: 5   Global Step: 62810   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:08:53,071-Speed 3309.94 samples/sec   Loss 5.4076   LearningRate 0.0558   Epoch: 5   Global Step: 62820   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:08:56,089-Speed 3394.21 samples/sec   Loss 5.4315   LearningRate 0.0558   Epoch: 5   Global Step: 62830   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:08:59,141-Speed 3356.62 samples/sec   Loss 5.4244   LearningRate 0.0558   Epoch: 5   Global Step: 62840   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:09:02,150-Speed 3404.50 samples/sec   Loss 5.5081   LearningRate 0.0558   Epoch: 5   Global Step: 62850   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:09:05,152-Speed 3411.79 samples/sec   Loss 5.5006   LearningRate 0.0558   Epoch: 5   Global Step: 62860   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:09:08,159-Speed 3406.80 samples/sec   Loss 5.4321   LearningRate 0.0558   Epoch: 5   Global Step: 62870   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:09:11,157-Speed 3416.87 samples/sec   Loss 5.5991   LearningRate 0.0558   Epoch: 5   Global Step: 62880   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:09:14,166-Speed 3404.31 samples/sec   Loss 5.4184   LearningRate 0.0558   Epoch: 5   Global Step: 62890   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:09:17,236-Speed 3336.21 samples/sec   Loss 5.4584   LearningRate 0.0558   Epoch: 5   Global Step: 62900   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:09:20,262-Speed 3384.26 samples/sec   Loss 5.4496   LearningRate 0.0558   Epoch: 5   Global Step: 62910   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:09:23,296-Speed 3377.17 samples/sec   Loss 5.4662   LearningRate 0.0558   Epoch: 5   Global Step: 62920   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:09:26,311-Speed 3397.11 samples/sec   Loss 5.4695   LearningRate 0.0558   Epoch: 5   Global Step: 62930   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:09:29,393-Speed 3323.79 samples/sec   Loss 5.5759   LearningRate 0.0557   Epoch: 5   Global Step: 62940   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:09:32,406-Speed 3399.46 samples/sec   Loss 5.6518   LearningRate 0.0557   Epoch: 5   Global Step: 62950   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:09:35,457-Speed 3357.30 samples/sec   Loss 5.6774   LearningRate 0.0557   Epoch: 5   Global Step: 62960   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:09:38,468-Speed 3402.24 samples/sec   Loss 5.5634   LearningRate 0.0557   Epoch: 5   Global Step: 62970   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:09:41,511-Speed 3366.08 samples/sec   Loss 5.4944   LearningRate 0.0557   Epoch: 5   Global Step: 62980   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:09:44,562-Speed 3358.03 samples/sec   Loss 5.5133   LearningRate 0.0557   Epoch: 5   Global Step: 62990   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:09:47,571-Speed 3404.09 samples/sec   Loss 5.5607   LearningRate 0.0557   Epoch: 5   Global Step: 63000   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:09:50,604-Speed 3376.87 samples/sec   Loss 5.5825   LearningRate 0.0557   Epoch: 5   Global Step: 63010   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:09:53,626-Speed 3389.59 samples/sec   Loss 5.5720   LearningRate 0.0557   Epoch: 5   Global Step: 63020   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:09:56,654-Speed 3383.55 samples/sec   Loss 5.4836   LearningRate 0.0557   Epoch: 5   Global Step: 63030   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:09:59,663-Speed 3403.91 samples/sec   Loss 5.6519   LearningRate 0.0557   Epoch: 5   Global Step: 63040   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:10:02,685-Speed 3389.87 samples/sec   Loss 5.6127   LearningRate 0.0557   Epoch: 5   Global Step: 63050   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:10:05,731-Speed 3363.07 samples/sec   Loss 5.6822   LearningRate 0.0557   Epoch: 5   Global Step: 63060   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:10:08,750-Speed 3392.67 samples/sec   Loss 5.4599   LearningRate 0.0557   Epoch: 5   Global Step: 63070   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:10:11,791-Speed 3368.05 samples/sec   Loss 5.4643   LearningRate 0.0557   Epoch: 5   Global Step: 63080   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:10:14,811-Speed 3392.54 samples/sec   Loss 5.5424   LearningRate 0.0557   Epoch: 5   Global Step: 63090   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:10:17,849-Speed 3370.94 samples/sec   Loss 5.5130   LearningRate 0.0557   Epoch: 5   Global Step: 63100   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:10:20,863-Speed 3398.83 samples/sec   Loss 5.6007   LearningRate 0.0556   Epoch: 5   Global Step: 63110   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:10:23,861-Speed 3416.69 samples/sec   Loss 5.7373   LearningRate 0.0556   Epoch: 5   Global Step: 63120   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:10:26,869-Speed 3406.89 samples/sec   Loss 5.6441   LearningRate 0.0556   Epoch: 5   Global Step: 63130   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:10:29,985-Speed 3286.77 samples/sec   Loss 5.5627   LearningRate 0.0556   Epoch: 5   Global Step: 63140   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:10:33,040-Speed 3352.70 samples/sec   Loss 5.5570   LearningRate 0.0556   Epoch: 5   Global Step: 63150   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:10:36,073-Speed 3377.31 samples/sec   Loss 5.4936   LearningRate 0.0556   Epoch: 5   Global Step: 63160   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:10:39,086-Speed 3400.09 samples/sec   Loss 5.4351   LearningRate 0.0556   Epoch: 5   Global Step: 63170   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:10:42,113-Speed 3384.35 samples/sec   Loss 5.5656   LearningRate 0.0556   Epoch: 5   Global Step: 63180   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:10:45,139-Speed 3385.11 samples/sec   Loss 5.4982   LearningRate 0.0556   Epoch: 5   Global Step: 63190   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:10:48,192-Speed 3354.75 samples/sec   Loss 5.5693   LearningRate 0.0556   Epoch: 5   Global Step: 63200   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:10:51,206-Speed 3398.78 samples/sec   Loss 5.5771   LearningRate 0.0556   Epoch: 5   Global Step: 63210   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:10:54,216-Speed 3403.39 samples/sec   Loss 5.6811   LearningRate 0.0556   Epoch: 5   Global Step: 63220   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:10:57,232-Speed 3396.36 samples/sec   Loss 5.6878   LearningRate 0.0556   Epoch: 5   Global Step: 63230   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:11:00,279-Speed 3362.41 samples/sec   Loss 5.6470   LearningRate 0.0556   Epoch: 5   Global Step: 63240   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:11:03,288-Speed 3404.03 samples/sec   Loss 5.6513   LearningRate 0.0556   Epoch: 5   Global Step: 63250   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:11:06,346-Speed 3349.52 samples/sec   Loss 5.5967   LearningRate 0.0556   Epoch: 5   Global Step: 63260   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:11:09,359-Speed 3399.32 samples/sec   Loss 5.7393   LearningRate 0.0555   Epoch: 5   Global Step: 63270   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:11:12,371-Speed 3400.85 samples/sec   Loss 5.6776   LearningRate 0.0555   Epoch: 5   Global Step: 63280   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:11:15,437-Speed 3341.50 samples/sec   Loss 5.6121   LearningRate 0.0555   Epoch: 5   Global Step: 63290   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:11:18,449-Speed 3401.01 samples/sec   Loss 5.5679   LearningRate 0.0555   Epoch: 5   Global Step: 63300   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:11:21,481-Speed 3377.89 samples/sec   Loss 5.5907   LearningRate 0.0555   Epoch: 5   Global Step: 63310   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:11:24,536-Speed 3353.02 samples/sec   Loss 5.6464   LearningRate 0.0555   Epoch: 5   Global Step: 63320   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:11:27,554-Speed 3394.49 samples/sec   Loss 5.6301   LearningRate 0.0555   Epoch: 5   Global Step: 63330   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:11:30,567-Speed 3399.75 samples/sec   Loss 5.5809   LearningRate 0.0555   Epoch: 5   Global Step: 63340   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:11:33,658-Speed 3313.67 samples/sec   Loss 5.6119   LearningRate 0.0555   Epoch: 5   Global Step: 63350   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:11:36,716-Speed 3350.10 samples/sec   Loss 5.6998   LearningRate 0.0555   Epoch: 5   Global Step: 63360   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:11:39,745-Speed 3381.24 samples/sec   Loss 5.6318   LearningRate 0.0555   Epoch: 5   Global Step: 63370   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:11:42,782-Speed 3372.85 samples/sec   Loss 5.7201   LearningRate 0.0555   Epoch: 5   Global Step: 63380   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:11:45,803-Speed 3390.67 samples/sec   Loss 5.8212   LearningRate 0.0555   Epoch: 5   Global Step: 63390   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:11:48,868-Speed 3341.62 samples/sec   Loss 5.7203   LearningRate 0.0555   Epoch: 5   Global Step: 63400   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:11:51,934-Speed 3341.40 samples/sec   Loss 5.7185   LearningRate 0.0555   Epoch: 5   Global Step: 63410   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:11:54,981-Speed 3361.71 samples/sec   Loss 5.7346   LearningRate 0.0555   Epoch: 5   Global Step: 63420   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:11:58,022-Speed 3369.20 samples/sec   Loss 5.7029   LearningRate 0.0555   Epoch: 5   Global Step: 63430   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:12:01,027-Speed 3409.00 samples/sec   Loss 5.6789   LearningRate 0.0554   Epoch: 5   Global Step: 63440   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:12:04,045-Speed 3393.66 samples/sec   Loss 5.7523   LearningRate 0.0554   Epoch: 5   Global Step: 63450   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:12:07,051-Speed 3407.61 samples/sec   Loss 5.7415   LearningRate 0.0554   Epoch: 5   Global Step: 63460   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:12:10,051-Speed 3414.07 samples/sec   Loss 5.7790   LearningRate 0.0554   Epoch: 5   Global Step: 63470   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:12:13,078-Speed 3384.15 samples/sec   Loss 5.6338   LearningRate 0.0554   Epoch: 5   Global Step: 63480   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:12:16,109-Speed 3380.22 samples/sec   Loss 5.6856   LearningRate 0.0554   Epoch: 5   Global Step: 63490   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:12:19,110-Speed 3412.38 samples/sec   Loss 5.6191   LearningRate 0.0554   Epoch: 5   Global Step: 63500   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:12:22,116-Speed 3407.98 samples/sec   Loss 5.7248   LearningRate 0.0554   Epoch: 5   Global Step: 63510   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:12:25,119-Speed 3410.70 samples/sec   Loss 5.6381   LearningRate 0.0554   Epoch: 5   Global Step: 63520   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:12:28,132-Speed 3399.97 samples/sec   Loss 5.6332   LearningRate 0.0554   Epoch: 5   Global Step: 63530   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:12:31,137-Speed 3408.41 samples/sec   Loss 5.6204   LearningRate 0.0554   Epoch: 5   Global Step: 63540   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:12:34,150-Speed 3399.75 samples/sec   Loss 5.7883   LearningRate 0.0554   Epoch: 5   Global Step: 63550   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:12:37,202-Speed 3355.99 samples/sec   Loss 5.6418   LearningRate 0.0554   Epoch: 5   Global Step: 63560   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:12:40,250-Speed 3361.05 samples/sec   Loss 5.7529   LearningRate 0.0554   Epoch: 5   Global Step: 63570   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:12:43,286-Speed 3374.18 samples/sec   Loss 5.7562   LearningRate 0.0554   Epoch: 5   Global Step: 63580   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:12:46,328-Speed 3367.20 samples/sec   Loss 5.6646   LearningRate 0.0554   Epoch: 5   Global Step: 63590   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:12:49,361-Speed 3377.97 samples/sec   Loss 5.7764   LearningRate 0.0554   Epoch: 5   Global Step: 63600   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:12:52,395-Speed 3375.95 samples/sec   Loss 5.7169   LearningRate 0.0553   Epoch: 5   Global Step: 63610   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:12:55,520-Speed 3277.35 samples/sec   Loss 5.8248   LearningRate 0.0553   Epoch: 5   Global Step: 63620   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:12:58,540-Speed 3392.67 samples/sec   Loss 5.6333   LearningRate 0.0553   Epoch: 5   Global Step: 63630   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:13:01,542-Speed 3412.39 samples/sec   Loss 5.7428   LearningRate 0.0553   Epoch: 5   Global Step: 63640   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:13:04,607-Speed 3341.21 samples/sec   Loss 5.8310   LearningRate 0.0553   Epoch: 5   Global Step: 63650   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:13:07,631-Speed 3387.35 samples/sec   Loss 5.8262   LearningRate 0.0553   Epoch: 5   Global Step: 63660   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:13:10,637-Speed 3408.04 samples/sec   Loss 5.6899   LearningRate 0.0553   Epoch: 5   Global Step: 63670   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:13:13,676-Speed 3371.16 samples/sec   Loss 5.7478   LearningRate 0.0553   Epoch: 5   Global Step: 63680   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:13:16,702-Speed 3384.93 samples/sec   Loss 5.7255   LearningRate 0.0553   Epoch: 5   Global Step: 63690   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:13:19,698-Speed 3417.96 samples/sec   Loss 5.7190   LearningRate 0.0553   Epoch: 5   Global Step: 63700   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:13:22,723-Speed 3387.41 samples/sec   Loss 5.7661   LearningRate 0.0553   Epoch: 5   Global Step: 63710   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:13:25,801-Speed 3327.71 samples/sec   Loss 5.8148   LearningRate 0.0553   Epoch: 5   Global Step: 63720   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:13:28,820-Speed 3392.64 samples/sec   Loss 5.8041   LearningRate 0.0553   Epoch: 5   Global Step: 63730   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:13:31,818-Speed 3416.87 samples/sec   Loss 5.8277   LearningRate 0.0553   Epoch: 5   Global Step: 63740   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:13:34,818-Speed 3413.99 samples/sec   Loss 5.8527   LearningRate 0.0553   Epoch: 5   Global Step: 63750   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:13:37,852-Speed 3376.37 samples/sec   Loss 5.7068   LearningRate 0.0553   Epoch: 5   Global Step: 63760   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:13:40,866-Speed 3399.00 samples/sec   Loss 5.8326   LearningRate 0.0552   Epoch: 5   Global Step: 63770   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:13:43,873-Speed 3406.27 samples/sec   Loss 5.7316   LearningRate 0.0552   Epoch: 5   Global Step: 63780   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:13:46,899-Speed 3384.82 samples/sec   Loss 5.7272   LearningRate 0.0552   Epoch: 5   Global Step: 63790   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:13:49,927-Speed 3382.61 samples/sec   Loss 5.7660   LearningRate 0.0552   Epoch: 5   Global Step: 63800   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:13:52,929-Speed 3412.44 samples/sec   Loss 5.7734   LearningRate 0.0552   Epoch: 5   Global Step: 63810   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:13:55,951-Speed 3389.29 samples/sec   Loss 5.8078   LearningRate 0.0552   Epoch: 5   Global Step: 63820   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:13:58,948-Speed 3418.63 samples/sec   Loss 5.8218   LearningRate 0.0552   Epoch: 5   Global Step: 63830   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:14:01,964-Speed 3395.67 samples/sec   Loss 5.7826   LearningRate 0.0552   Epoch: 5   Global Step: 63840   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:14:05,020-Speed 3352.67 samples/sec   Loss 5.7078   LearningRate 0.0552   Epoch: 5   Global Step: 63850   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:14:08,082-Speed 3344.86 samples/sec   Loss 5.8743   LearningRate 0.0552   Epoch: 5   Global Step: 63860   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:14:11,105-Speed 3389.05 samples/sec   Loss 5.9314   LearningRate 0.0552   Epoch: 5   Global Step: 63870   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:14:14,153-Speed 3360.22 samples/sec   Loss 5.8329   LearningRate 0.0552   Epoch: 5   Global Step: 63880   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:14:17,155-Speed 3412.63 samples/sec   Loss 5.7831   LearningRate 0.0552   Epoch: 5   Global Step: 63890   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:14:20,170-Speed 3397.99 samples/sec   Loss 5.8094   LearningRate 0.0552   Epoch: 5   Global Step: 63900   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:14:23,169-Speed 3415.14 samples/sec   Loss 5.8150   LearningRate 0.0552   Epoch: 5   Global Step: 63910   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:14:26,274-Speed 3299.59 samples/sec   Loss 5.8895   LearningRate 0.0552   Epoch: 5   Global Step: 63920   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:14:29,324-Speed 3358.53 samples/sec   Loss 5.7802   LearningRate 0.0552   Epoch: 5   Global Step: 63930   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:14:32,354-Speed 3380.16 samples/sec   Loss 5.7889   LearningRate 0.0551   Epoch: 5   Global Step: 63940   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:14:35,378-Speed 3387.69 samples/sec   Loss 5.7544   LearningRate 0.0551   Epoch: 5   Global Step: 63950   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:14:38,458-Speed 3326.05 samples/sec   Loss 5.7497   LearningRate 0.0551   Epoch: 5   Global Step: 63960   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:14:41,491-Speed 3377.41 samples/sec   Loss 5.7791   LearningRate 0.0551   Epoch: 5   Global Step: 63970   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:14:44,498-Speed 3406.22 samples/sec   Loss 5.8015   LearningRate 0.0551   Epoch: 5   Global Step: 63980   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:14:47,587-Speed 3316.05 samples/sec   Loss 5.8071   LearningRate 0.0551   Epoch: 5   Global Step: 63990   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:14:50,693-Speed 3297.69 samples/sec   Loss 5.9080   LearningRate 0.0551   Epoch: 5   Global Step: 64000   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:14:53,742-Speed 3359.97 samples/sec   Loss 5.8459   LearningRate 0.0551   Epoch: 5   Global Step: 64010   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:14:56,781-Speed 3370.73 samples/sec   Loss 5.8138   LearningRate 0.0551   Epoch: 5   Global Step: 64020   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:14:59,898-Speed 3285.81 samples/sec   Loss 5.9341   LearningRate 0.0551   Epoch: 5   Global Step: 64030   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:15:02,957-Speed 3349.15 samples/sec   Loss 5.7971   LearningRate 0.0551   Epoch: 5   Global Step: 64040   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:15:05,978-Speed 3390.17 samples/sec   Loss 5.8648   LearningRate 0.0551   Epoch: 5   Global Step: 64050   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:15:08,998-Speed 3391.92 samples/sec   Loss 5.8894   LearningRate 0.0551   Epoch: 5   Global Step: 64060   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:15:12,013-Speed 3397.65 samples/sec   Loss 5.7582   LearningRate 0.0551   Epoch: 5   Global Step: 64070   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:15:15,020-Speed 3405.92 samples/sec   Loss 5.8514   LearningRate 0.0551   Epoch: 5   Global Step: 64080   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:15:18,086-Speed 3340.99 samples/sec   Loss 5.8798   LearningRate 0.0551   Epoch: 5   Global Step: 64090   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:15:21,141-Speed 3353.43 samples/sec   Loss 5.8904   LearningRate 0.0551   Epoch: 5   Global Step: 64100   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:15:24,211-Speed 3336.35 samples/sec   Loss 5.8005   LearningRate 0.0550   Epoch: 5   Global Step: 64110   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:15:27,283-Speed 3334.89 samples/sec   Loss 5.8786   LearningRate 0.0550   Epoch: 5   Global Step: 64120   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:15:30,384-Speed 3302.86 samples/sec   Loss 5.9227   LearningRate 0.0550   Epoch: 5   Global Step: 64130   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:15:33,398-Speed 3398.14 samples/sec   Loss 5.9080   LearningRate 0.0550   Epoch: 5   Global Step: 64140   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:15:36,474-Speed 3331.09 samples/sec   Loss 5.9102   LearningRate 0.0550   Epoch: 5   Global Step: 64150   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:15:39,518-Speed 3364.00 samples/sec   Loss 5.8885   LearningRate 0.0550   Epoch: 5   Global Step: 64160   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:15:42,542-Speed 3387.76 samples/sec   Loss 5.7972   LearningRate 0.0550   Epoch: 5   Global Step: 64170   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:15:45,547-Speed 3408.53 samples/sec   Loss 5.8557   LearningRate 0.0550   Epoch: 5   Global Step: 64180   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:15:48,574-Speed 3383.89 samples/sec   Loss 5.9019   LearningRate 0.0550   Epoch: 5   Global Step: 64190   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:15:51,600-Speed 3385.76 samples/sec   Loss 5.8864   LearningRate 0.0550   Epoch: 5   Global Step: 64200   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:15:54,666-Speed 3340.63 samples/sec   Loss 5.9208   LearningRate 0.0550   Epoch: 5   Global Step: 64210   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:15:57,677-Speed 3402.13 samples/sec   Loss 5.8714   LearningRate 0.0550   Epoch: 5   Global Step: 64220   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:16:00,704-Speed 3384.06 samples/sec   Loss 5.8770   LearningRate 0.0550   Epoch: 5   Global Step: 64230   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:16:03,750-Speed 3362.30 samples/sec   Loss 5.8996   LearningRate 0.0550   Epoch: 5   Global Step: 64240   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:16:06,811-Speed 3346.32 samples/sec   Loss 5.9315   LearningRate 0.0550   Epoch: 5   Global Step: 64250   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:16:09,830-Speed 3393.83 samples/sec   Loss 5.9924   LearningRate 0.0550   Epoch: 5   Global Step: 64260   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:16:12,897-Speed 3338.72 samples/sec   Loss 5.8721   LearningRate 0.0550   Epoch: 5   Global Step: 64270   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:16:15,930-Speed 3377.41 samples/sec   Loss 5.8924   LearningRate 0.0549   Epoch: 5   Global Step: 64280   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:16:18,975-Speed 3363.94 samples/sec   Loss 5.9520   LearningRate 0.0549   Epoch: 5   Global Step: 64290   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:16:22,002-Speed 3384.49 samples/sec   Loss 5.7626   LearningRate 0.0549   Epoch: 5   Global Step: 64300   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:16:25,107-Speed 3299.29 samples/sec   Loss 5.9491   LearningRate 0.0549   Epoch: 5   Global Step: 64310   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:16:28,213-Speed 3297.96 samples/sec   Loss 5.9488   LearningRate 0.0549   Epoch: 5   Global Step: 64320   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:16:31,265-Speed 3356.44 samples/sec   Loss 5.8604   LearningRate 0.0549   Epoch: 5   Global Step: 64330   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:16:34,274-Speed 3403.54 samples/sec   Loss 5.8690   LearningRate 0.0549   Epoch: 5   Global Step: 64340   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:16:37,350-Speed 3330.15 samples/sec   Loss 5.9182   LearningRate 0.0549   Epoch: 5   Global Step: 64350   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:16:40,457-Speed 3297.87 samples/sec   Loss 5.9299   LearningRate 0.0549   Epoch: 5   Global Step: 64360   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:16:43,517-Speed 3346.68 samples/sec   Loss 5.8628   LearningRate 0.0549   Epoch: 5   Global Step: 64370   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:16:46,532-Speed 3398.34 samples/sec   Loss 5.8569   LearningRate 0.0549   Epoch: 5   Global Step: 64380   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:16:49,615-Speed 3322.46 samples/sec   Loss 6.0243   LearningRate 0.0549   Epoch: 5   Global Step: 64390   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:16:52,637-Speed 3389.58 samples/sec   Loss 5.9527   LearningRate 0.0549   Epoch: 5   Global Step: 64400   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:16:55,638-Speed 3413.38 samples/sec   Loss 5.8940   LearningRate 0.0549   Epoch: 5   Global Step: 64410   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:16:58,638-Speed 3413.44 samples/sec   Loss 5.9349   LearningRate 0.0549   Epoch: 5   Global Step: 64420   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:17:01,693-Speed 3353.64 samples/sec   Loss 5.8973   LearningRate 0.0549   Epoch: 5   Global Step: 64430   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:17:04,833-Speed 3262.11 samples/sec   Loss 5.9789   LearningRate 0.0548   Epoch: 5   Global Step: 64440   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:17:08,574-Speed 2737.46 samples/sec   Loss 5.8625   LearningRate 0.0548   Epoch: 5   Global Step: 64450   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:17:11,636-Speed 3346.40 samples/sec   Loss 5.9290   LearningRate 0.0548   Epoch: 5   Global Step: 64460   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:17:14,698-Speed 3344.44 samples/sec   Loss 5.9310   LearningRate 0.0548   Epoch: 5   Global Step: 64470   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:17:17,765-Speed 3340.29 samples/sec   Loss 5.8061   LearningRate 0.0548   Epoch: 5   Global Step: 64480   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:17:20,810-Speed 3375.32 samples/sec   Loss 5.9732   LearningRate 0.0548   Epoch: 5   Global Step: 64490   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:17:23,893-Speed 3322.95 samples/sec   Loss 5.9327   LearningRate 0.0548   Epoch: 5   Global Step: 64500   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:17:26,925-Speed 3377.87 samples/sec   Loss 5.8996   LearningRate 0.0548   Epoch: 5   Global Step: 64510   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:17:30,049-Speed 3279.09 samples/sec   Loss 6.0421   LearningRate 0.0548   Epoch: 5   Global Step: 64520   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:17:33,079-Speed 3380.60 samples/sec   Loss 5.9474   LearningRate 0.0548   Epoch: 5   Global Step: 64530   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:17:36,114-Speed 3375.16 samples/sec   Loss 6.0406   LearningRate 0.0548   Epoch: 5   Global Step: 64540   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:17:39,126-Speed 3401.22 samples/sec   Loss 5.9291   LearningRate 0.0548   Epoch: 5   Global Step: 64550   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:17:42,165-Speed 3370.89 samples/sec   Loss 5.9528   LearningRate 0.0548   Epoch: 5   Global Step: 64560   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:17:45,206-Speed 3367.84 samples/sec   Loss 5.9671   LearningRate 0.0548   Epoch: 5   Global Step: 64570   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:17:48,224-Speed 3394.69 samples/sec   Loss 5.8481   LearningRate 0.0548   Epoch: 5   Global Step: 64580   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:17:51,266-Speed 3367.35 samples/sec   Loss 5.8957   LearningRate 0.0548   Epoch: 5   Global Step: 64590   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:17:54,296-Speed 3380.68 samples/sec   Loss 6.0490   LearningRate 0.0548   Epoch: 5   Global Step: 64600   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:17:57,365-Speed 3337.60 samples/sec   Loss 6.0325   LearningRate 0.0547   Epoch: 5   Global Step: 64610   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:18:00,451-Speed 3319.64 samples/sec   Loss 6.0484   LearningRate 0.0547   Epoch: 5   Global Step: 64620   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:18:03,482-Speed 3379.35 samples/sec   Loss 5.9946   LearningRate 0.0547   Epoch: 5   Global Step: 64630   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:18:06,587-Speed 3299.18 samples/sec   Loss 6.0318   LearningRate 0.0547   Epoch: 5   Global Step: 64640   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:18:09,621-Speed 3376.24 samples/sec   Loss 6.0229   LearningRate 0.0547   Epoch: 5   Global Step: 64650   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:18:12,755-Speed 3268.34 samples/sec   Loss 6.0024   LearningRate 0.0547   Epoch: 5   Global Step: 64660   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:18:15,794-Speed 3370.54 samples/sec   Loss 6.0078   LearningRate 0.0547   Epoch: 5   Global Step: 64670   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:18:18,815-Speed 3390.36 samples/sec   Loss 5.9626   LearningRate 0.0547   Epoch: 5   Global Step: 64680   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:18:21,850-Speed 3375.49 samples/sec   Loss 6.0703   LearningRate 0.0547   Epoch: 5   Global Step: 64690   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:18:24,921-Speed 3335.98 samples/sec   Loss 6.0077   LearningRate 0.0547   Epoch: 5   Global Step: 64700   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:18:28,036-Speed 3287.70 samples/sec   Loss 5.9771   LearningRate 0.0547   Epoch: 5   Global Step: 64710   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:18:31,102-Speed 3340.66 samples/sec   Loss 5.9661   LearningRate 0.0547   Epoch: 5   Global Step: 64720   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:18:34,143-Speed 3368.77 samples/sec   Loss 5.9433   LearningRate 0.0547   Epoch: 5   Global Step: 64730   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:18:37,210-Speed 3340.10 samples/sec   Loss 5.9016   LearningRate 0.0547   Epoch: 5   Global Step: 64740   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:18:40,315-Speed 3298.64 samples/sec   Loss 6.0572   LearningRate 0.0547   Epoch: 5   Global Step: 64750   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:18:44,707-Speed 2331.93 samples/sec   Loss 5.9560   LearningRate 0.0547   Epoch: 5   Global Step: 64760   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:18:47,744-Speed 3372.91 samples/sec   Loss 5.9409   LearningRate 0.0547   Epoch: 5   Global Step: 64770   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:18:50,779-Speed 3375.55 samples/sec   Loss 5.9418   LearningRate 0.0546   Epoch: 5   Global Step: 64780   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:18:53,814-Speed 3375.13 samples/sec   Loss 5.9714   LearningRate 0.0546   Epoch: 5   Global Step: 64790   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:18:56,889-Speed 3331.06 samples/sec   Loss 6.0589   LearningRate 0.0546   Epoch: 5   Global Step: 64800   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:18:59,956-Speed 3339.82 samples/sec   Loss 6.0328   LearningRate 0.0546   Epoch: 5   Global Step: 64810   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:19:02,999-Speed 3365.61 samples/sec   Loss 5.9650   LearningRate 0.0546   Epoch: 5   Global Step: 64820   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:19:06,019-Speed 3392.66 samples/sec   Loss 5.9039   LearningRate 0.0546   Epoch: 5   Global Step: 64830   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:19:09,029-Speed 3402.80 samples/sec   Loss 6.0495   LearningRate 0.0546   Epoch: 5   Global Step: 64840   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:19:12,135-Speed 3297.86 samples/sec   Loss 6.0371   LearningRate 0.0546   Epoch: 5   Global Step: 64850   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:19:15,202-Speed 3339.99 samples/sec   Loss 6.0546   LearningRate 0.0546   Epoch: 5   Global Step: 64860   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:19:18,234-Speed 3379.44 samples/sec   Loss 5.9892   LearningRate 0.0546   Epoch: 5   Global Step: 64870   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:19:21,276-Speed 3367.01 samples/sec   Loss 6.0765   LearningRate 0.0546   Epoch: 5   Global Step: 64880   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:19:24,317-Speed 3367.75 samples/sec   Loss 6.1312   LearningRate 0.0546   Epoch: 5   Global Step: 64890   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:19:27,388-Speed 3335.97 samples/sec   Loss 6.0110   LearningRate 0.0546   Epoch: 5   Global Step: 64900   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:19:30,448-Speed 3347.60 samples/sec   Loss 5.9957   LearningRate 0.0546   Epoch: 5   Global Step: 64910   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:19:33,475-Speed 3383.83 samples/sec   Loss 6.0131   LearningRate 0.0546   Epoch: 5   Global Step: 64920   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:19:36,526-Speed 3357.90 samples/sec   Loss 6.1356   LearningRate 0.0546   Epoch: 5   Global Step: 64930   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:19:39,539-Speed 3399.78 samples/sec   Loss 5.9658   LearningRate 0.0546   Epoch: 5   Global Step: 64940   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:19:42,581-Speed 3366.85 samples/sec   Loss 5.8942   LearningRate 0.0545   Epoch: 5   Global Step: 64950   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:19:45,590-Speed 3404.84 samples/sec   Loss 6.1013   LearningRate 0.0545   Epoch: 5   Global Step: 64960   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:19:48,611-Speed 3390.23 samples/sec   Loss 5.9798   LearningRate 0.0545   Epoch: 5   Global Step: 64970   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:19:51,726-Speed 3288.65 samples/sec   Loss 6.0727   LearningRate 0.0545   Epoch: 5   Global Step: 64980   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:19:54,768-Speed 3367.40 samples/sec   Loss 6.0536   LearningRate 0.0545   Epoch: 5   Global Step: 64990   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:19:57,773-Speed 3408.68 samples/sec   Loss 6.0088   LearningRate 0.0545   Epoch: 5   Global Step: 65000   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:20:00,824-Speed 3357.72 samples/sec   Loss 5.9403   LearningRate 0.0545   Epoch: 5   Global Step: 65010   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:20:03,876-Speed 3355.38 samples/sec   Loss 5.9189   LearningRate 0.0545   Epoch: 5   Global Step: 65020   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:20:06,995-Speed 3284.34 samples/sec   Loss 5.9770   LearningRate 0.0545   Epoch: 5   Global Step: 65030   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:20:10,016-Speed 3390.50 samples/sec   Loss 6.0874   LearningRate 0.0545   Epoch: 5   Global Step: 65040   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:20:13,036-Speed 3392.38 samples/sec   Loss 6.0338   LearningRate 0.0545   Epoch: 5   Global Step: 65050   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:20:16,056-Speed 3391.94 samples/sec   Loss 6.1107   LearningRate 0.0545   Epoch: 5   Global Step: 65060   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:20:19,189-Speed 3269.23 samples/sec   Loss 5.9650   LearningRate 0.0545   Epoch: 5   Global Step: 65070   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:20:22,218-Speed 3381.52 samples/sec   Loss 6.0408   LearningRate 0.0545   Epoch: 5   Global Step: 65080   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:20:25,239-Speed 3391.42 samples/sec   Loss 6.0673   LearningRate 0.0545   Epoch: 5   Global Step: 65090   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:20:28,320-Speed 3324.13 samples/sec   Loss 6.0891   LearningRate 0.0545   Epoch: 5   Global Step: 65100   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:20:31,345-Speed 3386.11 samples/sec   Loss 6.0174   LearningRate 0.0545   Epoch: 5   Global Step: 65110   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:20:34,368-Speed 3388.97 samples/sec   Loss 6.1173   LearningRate 0.0544   Epoch: 5   Global Step: 65120   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:20:37,438-Speed 3337.33 samples/sec   Loss 6.0837   LearningRate 0.0544   Epoch: 5   Global Step: 65130   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:20:40,494-Speed 3352.11 samples/sec   Loss 6.0502   LearningRate 0.0544   Epoch: 5   Global Step: 65140   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:20:43,571-Speed 3328.73 samples/sec   Loss 6.0238   LearningRate 0.0544   Epoch: 5   Global Step: 65150   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:20:46,604-Speed 3377.24 samples/sec   Loss 5.9785   LearningRate 0.0544   Epoch: 5   Global Step: 65160   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:20:49,608-Speed 3410.07 samples/sec   Loss 5.9848   LearningRate 0.0544   Epoch: 5   Global Step: 65170   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:20:52,681-Speed 3332.75 samples/sec   Loss 6.1200   LearningRate 0.0544   Epoch: 5   Global Step: 65180   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:20:55,740-Speed 3348.86 samples/sec   Loss 6.0456   LearningRate 0.0544   Epoch: 5   Global Step: 65190   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:20:58,775-Speed 3375.26 samples/sec   Loss 6.0508   LearningRate 0.0544   Epoch: 5   Global Step: 65200   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:21:01,845-Speed 3336.30 samples/sec   Loss 6.0257   LearningRate 0.0544   Epoch: 5   Global Step: 65210   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:21:04,941-Speed 3308.82 samples/sec   Loss 6.0472   LearningRate 0.0544   Epoch: 5   Global Step: 65220   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:21:07,960-Speed 3393.62 samples/sec   Loss 6.1333   LearningRate 0.0544   Epoch: 5   Global Step: 65230   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:21:11,067-Speed 3296.50 samples/sec   Loss 6.0647   LearningRate 0.0544   Epoch: 5   Global Step: 65240   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:21:14,079-Speed 3400.90 samples/sec   Loss 6.0447   LearningRate 0.0544   Epoch: 5   Global Step: 65250   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:21:17,166-Speed 3318.38 samples/sec   Loss 6.1089   LearningRate 0.0544   Epoch: 5   Global Step: 65260   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:21:20,232-Speed 3341.24 samples/sec   Loss 6.0565   LearningRate 0.0544   Epoch: 5   Global Step: 65270   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:21:23,308-Speed 3329.20 samples/sec   Loss 6.0478   LearningRate 0.0543   Epoch: 5   Global Step: 65280   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:21:26,435-Speed 3275.70 samples/sec   Loss 6.1717   LearningRate 0.0543   Epoch: 5   Global Step: 65290   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:21:29,466-Speed 3379.35 samples/sec   Loss 6.1391   LearningRate 0.0543   Epoch: 5   Global Step: 65300   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:21:32,549-Speed 3322.43 samples/sec   Loss 6.0033   LearningRate 0.0543   Epoch: 5   Global Step: 65310   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:21:35,615-Speed 3341.31 samples/sec   Loss 6.1899   LearningRate 0.0543   Epoch: 5   Global Step: 65320   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:21:38,650-Speed 3375.24 samples/sec   Loss 6.1413   LearningRate 0.0543   Epoch: 5   Global Step: 65330   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:21:41,698-Speed 3360.11 samples/sec   Loss 6.0502   LearningRate 0.0543   Epoch: 5   Global Step: 65340   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:21:44,709-Speed 3402.04 samples/sec   Loss 6.0085   LearningRate 0.0543   Epoch: 5   Global Step: 65350   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:21:47,782-Speed 3333.53 samples/sec   Loss 6.1223   LearningRate 0.0543   Epoch: 5   Global Step: 65360   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:21:50,874-Speed 3313.06 samples/sec   Loss 6.1045   LearningRate 0.0543   Epoch: 5   Global Step: 65370   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:21:54,048-Speed 3227.62 samples/sec   Loss 5.9279   LearningRate 0.0543   Epoch: 5   Global Step: 65380   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:21:57,059-Speed 3402.06 samples/sec   Loss 6.1190   LearningRate 0.0543   Epoch: 5   Global Step: 65390   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:22:00,168-Speed 3294.86 samples/sec   Loss 6.0767   LearningRate 0.0543   Epoch: 5   Global Step: 65400   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:22:03,260-Speed 3312.78 samples/sec   Loss 6.0935   LearningRate 0.0543   Epoch: 5   Global Step: 65410   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:22:06,374-Speed 3289.70 samples/sec   Loss 6.0256   LearningRate 0.0543   Epoch: 5   Global Step: 65420   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:22:09,360-Speed 3430.16 samples/sec   Loss 6.0602   LearningRate 0.0543   Epoch: 5   Global Step: 65430   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:22:12,422-Speed 3345.81 samples/sec   Loss 5.9502   LearningRate 0.0543   Epoch: 5   Global Step: 65440   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:22:15,494-Speed 3333.17 samples/sec   Loss 6.0160   LearningRate 0.0542   Epoch: 5   Global Step: 65450   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:22:18,555-Speed 3346.69 samples/sec   Loss 6.0671   LearningRate 0.0542   Epoch: 5   Global Step: 65460   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:22:21,599-Speed 3365.52 samples/sec   Loss 6.0538   LearningRate 0.0542   Epoch: 5   Global Step: 65470   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:22:24,645-Speed 3362.30 samples/sec   Loss 6.0769   LearningRate 0.0542   Epoch: 5   Global Step: 65480   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:22:27,719-Speed 3332.97 samples/sec   Loss 6.1387   LearningRate 0.0542   Epoch: 5   Global Step: 65490   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:22:30,787-Speed 3338.01 samples/sec   Loss 6.0685   LearningRate 0.0542   Epoch: 5   Global Step: 65500   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:22:33,829-Speed 3367.98 samples/sec   Loss 6.0988   LearningRate 0.0542   Epoch: 5   Global Step: 65510   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:22:36,886-Speed 3350.20 samples/sec   Loss 6.0836   LearningRate 0.0542   Epoch: 5   Global Step: 65520   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:22:39,922-Speed 3374.36 samples/sec   Loss 6.0537   LearningRate 0.0542   Epoch: 5   Global Step: 65530   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:22:42,962-Speed 3369.23 samples/sec   Loss 6.0407   LearningRate 0.0542   Epoch: 5   Global Step: 65540   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:22:45,967-Speed 3408.83 samples/sec   Loss 6.1810   LearningRate 0.0542   Epoch: 5   Global Step: 65550   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:22:49,053-Speed 3318.55 samples/sec   Loss 6.0670   LearningRate 0.0542   Epoch: 5   Global Step: 65560   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:22:52,195-Speed 3260.11 samples/sec   Loss 6.1185   LearningRate 0.0542   Epoch: 5   Global Step: 65570   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:22:55,240-Speed 3364.74 samples/sec   Loss 6.0157   LearningRate 0.0542   Epoch: 5   Global Step: 65580   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:22:58,285-Speed 3363.87 samples/sec   Loss 6.1088   LearningRate 0.0542   Epoch: 5   Global Step: 65590   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:23:01,359-Speed 3331.94 samples/sec   Loss 6.1867   LearningRate 0.0542   Epoch: 5   Global Step: 65600   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:23:04,429-Speed 3337.17 samples/sec   Loss 6.1124   LearningRate 0.0542   Epoch: 5   Global Step: 65610   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:23:07,497-Speed 3337.94 samples/sec   Loss 6.1147   LearningRate 0.0541   Epoch: 5   Global Step: 65620   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:23:10,526-Speed 3382.16 samples/sec   Loss 6.0163   LearningRate 0.0541   Epoch: 5   Global Step: 65630   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:23:13,563-Speed 3373.10 samples/sec   Loss 6.1767   LearningRate 0.0541   Epoch: 5   Global Step: 65640   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:23:16,720-Speed 3243.83 samples/sec   Loss 6.0803   LearningRate 0.0541   Epoch: 5   Global Step: 65650   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:23:19,786-Speed 3340.81 samples/sec   Loss 6.1771   LearningRate 0.0541   Epoch: 5   Global Step: 65660   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:23:22,806-Speed 3391.87 samples/sec   Loss 6.1521   LearningRate 0.0541   Epoch: 5   Global Step: 65670   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:23:25,837-Speed 3379.77 samples/sec   Loss 6.1209   LearningRate 0.0541   Epoch: 5   Global Step: 65680   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:23:28,946-Speed 3294.88 samples/sec   Loss 6.0803   LearningRate 0.0541   Epoch: 5   Global Step: 65690   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:23:32,004-Speed 3349.88 samples/sec   Loss 6.1031   LearningRate 0.0541   Epoch: 5   Global Step: 65700   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:23:35,037-Speed 3377.58 samples/sec   Loss 6.0247   LearningRate 0.0541   Epoch: 5   Global Step: 65710   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:23:38,119-Speed 3323.32 samples/sec   Loss 5.9756   LearningRate 0.0541   Epoch: 5   Global Step: 65720   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:23:41,146-Speed 3384.20 samples/sec   Loss 6.0334   LearningRate 0.0541   Epoch: 5   Global Step: 65730   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:23:44,154-Speed 3404.70 samples/sec   Loss 6.0369   LearningRate 0.0541   Epoch: 5   Global Step: 65740   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:23:47,150-Speed 3419.61 samples/sec   Loss 5.9943   LearningRate 0.0541   Epoch: 5   Global Step: 65750   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:23:50,220-Speed 3337.20 samples/sec   Loss 6.1388   LearningRate 0.0541   Epoch: 5   Global Step: 65760   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:23:53,298-Speed 3327.69 samples/sec   Loss 6.0806   LearningRate 0.0541   Epoch: 5   Global Step: 65770   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:23:56,368-Speed 3336.28 samples/sec   Loss 6.2331   LearningRate 0.0541   Epoch: 5   Global Step: 65780   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:23:59,404-Speed 3374.12 samples/sec   Loss 6.0297   LearningRate 0.0540   Epoch: 5   Global Step: 65790   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:24:02,459-Speed 3352.92 samples/sec   Loss 6.2841   LearningRate 0.0540   Epoch: 5   Global Step: 65800   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:24:05,498-Speed 3370.67 samples/sec   Loss 6.1594   LearningRate 0.0540   Epoch: 5   Global Step: 65810   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:24:08,549-Speed 3356.85 samples/sec   Loss 6.1946   LearningRate 0.0540   Epoch: 5   Global Step: 65820   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:24:11,570-Speed 3391.27 samples/sec   Loss 6.0708   LearningRate 0.0540   Epoch: 5   Global Step: 65830   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:24:14,616-Speed 3362.97 samples/sec   Loss 6.1647   LearningRate 0.0540   Epoch: 5   Global Step: 65840   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:24:17,668-Speed 3356.38 samples/sec   Loss 6.0359   LearningRate 0.0540   Epoch: 5   Global Step: 65850   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:24:20,699-Speed 3379.01 samples/sec   Loss 6.1263   LearningRate 0.0540   Epoch: 5   Global Step: 65860   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:24:23,752-Speed 3355.89 samples/sec   Loss 6.2322   LearningRate 0.0540   Epoch: 5   Global Step: 65870   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:24:26,775-Speed 3387.80 samples/sec   Loss 6.0872   LearningRate 0.0540   Epoch: 5   Global Step: 65880   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:24:29,845-Speed 3336.91 samples/sec   Loss 6.1177   LearningRate 0.0540   Epoch: 5   Global Step: 65890   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:24:32,840-Speed 3420.15 samples/sec   Loss 6.1473   LearningRate 0.0540   Epoch: 5   Global Step: 65900   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:24:35,908-Speed 3339.01 samples/sec   Loss 5.9860   LearningRate 0.0540   Epoch: 5   Global Step: 65910   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:24:38,973-Speed 3341.88 samples/sec   Loss 6.1003   LearningRate 0.0540   Epoch: 5   Global Step: 65920   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:24:42,063-Speed 3314.84 samples/sec   Loss 6.1906   LearningRate 0.0540   Epoch: 5   Global Step: 65930   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:24:45,061-Speed 3417.64 samples/sec   Loss 6.0547   LearningRate 0.0540   Epoch: 5   Global Step: 65940   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:24:48,061-Speed 3413.74 samples/sec   Loss 6.0985   LearningRate 0.0540   Epoch: 5   Global Step: 65950   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:24:51,079-Speed 3393.84 samples/sec   Loss 5.9440   LearningRate 0.0539   Epoch: 5   Global Step: 65960   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:24:54,111-Speed 3379.25 samples/sec   Loss 6.1316   LearningRate 0.0539   Epoch: 5   Global Step: 65970   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:24:57,148-Speed 3373.21 samples/sec   Loss 6.1076   LearningRate 0.0539   Epoch: 5   Global Step: 65980   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:25:00,166-Speed 3393.53 samples/sec   Loss 6.1202   LearningRate 0.0539   Epoch: 5   Global Step: 65990   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:25:03,206-Speed 3369.78 samples/sec   Loss 6.1849   LearningRate 0.0539   Epoch: 5   Global Step: 66000   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:25:06,229-Speed 3388.05 samples/sec   Loss 6.1246   LearningRate 0.0539   Epoch: 5   Global Step: 66010   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:25:09,226-Speed 3417.77 samples/sec   Loss 6.1298   LearningRate 0.0539   Epoch: 5   Global Step: 66020   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:25:12,243-Speed 3395.85 samples/sec   Loss 6.0528   LearningRate 0.0539   Epoch: 5   Global Step: 66030   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:25:15,254-Speed 3401.60 samples/sec   Loss 6.0633   LearningRate 0.0539   Epoch: 5   Global Step: 66040   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:25:18,332-Speed 3327.63 samples/sec   Loss 6.1680   LearningRate 0.0539   Epoch: 5   Global Step: 66050   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:25:21,353-Speed 3390.70 samples/sec   Loss 6.1983   LearningRate 0.0539   Epoch: 5   Global Step: 66060   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:25:24,375-Speed 3389.71 samples/sec   Loss 6.1407   LearningRate 0.0539   Epoch: 5   Global Step: 66070   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:25:27,425-Speed 3358.79 samples/sec   Loss 6.1767   LearningRate 0.0539   Epoch: 5   Global Step: 66080   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:25:30,461-Speed 3373.93 samples/sec   Loss 6.1132   LearningRate 0.0539   Epoch: 5   Global Step: 66090   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:25:33,478-Speed 3394.77 samples/sec   Loss 6.1451   LearningRate 0.0539   Epoch: 5   Global Step: 66100   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:25:36,539-Speed 3346.27 samples/sec   Loss 6.2297   LearningRate 0.0539   Epoch: 5   Global Step: 66110   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:25:39,555-Speed 3396.57 samples/sec   Loss 6.1659   LearningRate 0.0539   Epoch: 5   Global Step: 66120   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:25:42,560-Speed 3408.14 samples/sec   Loss 6.1660   LearningRate 0.0538   Epoch: 5   Global Step: 66130   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:25:45,603-Speed 3366.63 samples/sec   Loss 6.2017   LearningRate 0.0538   Epoch: 5   Global Step: 66140   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:25:48,639-Speed 3373.80 samples/sec   Loss 6.1808   LearningRate 0.0538   Epoch: 5   Global Step: 66150   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:25:51,716-Speed 3328.98 samples/sec   Loss 5.9989   LearningRate 0.0538   Epoch: 5   Global Step: 66160   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:25:54,749-Speed 3377.11 samples/sec   Loss 6.0838   LearningRate 0.0538   Epoch: 5   Global Step: 66170   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:25:57,756-Speed 3406.61 samples/sec   Loss 6.1997   LearningRate 0.0538   Epoch: 5   Global Step: 66180   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:26:00,781-Speed 3386.11 samples/sec   Loss 6.1983   LearningRate 0.0538   Epoch: 5   Global Step: 66190   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:26:03,847-Speed 3341.28 samples/sec   Loss 6.2310   LearningRate 0.0538   Epoch: 5   Global Step: 66200   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:26:06,881-Speed 3377.05 samples/sec   Loss 6.1564   LearningRate 0.0538   Epoch: 5   Global Step: 66210   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:26:09,899-Speed 3393.78 samples/sec   Loss 6.2604   LearningRate 0.0538   Epoch: 5   Global Step: 66220   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:26:12,904-Speed 3407.96 samples/sec   Loss 6.2134   LearningRate 0.0538   Epoch: 5   Global Step: 66230   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:26:15,971-Speed 3340.37 samples/sec   Loss 6.2172   LearningRate 0.0538   Epoch: 5   Global Step: 66240   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:26:19,092-Speed 3282.11 samples/sec   Loss 6.1359   LearningRate 0.0538   Epoch: 5   Global Step: 66250   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:26:22,105-Speed 3398.88 samples/sec   Loss 6.2632   LearningRate 0.0538   Epoch: 5   Global Step: 66260   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:26:25,221-Speed 3288.24 samples/sec   Loss 6.2352   LearningRate 0.0538   Epoch: 5   Global Step: 66270   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:26:28,317-Speed 3308.12 samples/sec   Loss 6.1892   LearningRate 0.0538   Epoch: 5   Global Step: 66280   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:26:31,371-Speed 3353.69 samples/sec   Loss 6.0791   LearningRate 0.0538   Epoch: 5   Global Step: 66290   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:26:34,395-Speed 3387.63 samples/sec   Loss 6.1722   LearningRate 0.0537   Epoch: 5   Global Step: 66300   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:26:37,431-Speed 3373.70 samples/sec   Loss 6.0884   LearningRate 0.0537   Epoch: 5   Global Step: 66310   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:26:40,495-Speed 3342.82 samples/sec   Loss 6.1583   LearningRate 0.0537   Epoch: 5   Global Step: 66320   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:26:43,516-Speed 3391.02 samples/sec   Loss 6.0961   LearningRate 0.0537   Epoch: 5   Global Step: 66330   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:26:46,582-Speed 3341.01 samples/sec   Loss 6.1132   LearningRate 0.0537   Epoch: 5   Global Step: 66340   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:26:49,625-Speed 3366.47 samples/sec   Loss 6.2687   LearningRate 0.0537   Epoch: 5   Global Step: 66350   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:26:52,733-Speed 3295.99 samples/sec   Loss 6.1666   LearningRate 0.0537   Epoch: 5   Global Step: 66360   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:26:55,794-Speed 3345.68 samples/sec   Loss 6.0968   LearningRate 0.0537   Epoch: 5   Global Step: 66370   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:26:58,839-Speed 3364.15 samples/sec   Loss 6.1185   LearningRate 0.0537   Epoch: 5   Global Step: 66380   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:27:01,938-Speed 3305.94 samples/sec   Loss 6.1602   LearningRate 0.0537   Epoch: 5   Global Step: 66390   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:27:05,014-Speed 3329.33 samples/sec   Loss 6.1822   LearningRate 0.0537   Epoch: 5   Global Step: 66400   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:27:08,050-Speed 3374.97 samples/sec   Loss 6.2206   LearningRate 0.0537   Epoch: 5   Global Step: 66410   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:27:11,105-Speed 3352.12 samples/sec   Loss 6.3034   LearningRate 0.0537   Epoch: 5   Global Step: 66420   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:27:14,161-Speed 3352.07 samples/sec   Loss 6.1667   LearningRate 0.0537   Epoch: 5   Global Step: 66430   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:27:17,197-Speed 3374.32 samples/sec   Loss 6.2454   LearningRate 0.0537   Epoch: 5   Global Step: 66440   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:27:20,203-Speed 3407.67 samples/sec   Loss 6.2082   LearningRate 0.0537   Epoch: 5   Global Step: 66450   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:27:23,230-Speed 3383.33 samples/sec   Loss 6.1518   LearningRate 0.0537   Epoch: 5   Global Step: 66460   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:27:26,303-Speed 3333.10 samples/sec   Loss 6.2230   LearningRate 0.0536   Epoch: 5   Global Step: 66470   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:27:29,428-Speed 3277.76 samples/sec   Loss 6.1834   LearningRate 0.0536   Epoch: 5   Global Step: 66480   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:27:32,496-Speed 3339.13 samples/sec   Loss 6.1844   LearningRate 0.0536   Epoch: 5   Global Step: 66490   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:27:35,545-Speed 3359.84 samples/sec   Loss 6.1649   LearningRate 0.0536   Epoch: 5   Global Step: 66500   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:27:38,546-Speed 3412.75 samples/sec   Loss 6.1532   LearningRate 0.0536   Epoch: 5   Global Step: 66510   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-04-27 07:27:41,600-Speed 3354.19 samples/sec   Loss 6.2266   LearningRate 0.0536   Epoch: 5   Global Step: 66520   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:27:44,620-Speed 3391.11 samples/sec   Loss 6.2143   LearningRate 0.0536   Epoch: 5   Global Step: 66530   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:27:47,710-Speed 3314.70 samples/sec   Loss 6.2982   LearningRate 0.0536   Epoch: 5   Global Step: 66540   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:27:50,794-Speed 3322.62 samples/sec   Loss 6.1839   LearningRate 0.0536   Epoch: 5   Global Step: 66550   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:27:53,818-Speed 3386.85 samples/sec   Loss 6.1504   LearningRate 0.0536   Epoch: 5   Global Step: 66560   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:27:56,841-Speed 3388.86 samples/sec   Loss 6.2166   LearningRate 0.0536   Epoch: 5   Global Step: 66570   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:27:59,865-Speed 3386.51 samples/sec   Loss 6.1423   LearningRate 0.0536   Epoch: 5   Global Step: 66580   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:28:02,963-Speed 3307.28 samples/sec   Loss 6.2773   LearningRate 0.0536   Epoch: 5   Global Step: 66590   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:28:06,021-Speed 3348.74 samples/sec   Loss 6.2048   LearningRate 0.0536   Epoch: 5   Global Step: 66600   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:28:09,051-Speed 3381.64 samples/sec   Loss 6.2172   LearningRate 0.0536   Epoch: 5   Global Step: 66610   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:28:12,115-Speed 3342.61 samples/sec   Loss 6.0876   LearningRate 0.0536   Epoch: 5   Global Step: 66620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:28:15,148-Speed 3377.47 samples/sec   Loss 6.1779   LearningRate 0.0536   Epoch: 5   Global Step: 66630   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:28:18,239-Speed 3313.69 samples/sec   Loss 6.1539   LearningRate 0.0535   Epoch: 5   Global Step: 66640   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:28:21,250-Speed 3401.79 samples/sec   Loss 6.2150   LearningRate 0.0535   Epoch: 5   Global Step: 66650   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:28:24,324-Speed 3332.86 samples/sec   Loss 6.2785   LearningRate 0.0535   Epoch: 5   Global Step: 66660   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:28:27,414-Speed 3314.38 samples/sec   Loss 6.1714   LearningRate 0.0535   Epoch: 5   Global Step: 66670   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:28:30,439-Speed 3386.12 samples/sec   Loss 6.2559   LearningRate 0.0535   Epoch: 5   Global Step: 66680   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:28:33,450-Speed 3402.16 samples/sec   Loss 6.2712   LearningRate 0.0535   Epoch: 5   Global Step: 66690   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:28:36,470-Speed 3392.20 samples/sec   Loss 6.2290   LearningRate 0.0535   Epoch: 5   Global Step: 66700   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:28:39,513-Speed 3365.80 samples/sec   Loss 6.2252   LearningRate 0.0535   Epoch: 5   Global Step: 66710   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:28:42,547-Speed 3375.75 samples/sec   Loss 6.1520   LearningRate 0.0535   Epoch: 5   Global Step: 66720   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:28:45,645-Speed 3307.19 samples/sec   Loss 6.3824   LearningRate 0.0535   Epoch: 5   Global Step: 66730   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:28:48,703-Speed 3349.14 samples/sec   Loss 6.1618   LearningRate 0.0535   Epoch: 5   Global Step: 66740   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:28:51,805-Speed 3302.67 samples/sec   Loss 6.2024   LearningRate 0.0535   Epoch: 5   Global Step: 66750   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:28:54,843-Speed 3372.00 samples/sec   Loss 6.2035   LearningRate 0.0535   Epoch: 5   Global Step: 66760   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:28:57,850-Speed 3406.43 samples/sec   Loss 6.1865   LearningRate 0.0535   Epoch: 5   Global Step: 66770   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:29:00,968-Speed 3284.98 samples/sec   Loss 6.0738   LearningRate 0.0535   Epoch: 5   Global Step: 66780   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:29:04,068-Speed 3303.76 samples/sec   Loss 6.2731   LearningRate 0.0535   Epoch: 5   Global Step: 66790   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:29:07,085-Speed 3395.90 samples/sec   Loss 6.2265   LearningRate 0.0535   Epoch: 5   Global Step: 66800   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:29:10,172-Speed 3318.20 samples/sec   Loss 6.1621   LearningRate 0.0534   Epoch: 5   Global Step: 66810   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:29:13,194-Speed 3389.73 samples/sec   Loss 6.0911   LearningRate 0.0534   Epoch: 5   Global Step: 66820   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:29:16,285-Speed 3312.83 samples/sec   Loss 6.1665   LearningRate 0.0534   Epoch: 5   Global Step: 66830   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:29:19,351-Speed 3340.97 samples/sec   Loss 6.1996   LearningRate 0.0534   Epoch: 5   Global Step: 66840   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:29:22,401-Speed 3359.31 samples/sec   Loss 6.1680   LearningRate 0.0534   Epoch: 5   Global Step: 66850   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:29:25,449-Speed 3359.72 samples/sec   Loss 6.2214   LearningRate 0.0534   Epoch: 5   Global Step: 66860   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-04-27 07:29:28,502-Speed 3355.38 samples/sec   Loss 6.2874   LearningRate 0.0534   Epoch: 5   Global Step: 66870   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:29:31,539-Speed 3372.96 samples/sec   Loss 6.2625   LearningRate 0.0534   Epoch: 5   Global Step: 66880   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-04-27 07:29:34,568-Speed 3382.15 samples/sec   Loss 6.2071   LearningRate 0.0534   Epoch: 5   Global Step: 66890   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:29:37,616-Speed 3360.44 samples/sec   Loss 6.3356   LearningRate 0.0534   Epoch: 5   Global Step: 66900   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:29:40,660-Speed 3364.94 samples/sec   Loss 6.2919   LearningRate 0.0534   Epoch: 5   Global Step: 66910   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:29:43,670-Speed 3403.63 samples/sec   Loss 6.2088   LearningRate 0.0534   Epoch: 5   Global Step: 66920   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:29:46,693-Speed 3388.16 samples/sec   Loss 6.2151   LearningRate 0.0534   Epoch: 5   Global Step: 66930   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:29:49,749-Speed 3352.24 samples/sec   Loss 6.1837   LearningRate 0.0534   Epoch: 5   Global Step: 66940   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:29:52,794-Speed 3363.76 samples/sec   Loss 6.2396   LearningRate 0.0534   Epoch: 5   Global Step: 66950   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:29:55,865-Speed 3335.84 samples/sec   Loss 6.2086   LearningRate 0.0534   Epoch: 5   Global Step: 66960   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:29:58,923-Speed 3348.92 samples/sec   Loss 6.1671   LearningRate 0.0534   Epoch: 5   Global Step: 66970   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:30:01,997-Speed 3332.78 samples/sec   Loss 6.1575   LearningRate 0.0533   Epoch: 5   Global Step: 66980   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:30:05,086-Speed 3315.47 samples/sec   Loss 6.1871   LearningRate 0.0533   Epoch: 5   Global Step: 66990   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:30:08,141-Speed 3353.04 samples/sec   Loss 6.1867   LearningRate 0.0533   Epoch: 5   Global Step: 67000   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:30:11,199-Speed 3350.61 samples/sec   Loss 6.2537   LearningRate 0.0533   Epoch: 5   Global Step: 67010   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:30:14,318-Speed 3284.03 samples/sec   Loss 6.2322   LearningRate 0.0533   Epoch: 5   Global Step: 67020   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:30:17,388-Speed 3335.91 samples/sec   Loss 6.3152   LearningRate 0.0533   Epoch: 5   Global Step: 67030   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:30:20,406-Speed 3394.51 samples/sec   Loss 6.1286   LearningRate 0.0533   Epoch: 5   Global Step: 67040   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:30:23,437-Speed 3378.45 samples/sec   Loss 6.2097   LearningRate 0.0533   Epoch: 5   Global Step: 67050   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:30:26,527-Speed 3315.22 samples/sec   Loss 6.2158   LearningRate 0.0533   Epoch: 5   Global Step: 67060   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:30:29,541-Speed 3399.35 samples/sec   Loss 6.1462   LearningRate 0.0533   Epoch: 5   Global Step: 67070   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:30:32,593-Speed 3356.01 samples/sec   Loss 6.3082   LearningRate 0.0533   Epoch: 5   Global Step: 67080   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:30:35,709-Speed 3286.45 samples/sec   Loss 6.2772   LearningRate 0.0533   Epoch: 5   Global Step: 67090   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:30:38,767-Speed 3350.01 samples/sec   Loss 6.2714   LearningRate 0.0533   Epoch: 5   Global Step: 67100   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:30:41,871-Speed 3300.96 samples/sec   Loss 6.3300   LearningRate 0.0533   Epoch: 5   Global Step: 67110   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:30:44,898-Speed 3383.24 samples/sec   Loss 6.3215   LearningRate 0.0533   Epoch: 5   Global Step: 67120   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:30:47,923-Speed 3386.57 samples/sec   Loss 6.2500   LearningRate 0.0533   Epoch: 5   Global Step: 67130   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:30:50,950-Speed 3384.65 samples/sec   Loss 6.2672   LearningRate 0.0533   Epoch: 5   Global Step: 67140   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:30:54,093-Speed 3258.12 samples/sec   Loss 6.1589   LearningRate 0.0532   Epoch: 5   Global Step: 67150   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:30:57,125-Speed 3378.93 samples/sec   Loss 6.2826   LearningRate 0.0532   Epoch: 5   Global Step: 67160   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:31:00,222-Speed 3307.41 samples/sec   Loss 6.2293   LearningRate 0.0532   Epoch: 5   Global Step: 67170   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:31:03,257-Speed 3374.79 samples/sec   Loss 6.2123   LearningRate 0.0532   Epoch: 5   Global Step: 67180   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:31:06,280-Speed 3389.38 samples/sec   Loss 6.1719   LearningRate 0.0532   Epoch: 5   Global Step: 67190   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:31:09,323-Speed 3366.56 samples/sec   Loss 6.3218   LearningRate 0.0532   Epoch: 5   Global Step: 67200   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:31:12,360-Speed 3372.51 samples/sec   Loss 6.2359   LearningRate 0.0532   Epoch: 5   Global Step: 67210   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:31:15,413-Speed 3355.07 samples/sec   Loss 6.2636   LearningRate 0.0532   Epoch: 5   Global Step: 67220   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:31:18,456-Speed 3366.19 samples/sec   Loss 6.1598   LearningRate 0.0532   Epoch: 5   Global Step: 67230   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:31:21,481-Speed 3386.27 samples/sec   Loss 6.2760   LearningRate 0.0532   Epoch: 5   Global Step: 67240   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:31:24,542-Speed 3345.77 samples/sec   Loss 6.2011   LearningRate 0.0532   Epoch: 5   Global Step: 67250   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:31:27,587-Speed 3364.23 samples/sec   Loss 6.1690   LearningRate 0.0532   Epoch: 5   Global Step: 67260   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:31:30,721-Speed 3268.42 samples/sec   Loss 6.2495   LearningRate 0.0532   Epoch: 5   Global Step: 67270   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:31:33,741-Speed 3392.50 samples/sec   Loss 6.2290   LearningRate 0.0532   Epoch: 5   Global Step: 67280   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:31:36,827-Speed 3319.48 samples/sec   Loss 6.2228   LearningRate 0.0532   Epoch: 5   Global Step: 67290   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:31:39,905-Speed 3326.77 samples/sec   Loss 6.1319   LearningRate 0.0532   Epoch: 5   Global Step: 67300   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:31:42,997-Speed 3313.49 samples/sec   Loss 6.2546   LearningRate 0.0532   Epoch: 5   Global Step: 67310   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:31:46,028-Speed 3379.09 samples/sec   Loss 6.1666   LearningRate 0.0531   Epoch: 5   Global Step: 67320   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:31:49,041-Speed 3400.15 samples/sec   Loss 6.3304   LearningRate 0.0531   Epoch: 5   Global Step: 67330   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:31:52,087-Speed 3363.46 samples/sec   Loss 6.1649   LearningRate 0.0531   Epoch: 5   Global Step: 67340   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:31:55,145-Speed 3349.26 samples/sec   Loss 6.1743   LearningRate 0.0531   Epoch: 5   Global Step: 67350   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:31:58,173-Speed 3383.06 samples/sec   Loss 6.3077   LearningRate 0.0531   Epoch: 5   Global Step: 67360   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:32:01,184-Speed 3402.15 samples/sec   Loss 6.1317   LearningRate 0.0531   Epoch: 5   Global Step: 67370   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:32:04,191-Speed 3405.99 samples/sec   Loss 6.1742   LearningRate 0.0531   Epoch: 5   Global Step: 67380   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:32:07,381-Speed 3211.16 samples/sec   Loss 6.2969   LearningRate 0.0531   Epoch: 5   Global Step: 67390   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:32:10,416-Speed 3375.54 samples/sec   Loss 6.3369   LearningRate 0.0531   Epoch: 5   Global Step: 67400   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:32:13,465-Speed 3359.65 samples/sec   Loss 6.2254   LearningRate 0.0531   Epoch: 5   Global Step: 67410   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:32:16,534-Speed 3337.03 samples/sec   Loss 6.1326   LearningRate 0.0531   Epoch: 5   Global Step: 67420   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:32:19,561-Speed 3385.10 samples/sec   Loss 6.3254   LearningRate 0.0531   Epoch: 5   Global Step: 67430   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:32:22,590-Speed 3381.20 samples/sec   Loss 6.2561   LearningRate 0.0531   Epoch: 5   Global Step: 67440   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:32:25,674-Speed 3321.19 samples/sec   Loss 6.0671   LearningRate 0.0531   Epoch: 5   Global Step: 67450   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:32:28,690-Speed 3395.98 samples/sec   Loss 6.1965   LearningRate 0.0531   Epoch: 5   Global Step: 67460   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:32:31,749-Speed 3348.67 samples/sec   Loss 6.2369   LearningRate 0.0531   Epoch: 5   Global Step: 67470   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:32:34,800-Speed 3357.41 samples/sec   Loss 6.2801   LearningRate 0.0531   Epoch: 5   Global Step: 67480   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:32:37,832-Speed 3378.59 samples/sec   Loss 6.2600   LearningRate 0.0530   Epoch: 5   Global Step: 67490   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:32:40,939-Speed 3297.11 samples/sec   Loss 6.4558   LearningRate 0.0530   Epoch: 5   Global Step: 67500   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:32:44,013-Speed 3331.78 samples/sec   Loss 6.3555   LearningRate 0.0530   Epoch: 5   Global Step: 67510   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:32:47,082-Speed 3338.34 samples/sec   Loss 6.2067   LearningRate 0.0530   Epoch: 5   Global Step: 67520   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:32:50,128-Speed 3362.78 samples/sec   Loss 6.2829   LearningRate 0.0530   Epoch: 5   Global Step: 67530   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:32:53,229-Speed 3302.65 samples/sec   Loss 6.2158   LearningRate 0.0530   Epoch: 5   Global Step: 67540   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:32:56,265-Speed 3374.36 samples/sec   Loss 6.2769   LearningRate 0.0530   Epoch: 5   Global Step: 67550   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:32:59,325-Speed 3347.09 samples/sec   Loss 6.3115   LearningRate 0.0530   Epoch: 5   Global Step: 67560   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:33:02,388-Speed 3344.59 samples/sec   Loss 6.2511   LearningRate 0.0530   Epoch: 5   Global Step: 67570   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:33:05,458-Speed 3336.58 samples/sec   Loss 6.3631   LearningRate 0.0530   Epoch: 5   Global Step: 67580   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:33:08,478-Speed 3391.72 samples/sec   Loss 6.2913   LearningRate 0.0530   Epoch: 5   Global Step: 67590   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:33:11,562-Speed 3321.71 samples/sec   Loss 6.3128   LearningRate 0.0530   Epoch: 5   Global Step: 67600   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:33:14,600-Speed 3371.46 samples/sec   Loss 6.3314   LearningRate 0.0530   Epoch: 5   Global Step: 67610   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:33:17,646-Speed 3363.35 samples/sec   Loss 6.2855   LearningRate 0.0530   Epoch: 5   Global Step: 67620   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:33:20,726-Speed 3325.53 samples/sec   Loss 6.2649   LearningRate 0.0530   Epoch: 5   Global Step: 67630   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:33:23,745-Speed 3392.92 samples/sec   Loss 6.1858   LearningRate 0.0530   Epoch: 5   Global Step: 67640   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:33:26,804-Speed 3348.51 samples/sec   Loss 6.1402   LearningRate 0.0530   Epoch: 5   Global Step: 67650   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:33:29,907-Speed 3301.52 samples/sec   Loss 6.3845   LearningRate 0.0529   Epoch: 5   Global Step: 67660   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:33:32,941-Speed 3375.25 samples/sec   Loss 6.3436   LearningRate 0.0529   Epoch: 5   Global Step: 67670   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:33:36,085-Speed 3258.46 samples/sec   Loss 6.2328   LearningRate 0.0529   Epoch: 5   Global Step: 67680   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:33:39,129-Speed 3365.41 samples/sec   Loss 6.2902   LearningRate 0.0529   Epoch: 5   Global Step: 67690   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:33:42,208-Speed 3326.08 samples/sec   Loss 6.2736   LearningRate 0.0529   Epoch: 5   Global Step: 67700   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:33:45,284-Speed 3329.90 samples/sec   Loss 6.3418   LearningRate 0.0529   Epoch: 5   Global Step: 67710   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:33:48,359-Speed 3331.69 samples/sec   Loss 6.3074   LearningRate 0.0529   Epoch: 5   Global Step: 67720   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:33:51,437-Speed 3328.19 samples/sec   Loss 6.2744   LearningRate 0.0529   Epoch: 5   Global Step: 67730   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:33:54,476-Speed 3369.90 samples/sec   Loss 6.3275   LearningRate 0.0529   Epoch: 5   Global Step: 67740   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:33:57,546-Speed 3337.08 samples/sec   Loss 6.1965   LearningRate 0.0529   Epoch: 5   Global Step: 67750   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:34:00,716-Speed 3230.97 samples/sec   Loss 6.3762   LearningRate 0.0529   Epoch: 5   Global Step: 67760   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:34:03,768-Speed 3356.92 samples/sec   Loss 6.3030   LearningRate 0.0529   Epoch: 5   Global Step: 67770   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:34:06,785-Speed 3395.58 samples/sec   Loss 6.2368   LearningRate 0.0529   Epoch: 5   Global Step: 67780   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:34:09,803-Speed 3392.80 samples/sec   Loss 6.3434   LearningRate 0.0529   Epoch: 5   Global Step: 67790   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:34:12,850-Speed 3362.06 samples/sec   Loss 6.2896   LearningRate 0.0529   Epoch: 5   Global Step: 67800   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:34:15,937-Speed 3318.21 samples/sec   Loss 6.3288   LearningRate 0.0529   Epoch: 5   Global Step: 67810   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:34:19,007-Speed 3337.02 samples/sec   Loss 6.3683   LearningRate 0.0529   Epoch: 5   Global Step: 67820   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:34:22,022-Speed 3397.03 samples/sec   Loss 6.1344   LearningRate 0.0528   Epoch: 5   Global Step: 67830   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:34:25,172-Speed 3252.23 samples/sec   Loss 6.2680   LearningRate 0.0528   Epoch: 5   Global Step: 67840   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:34:28,333-Speed 3240.87 samples/sec   Loss 6.2648   LearningRate 0.0528   Epoch: 5   Global Step: 67850   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:34:31,474-Speed 3260.50 samples/sec   Loss 6.3416   LearningRate 0.0528   Epoch: 5   Global Step: 67860   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:34:34,504-Speed 3381.59 samples/sec   Loss 6.3326   LearningRate 0.0528   Epoch: 5   Global Step: 67870   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:34:37,632-Speed 3274.36 samples/sec   Loss 6.3008   LearningRate 0.0528   Epoch: 5   Global Step: 67880   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:34:40,706-Speed 3332.11 samples/sec   Loss 6.2443   LearningRate 0.0528   Epoch: 5   Global Step: 67890   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:34:43,758-Speed 3356.65 samples/sec   Loss 6.3350   LearningRate 0.0528   Epoch: 5   Global Step: 67900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:34:46,824-Speed 3340.73 samples/sec   Loss 6.1825   LearningRate 0.0528   Epoch: 5   Global Step: 67910   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:34:49,820-Speed 3418.90 samples/sec   Loss 6.3012   LearningRate 0.0528   Epoch: 5   Global Step: 67920   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:34:52,852-Speed 3377.79 samples/sec   Loss 6.3458   LearningRate 0.0528   Epoch: 5   Global Step: 67930   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:34:55,966-Speed 3290.42 samples/sec   Loss 6.3437   LearningRate 0.0528   Epoch: 5   Global Step: 67940   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:34:59,097-Speed 3271.13 samples/sec   Loss 6.3454   LearningRate 0.0528   Epoch: 5   Global Step: 67950   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:35:02,158-Speed 3345.99 samples/sec   Loss 6.1795   LearningRate 0.0528   Epoch: 5   Global Step: 67960   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:35:05,312-Speed 3247.79 samples/sec   Loss 6.3087   LearningRate 0.0528   Epoch: 5   Global Step: 67970   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:35:08,372-Speed 3347.51 samples/sec   Loss 6.3738   LearningRate 0.0528   Epoch: 5   Global Step: 67980   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:35:11,483-Speed 3292.41 samples/sec   Loss 6.2195   LearningRate 0.0528   Epoch: 5   Global Step: 67990   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:35:14,518-Speed 3375.38 samples/sec   Loss 6.2938   LearningRate 0.0527   Epoch: 5   Global Step: 68000   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:35:17,540-Speed 3390.05 samples/sec   Loss 6.2450   LearningRate 0.0527   Epoch: 5   Global Step: 68010   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:35:20,572-Speed 3378.83 samples/sec   Loss 6.1707   LearningRate 0.0527   Epoch: 5   Global Step: 68020   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:35:23,609-Speed 3371.95 samples/sec   Loss 6.2986   LearningRate 0.0527   Epoch: 5   Global Step: 68030   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:35:26,689-Speed 3325.70 samples/sec   Loss 6.2469   LearningRate 0.0527   Epoch: 5   Global Step: 68040   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:35:29,736-Speed 3361.51 samples/sec   Loss 6.3322   LearningRate 0.0527   Epoch: 5   Global Step: 68050   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:35:32,755-Speed 3393.43 samples/sec   Loss 6.3057   LearningRate 0.0527   Epoch: 5   Global Step: 68060   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:35:35,796-Speed 3368.68 samples/sec   Loss 6.3819   LearningRate 0.0527   Epoch: 5   Global Step: 68070   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:35:38,858-Speed 3344.74 samples/sec   Loss 6.2950   LearningRate 0.0527   Epoch: 5   Global Step: 68080   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:35:41,899-Speed 3368.42 samples/sec   Loss 6.2362   LearningRate 0.0527   Epoch: 5   Global Step: 68090   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:35:44,964-Speed 3342.57 samples/sec   Loss 6.2848   LearningRate 0.0527   Epoch: 5   Global Step: 68100   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:35:48,103-Speed 3263.11 samples/sec   Loss 6.3431   LearningRate 0.0527   Epoch: 5   Global Step: 68110   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:35:51,159-Speed 3352.23 samples/sec   Loss 6.2533   LearningRate 0.0527   Epoch: 5   Global Step: 68120   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:35:54,238-Speed 3326.35 samples/sec   Loss 6.2313   LearningRate 0.0527   Epoch: 5   Global Step: 68130   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:35:57,299-Speed 3346.39 samples/sec   Loss 6.2685   LearningRate 0.0527   Epoch: 5   Global Step: 68140   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:36:00,391-Speed 3313.22 samples/sec   Loss 6.3834   LearningRate 0.0527   Epoch: 5   Global Step: 68150   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:36:03,484-Speed 3312.01 samples/sec   Loss 6.1897   LearningRate 0.0527   Epoch: 5   Global Step: 68160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:36:06,591-Speed 3296.75 samples/sec   Loss 6.3125   LearningRate 0.0526   Epoch: 5   Global Step: 68170   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:36:09,637-Speed 3362.95 samples/sec   Loss 6.2978   LearningRate 0.0526   Epoch: 5   Global Step: 68180   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:36:12,697-Speed 3347.05 samples/sec   Loss 6.3473   LearningRate 0.0526   Epoch: 5   Global Step: 68190   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:36:15,767-Speed 3336.81 samples/sec   Loss 6.2455   LearningRate 0.0526   Epoch: 5   Global Step: 68200   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:36:18,857-Speed 3314.26 samples/sec   Loss 6.3750   LearningRate 0.0526   Epoch: 5   Global Step: 68210   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:36:21,867-Speed 3402.95 samples/sec   Loss 6.3278   LearningRate 0.0526   Epoch: 5   Global Step: 68220   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:36:24,939-Speed 3335.23 samples/sec   Loss 6.2548   LearningRate 0.0526   Epoch: 5   Global Step: 68230   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:36:27,974-Speed 3375.09 samples/sec   Loss 6.3010   LearningRate 0.0526   Epoch: 5   Global Step: 68240   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:36:31,051-Speed 3328.90 samples/sec   Loss 6.2917   LearningRate 0.0526   Epoch: 5   Global Step: 68250   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:36:34,173-Speed 3280.52 samples/sec   Loss 6.3946   LearningRate 0.0526   Epoch: 5   Global Step: 68260   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:36:37,298-Speed 3278.47 samples/sec   Loss 6.2216   LearningRate 0.0526   Epoch: 5   Global Step: 68270   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:36:40,384-Speed 3319.09 samples/sec   Loss 6.1328   LearningRate 0.0526   Epoch: 5   Global Step: 68280   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:36:43,412-Speed 3382.51 samples/sec   Loss 6.2071   LearningRate 0.0526   Epoch: 5   Global Step: 68290   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:36:46,469-Speed 3350.63 samples/sec   Loss 6.3266   LearningRate 0.0526   Epoch: 5   Global Step: 68300   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:36:49,557-Speed 3317.31 samples/sec   Loss 6.2358   LearningRate 0.0526   Epoch: 5   Global Step: 68310   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:36:52,708-Speed 3250.66 samples/sec   Loss 6.3809   LearningRate 0.0526   Epoch: 5   Global Step: 68320   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:36:55,854-Speed 3256.33 samples/sec   Loss 6.3366   LearningRate 0.0526   Epoch: 5   Global Step: 68330   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:36:58,897-Speed 3365.71 samples/sec   Loss 6.2842   LearningRate 0.0525   Epoch: 5   Global Step: 68340   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:37:01,954-Speed 3351.77 samples/sec   Loss 6.3679   LearningRate 0.0525   Epoch: 5   Global Step: 68350   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:37:05,012-Speed 3348.78 samples/sec   Loss 6.3270   LearningRate 0.0525   Epoch: 5   Global Step: 68360   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:37:08,038-Speed 3385.50 samples/sec   Loss 6.2863   LearningRate 0.0525   Epoch: 5   Global Step: 68370   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:37:11,077-Speed 3370.68 samples/sec   Loss 6.4104   LearningRate 0.0525   Epoch: 5   Global Step: 68380   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:37:14,107-Speed 3380.23 samples/sec   Loss 6.2873   LearningRate 0.0525   Epoch: 5   Global Step: 68390   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:37:17,171-Speed 3343.96 samples/sec   Loss 6.2484   LearningRate 0.0525   Epoch: 5   Global Step: 68400   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:37:20,218-Speed 3361.69 samples/sec   Loss 6.3154   LearningRate 0.0525   Epoch: 5   Global Step: 68410   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:37:23,255-Speed 3372.44 samples/sec   Loss 6.3259   LearningRate 0.0525   Epoch: 5   Global Step: 68420   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:37:26,293-Speed 3371.59 samples/sec   Loss 6.3056   LearningRate 0.0525   Epoch: 5   Global Step: 68430   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:37:29,423-Speed 3273.40 samples/sec   Loss 6.2728   LearningRate 0.0525   Epoch: 5   Global Step: 68440   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:37:32,458-Speed 3375.12 samples/sec   Loss 6.2544   LearningRate 0.0525   Epoch: 5   Global Step: 68450   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:37:35,535-Speed 3328.04 samples/sec   Loss 6.2905   LearningRate 0.0525   Epoch: 5   Global Step: 68460   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:37:38,588-Speed 3356.22 samples/sec   Loss 6.3335   LearningRate 0.0525   Epoch: 5   Global Step: 68470   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:37:41,672-Speed 3321.46 samples/sec   Loss 6.2834   LearningRate 0.0525   Epoch: 5   Global Step: 68480   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:37:44,756-Speed 3320.96 samples/sec   Loss 6.2068   LearningRate 0.0525   Epoch: 5   Global Step: 68490   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:37:47,821-Speed 3342.22 samples/sec   Loss 6.2148   LearningRate 0.0525   Epoch: 5   Global Step: 68500   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:37:50,927-Speed 3297.05 samples/sec   Loss 6.3437   LearningRate 0.0524   Epoch: 5   Global Step: 68510   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:37:53,967-Speed 3369.51 samples/sec   Loss 6.2919   LearningRate 0.0524   Epoch: 5   Global Step: 68520   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:37:57,051-Speed 3321.86 samples/sec   Loss 6.3376   LearningRate 0.0524   Epoch: 5   Global Step: 68530   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:38:00,123-Speed 3334.13 samples/sec   Loss 6.2070   LearningRate 0.0524   Epoch: 5   Global Step: 68540   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:38:03,203-Speed 3326.42 samples/sec   Loss 6.2589   LearningRate 0.0524   Epoch: 5   Global Step: 68550   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:38:06,299-Speed 3308.32 samples/sec   Loss 6.2798   LearningRate 0.0524   Epoch: 5   Global Step: 68560   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:38:09,345-Speed 3363.08 samples/sec   Loss 6.2798   LearningRate 0.0524   Epoch: 5   Global Step: 68570   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:38:12,383-Speed 3371.71 samples/sec   Loss 6.2776   LearningRate 0.0524   Epoch: 5   Global Step: 68580   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:38:15,436-Speed 3354.86 samples/sec   Loss 6.3598   LearningRate 0.0524   Epoch: 5   Global Step: 68590   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:38:18,469-Speed 3377.57 samples/sec   Loss 6.3914   LearningRate 0.0524   Epoch: 5   Global Step: 68600   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:38:21,517-Speed 3360.23 samples/sec   Loss 6.2179   LearningRate 0.0524   Epoch: 5   Global Step: 68610   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:38:24,631-Speed 3289.48 samples/sec   Loss 6.2988   LearningRate 0.0524   Epoch: 5   Global Step: 68620   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:38:27,699-Speed 3339.08 samples/sec   Loss 6.3582   LearningRate 0.0524   Epoch: 5   Global Step: 68630   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:38:30,871-Speed 3229.29 samples/sec   Loss 6.2581   LearningRate 0.0524   Epoch: 5   Global Step: 68640   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:38:33,911-Speed 3369.53 samples/sec   Loss 6.3058   LearningRate 0.0524   Epoch: 5   Global Step: 68650   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:38:36,992-Speed 3323.94 samples/sec   Loss 6.2593   LearningRate 0.0524   Epoch: 5   Global Step: 68660   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:38:40,131-Speed 3263.45 samples/sec   Loss 6.3556   LearningRate 0.0524   Epoch: 5   Global Step: 68670   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:38:43,182-Speed 3357.71 samples/sec   Loss 6.4178   LearningRate 0.0523   Epoch: 5   Global Step: 68680   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:38:46,213-Speed 3378.95 samples/sec   Loss 6.3205   LearningRate 0.0523   Epoch: 5   Global Step: 68690   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:38:49,259-Speed 3362.85 samples/sec   Loss 6.1861   LearningRate 0.0523   Epoch: 5   Global Step: 68700   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:38:52,364-Speed 3299.33 samples/sec   Loss 6.3236   LearningRate 0.0523   Epoch: 5   Global Step: 68710   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:38:55,416-Speed 3356.86 samples/sec   Loss 6.4031   LearningRate 0.0523   Epoch: 5   Global Step: 68720   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 07:38:58,453-Speed 3372.69 samples/sec   Loss 6.2725   LearningRate 0.0523   Epoch: 5   Global Step: 68730   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:39:01,498-Speed 3363.85 samples/sec   Loss 6.5169   LearningRate 0.0523   Epoch: 5   Global Step: 68740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:39:04,600-Speed 3302.30 samples/sec   Loss 6.3362   LearningRate 0.0523   Epoch: 5   Global Step: 68750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:39:07,678-Speed 3326.89 samples/sec   Loss 6.3885   LearningRate 0.0523   Epoch: 5   Global Step: 68760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:39:10,725-Speed 3362.50 samples/sec   Loss 6.3326   LearningRate 0.0523   Epoch: 5   Global Step: 68770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:39:13,803-Speed 3327.27 samples/sec   Loss 6.2559   LearningRate 0.0523   Epoch: 5   Global Step: 68780   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:39:16,888-Speed 3320.25 samples/sec   Loss 6.3035   LearningRate 0.0523   Epoch: 5   Global Step: 68790   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:39:19,938-Speed 3361.58 samples/sec   Loss 6.2501   LearningRate 0.0523   Epoch: 5   Global Step: 68800   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:39:23,004-Speed 3340.05 samples/sec   Loss 6.3038   LearningRate 0.0523   Epoch: 5   Global Step: 68810   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:39:26,135-Speed 3272.36 samples/sec   Loss 6.3965   LearningRate 0.0523   Epoch: 5   Global Step: 68820   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:39:29,178-Speed 3366.31 samples/sec   Loss 6.3206   LearningRate 0.0523   Epoch: 5   Global Step: 68830   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:39:32,232-Speed 3353.00 samples/sec   Loss 6.3301   LearningRate 0.0523   Epoch: 5   Global Step: 68840   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:39:35,241-Speed 3404.20 samples/sec   Loss 6.2021   LearningRate 0.0523   Epoch: 5   Global Step: 68850   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:39:38,302-Speed 3347.00 samples/sec   Loss 6.3294   LearningRate 0.0522   Epoch: 5   Global Step: 68860   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:39:41,439-Speed 3264.99 samples/sec   Loss 6.3260   LearningRate 0.0522   Epoch: 5   Global Step: 68870   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:39:44,488-Speed 3359.89 samples/sec   Loss 6.2450   LearningRate 0.0522   Epoch: 5   Global Step: 68880   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:39:47,537-Speed 3359.34 samples/sec   Loss 6.3503   LearningRate 0.0522   Epoch: 5   Global Step: 68890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:39:50,586-Speed 3360.21 samples/sec   Loss 6.2959   LearningRate 0.0522   Epoch: 5   Global Step: 68900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:39:53,634-Speed 3360.98 samples/sec   Loss 6.3167   LearningRate 0.0522   Epoch: 5   Global Step: 68910   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:39:56,722-Speed 3317.01 samples/sec   Loss 6.3189   LearningRate 0.0522   Epoch: 5   Global Step: 68920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:39:59,784-Speed 3344.88 samples/sec   Loss 6.4135   LearningRate 0.0522   Epoch: 5   Global Step: 68930   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:40:02,961-Speed 3224.18 samples/sec   Loss 6.2942   LearningRate 0.0522   Epoch: 5   Global Step: 68940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:40:06,026-Speed 3342.43 samples/sec   Loss 6.2942   LearningRate 0.0522   Epoch: 5   Global Step: 68950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:40:09,108-Speed 3322.64 samples/sec   Loss 6.4667   LearningRate 0.0522   Epoch: 5   Global Step: 68960   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:40:12,120-Speed 3402.02 samples/sec   Loss 6.3262   LearningRate 0.0522   Epoch: 5   Global Step: 68970   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:40:15,147-Speed 3383.76 samples/sec   Loss 6.2310   LearningRate 0.0522   Epoch: 5   Global Step: 68980   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:40:18,174-Speed 3384.15 samples/sec   Loss 6.2779   LearningRate 0.0522   Epoch: 5   Global Step: 68990   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:40:21,230-Speed 3351.61 samples/sec   Loss 6.3094   LearningRate 0.0522   Epoch: 5   Global Step: 69000   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:40:24,315-Speed 3319.68 samples/sec   Loss 6.4098   LearningRate 0.0522   Epoch: 5   Global Step: 69010   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:40:27,367-Speed 3357.13 samples/sec   Loss 6.1594   LearningRate 0.0522   Epoch: 5   Global Step: 69020   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:40:30,486-Speed 3284.22 samples/sec   Loss 6.2844   LearningRate 0.0521   Epoch: 5   Global Step: 69030   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:40:33,549-Speed 3344.07 samples/sec   Loss 6.2574   LearningRate 0.0521   Epoch: 5   Global Step: 69040   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:40:36,572-Speed 3388.08 samples/sec   Loss 6.3226   LearningRate 0.0521   Epoch: 5   Global Step: 69050   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:40:39,575-Speed 3411.26 samples/sec   Loss 6.2748   LearningRate 0.0521   Epoch: 5   Global Step: 69060   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:40:42,620-Speed 3363.34 samples/sec   Loss 6.4019   LearningRate 0.0521   Epoch: 5   Global Step: 69070   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:40:45,692-Speed 3334.80 samples/sec   Loss 6.3416   LearningRate 0.0521   Epoch: 5   Global Step: 69080   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:40:48,733-Speed 3368.49 samples/sec   Loss 6.2067   LearningRate 0.0521   Epoch: 5   Global Step: 69090   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:40:51,803-Speed 3337.05 samples/sec   Loss 6.3413   LearningRate 0.0521   Epoch: 5   Global Step: 69100   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:40:54,861-Speed 3349.14 samples/sec   Loss 6.3745   LearningRate 0.0521   Epoch: 5   Global Step: 69110   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:40:57,931-Speed 3336.51 samples/sec   Loss 6.3439   LearningRate 0.0521   Epoch: 5   Global Step: 69120   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:41:00,989-Speed 3349.65 samples/sec   Loss 6.3390   LearningRate 0.0521   Epoch: 5   Global Step: 69130   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:41:04,081-Speed 3312.88 samples/sec   Loss 6.3247   LearningRate 0.0521   Epoch: 5   Global Step: 69140   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:41:07,135-Speed 3354.21 samples/sec   Loss 6.2149   LearningRate 0.0521   Epoch: 5   Global Step: 69150   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:41:10,187-Speed 3356.50 samples/sec   Loss 6.4440   LearningRate 0.0521   Epoch: 5   Global Step: 69160   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:41:13,300-Speed 3290.44 samples/sec   Loss 6.3084   LearningRate 0.0521   Epoch: 5   Global Step: 69170   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:41:16,336-Speed 3373.32 samples/sec   Loss 6.2847   LearningRate 0.0521   Epoch: 5   Global Step: 69180   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:41:19,417-Speed 3324.77 samples/sec   Loss 6.3124   LearningRate 0.0521   Epoch: 5   Global Step: 69190   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:41:22,487-Speed 3337.23 samples/sec   Loss 6.2755   LearningRate 0.0520   Epoch: 5   Global Step: 69200   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:41:25,569-Speed 3323.69 samples/sec   Loss 6.3368   LearningRate 0.0520   Epoch: 5   Global Step: 69210   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:41:28,648-Speed 3326.85 samples/sec   Loss 6.3233   LearningRate 0.0520   Epoch: 5   Global Step: 69220   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:41:31,720-Speed 3334.36 samples/sec   Loss 6.2437   LearningRate 0.0520   Epoch: 5   Global Step: 69230   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:41:34,750-Speed 3380.35 samples/sec   Loss 6.2810   LearningRate 0.0520   Epoch: 5   Global Step: 69240   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:41:37,849-Speed 3305.52 samples/sec   Loss 6.3636   LearningRate 0.0520   Epoch: 5   Global Step: 69250   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:41:40,897-Speed 3360.85 samples/sec   Loss 6.4450   LearningRate 0.0520   Epoch: 5   Global Step: 69260   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:41:44,018-Speed 3282.39 samples/sec   Loss 6.3241   LearningRate 0.0520   Epoch: 5   Global Step: 69270   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:41:47,078-Speed 3346.68 samples/sec   Loss 6.3676   LearningRate 0.0520   Epoch: 5   Global Step: 69280   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:41:50,242-Speed 3237.60 samples/sec   Loss 6.4139   LearningRate 0.0520   Epoch: 5   Global Step: 69290   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:41:53,339-Speed 3307.85 samples/sec   Loss 6.1563   LearningRate 0.0520   Epoch: 5   Global Step: 69300   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:41:56,382-Speed 3365.88 samples/sec   Loss 6.3144   LearningRate 0.0520   Epoch: 5   Global Step: 69310   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:41:59,448-Speed 3340.46 samples/sec   Loss 6.4220   LearningRate 0.0520   Epoch: 5   Global Step: 69320   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:42:02,498-Speed 3359.40 samples/sec   Loss 6.2776   LearningRate 0.0520   Epoch: 5   Global Step: 69330   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:42:05,561-Speed 3344.58 samples/sec   Loss 6.4169   LearningRate 0.0520   Epoch: 5   Global Step: 69340   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:42:08,645-Speed 3320.42 samples/sec   Loss 6.2718   LearningRate 0.0520   Epoch: 5   Global Step: 69350   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:42:11,704-Speed 3348.69 samples/sec   Loss 6.3861   LearningRate 0.0520   Epoch: 5   Global Step: 69360   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:42:14,771-Speed 3340.03 samples/sec   Loss 6.2846   LearningRate 0.0519   Epoch: 5   Global Step: 69370   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:42:17,868-Speed 3307.80 samples/sec   Loss 6.3808   LearningRate 0.0519   Epoch: 5   Global Step: 69380   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:42:20,911-Speed 3366.22 samples/sec   Loss 6.3791   LearningRate 0.0519   Epoch: 5   Global Step: 69390   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:42:24,038-Speed 3276.77 samples/sec   Loss 6.2794   LearningRate 0.0519   Epoch: 5   Global Step: 69400   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:42:27,114-Speed 3329.42 samples/sec   Loss 6.3033   LearningRate 0.0519   Epoch: 5   Global Step: 69410   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:42:30,200-Speed 3318.98 samples/sec   Loss 6.3849   LearningRate 0.0519   Epoch: 5   Global Step: 69420   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:42:33,272-Speed 3334.88 samples/sec   Loss 6.3276   LearningRate 0.0519   Epoch: 5   Global Step: 69430   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:42:36,406-Speed 3267.61 samples/sec   Loss 6.5255   LearningRate 0.0519   Epoch: 5   Global Step: 69440   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:42:39,510-Speed 3299.98 samples/sec   Loss 6.3166   LearningRate 0.0519   Epoch: 5   Global Step: 69450   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:42:42,554-Speed 3365.31 samples/sec   Loss 6.3011   LearningRate 0.0519   Epoch: 5   Global Step: 69460   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:42:45,585-Speed 3379.59 samples/sec   Loss 6.4978   LearningRate 0.0519   Epoch: 5   Global Step: 69470   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:42:48,627-Speed 3366.68 samples/sec   Loss 6.2702   LearningRate 0.0519   Epoch: 5   Global Step: 69480   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:42:51,688-Speed 3347.69 samples/sec   Loss 6.3011   LearningRate 0.0519   Epoch: 5   Global Step: 69490   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:42:54,733-Speed 3363.87 samples/sec   Loss 6.2885   LearningRate 0.0519   Epoch: 5   Global Step: 69500   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:42:57,786-Speed 3354.56 samples/sec   Loss 6.3984   LearningRate 0.0519   Epoch: 5   Global Step: 69510   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:43:00,843-Speed 3350.83 samples/sec   Loss 6.3876   LearningRate 0.0519   Epoch: 5   Global Step: 69520   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:43:03,942-Speed 3305.11 samples/sec   Loss 6.3491   LearningRate 0.0519   Epoch: 5   Global Step: 69530   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:43:07,023-Speed 3324.94 samples/sec   Loss 6.3218   LearningRate 0.0519   Epoch: 5   Global Step: 69540   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:43:10,066-Speed 3366.22 samples/sec   Loss 6.2938   LearningRate 0.0518   Epoch: 5   Global Step: 69550   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:43:13,170-Speed 3299.52 samples/sec   Loss 6.3237   LearningRate 0.0518   Epoch: 5   Global Step: 69560   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:43:16,212-Speed 3368.35 samples/sec   Loss 6.3219   LearningRate 0.0518   Epoch: 5   Global Step: 69570   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:43:19,285-Speed 3332.88 samples/sec   Loss 6.2904   LearningRate 0.0518   Epoch: 5   Global Step: 69580   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:43:22,374-Speed 3315.57 samples/sec   Loss 6.3924   LearningRate 0.0518   Epoch: 5   Global Step: 69590   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:43:25,475-Speed 3303.48 samples/sec   Loss 6.2856   LearningRate 0.0518   Epoch: 5   Global Step: 69600   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:43:28,611-Speed 3265.99 samples/sec   Loss 6.4063   LearningRate 0.0518   Epoch: 5   Global Step: 69610   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:43:31,727-Speed 3287.05 samples/sec   Loss 6.3475   LearningRate 0.0518   Epoch: 5   Global Step: 69620   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:43:34,788-Speed 3346.76 samples/sec   Loss 6.3129   LearningRate 0.0518   Epoch: 5   Global Step: 69630   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:43:37,926-Speed 3264.68 samples/sec   Loss 6.3970   LearningRate 0.0518   Epoch: 5   Global Step: 69640   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:43:41,006-Speed 3325.14 samples/sec   Loss 6.4451   LearningRate 0.0518   Epoch: 5   Global Step: 69650   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:43:44,062-Speed 3352.36 samples/sec   Loss 6.3753   LearningRate 0.0518   Epoch: 5   Global Step: 69660   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:43:47,154-Speed 3312.92 samples/sec   Loss 6.2767   LearningRate 0.0518   Epoch: 5   Global Step: 69670   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:43:50,227-Speed 3333.45 samples/sec   Loss 6.3311   LearningRate 0.0518   Epoch: 5   Global Step: 69680   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:43:53,327-Speed 3303.89 samples/sec   Loss 6.2973   LearningRate 0.0518   Epoch: 5   Global Step: 69690   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:43:56,381-Speed 3354.66 samples/sec   Loss 6.3205   LearningRate 0.0518   Epoch: 5   Global Step: 69700   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:43:59,442-Speed 3346.03 samples/sec   Loss 6.3863   LearningRate 0.0518   Epoch: 5   Global Step: 69710   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:44:02,535-Speed 3312.16 samples/sec   Loss 6.4187   LearningRate 0.0517   Epoch: 5   Global Step: 69720   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:44:05,619-Speed 3321.12 samples/sec   Loss 6.3524   LearningRate 0.0517   Epoch: 5   Global Step: 69730   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:44:08,660-Speed 3368.24 samples/sec   Loss 6.3589   LearningRate 0.0517   Epoch: 5   Global Step: 69740   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:44:11,680-Speed 3392.30 samples/sec   Loss 6.3395   LearningRate 0.0517   Epoch: 5   Global Step: 69750   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:44:14,743-Speed 3344.47 samples/sec   Loss 6.2957   LearningRate 0.0517   Epoch: 5   Global Step: 69760   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:44:17,804-Speed 3346.07 samples/sec   Loss 6.2759   LearningRate 0.0517   Epoch: 5   Global Step: 69770   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:44:20,872-Speed 3338.22 samples/sec   Loss 6.3579   LearningRate 0.0517   Epoch: 5   Global Step: 69780   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:44:23,981-Speed 3294.65 samples/sec   Loss 6.2773   LearningRate 0.0517   Epoch: 5   Global Step: 69790   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:44:27,095-Speed 3289.59 samples/sec   Loss 6.3691   LearningRate 0.0517   Epoch: 5   Global Step: 69800   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:44:30,229-Speed 3268.73 samples/sec   Loss 6.2563   LearningRate 0.0517   Epoch: 5   Global Step: 69810   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:44:33,306-Speed 3328.35 samples/sec   Loss 6.2505   LearningRate 0.0517   Epoch: 5   Global Step: 69820   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:44:36,432-Speed 3277.36 samples/sec   Loss 6.3322   LearningRate 0.0517   Epoch: 5   Global Step: 69830   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:44:39,484-Speed 3356.49 samples/sec   Loss 6.2635   LearningRate 0.0517   Epoch: 5   Global Step: 69840   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:44:42,634-Speed 3251.51 samples/sec   Loss 6.3292   LearningRate 0.0517   Epoch: 5   Global Step: 69850   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:44:45,668-Speed 3375.74 samples/sec   Loss 6.2216   LearningRate 0.0517   Epoch: 5   Global Step: 69860   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:44:48,726-Speed 3349.99 samples/sec   Loss 6.4179   LearningRate 0.0517   Epoch: 5   Global Step: 69870   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:44:51,843-Speed 3287.95 samples/sec   Loss 6.3594   LearningRate 0.0517   Epoch: 5   Global Step: 69880   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:44:54,973-Speed 3273.00 samples/sec   Loss 6.3542   LearningRate 0.0516   Epoch: 5   Global Step: 69890   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:44:58,033-Speed 3347.70 samples/sec   Loss 6.4765   LearningRate 0.0516   Epoch: 5   Global Step: 69900   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:45:01,137-Speed 3299.81 samples/sec   Loss 6.3775   LearningRate 0.0516   Epoch: 5   Global Step: 69910   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:45:04,274-Speed 3265.46 samples/sec   Loss 6.3864   LearningRate 0.0516   Epoch: 5   Global Step: 69920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:45:07,357-Speed 3322.36 samples/sec   Loss 6.3015   LearningRate 0.0516   Epoch: 5   Global Step: 69930   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:45:10,418-Speed 3346.36 samples/sec   Loss 6.3043   LearningRate 0.0516   Epoch: 5   Global Step: 69940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:45:13,523-Speed 3298.51 samples/sec   Loss 6.2929   LearningRate 0.0516   Epoch: 5   Global Step: 69950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:45:16,627-Speed 3300.30 samples/sec   Loss 6.3483   LearningRate 0.0516   Epoch: 5   Global Step: 69960   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:45:19,710-Speed 3322.99 samples/sec   Loss 6.4233   LearningRate 0.0516   Epoch: 5   Global Step: 69970   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:45:22,786-Speed 3330.32 samples/sec   Loss 6.2937   LearningRate 0.0516   Epoch: 5   Global Step: 69980   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:45:25,873-Speed 3317.83 samples/sec   Loss 6.1548   LearningRate 0.0516   Epoch: 5   Global Step: 69990   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:45:28,948-Speed 3330.82 samples/sec   Loss 6.2502   LearningRate 0.0516   Epoch: 5   Global Step: 70000   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:45:32,012-Speed 3343.75 samples/sec   Loss 6.2923   LearningRate 0.0516   Epoch: 5   Global Step: 70010   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:45:35,105-Speed 3311.63 samples/sec   Loss 6.3991   LearningRate 0.0516   Epoch: 5   Global Step: 70020   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 07:45:38,181-Speed 3330.22 samples/sec   Loss 6.2314   LearningRate 0.0516   Epoch: 5   Global Step: 70030   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:45:41,246-Speed 3341.41 samples/sec   Loss 6.3203   LearningRate 0.0516   Epoch: 5   Global Step: 70040   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:45:44,310-Speed 3343.46 samples/sec   Loss 6.3349   LearningRate 0.0516   Epoch: 5   Global Step: 70050   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:45:47,351-Speed 3367.46 samples/sec   Loss 6.3800   LearningRate 0.0515   Epoch: 5   Global Step: 70060   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:45:50,385-Speed 3377.17 samples/sec   Loss 6.3148   LearningRate 0.0515   Epoch: 5   Global Step: 70070   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:45:53,470-Speed 3320.08 samples/sec   Loss 6.3022   LearningRate 0.0515   Epoch: 5   Global Step: 70080   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:45:56,520-Speed 3358.43 samples/sec   Loss 6.3069   LearningRate 0.0515   Epoch: 5   Global Step: 70090   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:45:59,563-Speed 3366.00 samples/sec   Loss 6.2943   LearningRate 0.0515   Epoch: 5   Global Step: 70100   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:46:02,647-Speed 3321.68 samples/sec   Loss 6.4039   LearningRate 0.0515   Epoch: 5   Global Step: 70110   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:46:05,713-Speed 3341.15 samples/sec   Loss 6.3267   LearningRate 0.0515   Epoch: 5   Global Step: 70120   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:46:08,726-Speed 3399.28 samples/sec   Loss 6.2384   LearningRate 0.0515   Epoch: 5   Global Step: 70130   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:46:11,802-Speed 3329.24 samples/sec   Loss 6.3296   LearningRate 0.0515   Epoch: 5   Global Step: 70140   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:46:14,941-Speed 3263.77 samples/sec   Loss 6.3765   LearningRate 0.0515   Epoch: 5   Global Step: 70150   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:46:18,020-Speed 3326.24 samples/sec   Loss 6.2960   LearningRate 0.0515   Epoch: 5   Global Step: 70160   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:46:21,069-Speed 3360.09 samples/sec   Loss 6.3980   LearningRate 0.0515   Epoch: 5   Global Step: 70170   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:46:24,164-Speed 3308.95 samples/sec   Loss 6.3156   LearningRate 0.0515   Epoch: 5   Global Step: 70180   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:46:27,241-Speed 3329.63 samples/sec   Loss 6.2820   LearningRate 0.0515   Epoch: 5   Global Step: 70190   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:46:30,371-Speed 3272.42 samples/sec   Loss 6.2953   LearningRate 0.0515   Epoch: 5   Global Step: 70200   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:46:33,509-Speed 3264.92 samples/sec   Loss 6.2845   LearningRate 0.0515   Epoch: 5   Global Step: 70210   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:46:36,606-Speed 3306.54 samples/sec   Loss 6.3419   LearningRate 0.0515   Epoch: 5   Global Step: 70220   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:46:39,721-Speed 3288.84 samples/sec   Loss 6.3176   LearningRate 0.0515   Epoch: 5   Global Step: 70230   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:46:42,813-Speed 3312.82 samples/sec   Loss 6.2987   LearningRate 0.0514   Epoch: 5   Global Step: 70240   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:46:45,849-Speed 3374.21 samples/sec   Loss 6.2770   LearningRate 0.0514   Epoch: 5   Global Step: 70250   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:46:49,066-Speed 3184.07 samples/sec   Loss 6.4098   LearningRate 0.0514   Epoch: 5   Global Step: 70260   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:46:52,173-Speed 3296.30 samples/sec   Loss 6.4091   LearningRate 0.0514   Epoch: 5   Global Step: 70270   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:46:55,266-Speed 3311.91 samples/sec   Loss 6.4047   LearningRate 0.0514   Epoch: 5   Global Step: 70280   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:46:58,323-Speed 3351.64 samples/sec   Loss 6.3044   LearningRate 0.0514   Epoch: 5   Global Step: 70290   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:47:01,445-Speed 3280.24 samples/sec   Loss 6.3003   LearningRate 0.0514   Epoch: 5   Global Step: 70300   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:47:04,518-Speed 3333.29 samples/sec   Loss 6.2659   LearningRate 0.0514   Epoch: 5   Global Step: 70310   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:47:07,598-Speed 3325.47 samples/sec   Loss 6.2577   LearningRate 0.0514   Epoch: 5   Global Step: 70320   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:47:10,644-Speed 3362.88 samples/sec   Loss 6.3115   LearningRate 0.0514   Epoch: 5   Global Step: 70330   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:47:13,749-Speed 3299.76 samples/sec   Loss 6.3903   LearningRate 0.0514   Epoch: 5   Global Step: 70340   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:47:16,857-Speed 3295.77 samples/sec   Loss 6.2195   LearningRate 0.0514   Epoch: 5   Global Step: 70350   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:47:19,937-Speed 3326.03 samples/sec   Loss 6.3872   LearningRate 0.0514   Epoch: 5   Global Step: 70360   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:47:23,034-Speed 3307.58 samples/sec   Loss 6.4043   LearningRate 0.0514   Epoch: 5   Global Step: 70370   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:47:26,136-Speed 3302.23 samples/sec   Loss 6.3928   LearningRate 0.0514   Epoch: 5   Global Step: 70380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:47:29,217-Speed 3324.06 samples/sec   Loss 6.3090   LearningRate 0.0514   Epoch: 5   Global Step: 70390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:47:32,314-Speed 3307.18 samples/sec   Loss 6.3580   LearningRate 0.0514   Epoch: 5   Global Step: 70400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:47:35,449-Speed 3267.85 samples/sec   Loss 6.3956   LearningRate 0.0513   Epoch: 5   Global Step: 70410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:47:38,548-Speed 3305.50 samples/sec   Loss 6.3370   LearningRate 0.0513   Epoch: 5   Global Step: 70420   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:47:41,650-Speed 3301.23 samples/sec   Loss 6.3524   LearningRate 0.0513   Epoch: 5   Global Step: 70430   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:47:44,734-Speed 3322.04 samples/sec   Loss 6.4170   LearningRate 0.0513   Epoch: 5   Global Step: 70440   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:47:47,774-Speed 3369.69 samples/sec   Loss 6.3354   LearningRate 0.0513   Epoch: 5   Global Step: 70450   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 07:47:50,837-Speed 3343.64 samples/sec   Loss 6.3701   LearningRate 0.0513   Epoch: 5   Global Step: 70460   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 07:47:53,930-Speed 3312.19 samples/sec   Loss 6.4556   LearningRate 0.0513   Epoch: 5   Global Step: 70470   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:47:56,932-Speed 3412.37 samples/sec   Loss 6.3080   LearningRate 0.0513   Epoch: 5   Global Step: 70480   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:48:00,095-Speed 3238.53 samples/sec   Loss 6.3042   LearningRate 0.0513   Epoch: 5   Global Step: 70490   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:48:03,203-Speed 3295.54 samples/sec   Loss 6.3738   LearningRate 0.0513   Epoch: 5   Global Step: 70500   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:48:06,311-Speed 3296.07 samples/sec   Loss 6.2716   LearningRate 0.0513   Epoch: 5   Global Step: 70510   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:48:09,349-Speed 3371.19 samples/sec   Loss 6.3380   LearningRate 0.0513   Epoch: 5   Global Step: 70520   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:48:12,398-Speed 3360.03 samples/sec   Loss 6.3171   LearningRate 0.0513   Epoch: 5   Global Step: 70530   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:48:15,498-Speed 3305.56 samples/sec   Loss 6.3537   LearningRate 0.0513   Epoch: 5   Global Step: 70540   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:48:18,652-Speed 3246.70 samples/sec   Loss 6.3712   LearningRate 0.0513   Epoch: 5   Global Step: 70550   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:48:21,772-Speed 3283.44 samples/sec   Loss 6.3740   LearningRate 0.0513   Epoch: 5   Global Step: 70560   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:48:24,951-Speed 3221.68 samples/sec   Loss 6.3141   LearningRate 0.0513   Epoch: 5   Global Step: 70570   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:48:28,137-Speed 3215.71 samples/sec   Loss 6.3325   LearningRate 0.0512   Epoch: 5   Global Step: 70580   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:48:31,247-Speed 3292.60 samples/sec   Loss 6.3523   LearningRate 0.0512   Epoch: 5   Global Step: 70590   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:48:34,314-Speed 3340.43 samples/sec   Loss 6.3207   LearningRate 0.0512   Epoch: 5   Global Step: 70600   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:48:37,355-Speed 3368.20 samples/sec   Loss 6.2001   LearningRate 0.0512   Epoch: 5   Global Step: 70610   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:48:40,422-Speed 3340.21 samples/sec   Loss 6.3192   LearningRate 0.0512   Epoch: 5   Global Step: 70620   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:48:43,459-Speed 3373.29 samples/sec   Loss 6.3167   LearningRate 0.0512   Epoch: 5   Global Step: 70630   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:48:46,542-Speed 3321.70 samples/sec   Loss 6.3231   LearningRate 0.0512   Epoch: 5   Global Step: 70640   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:48:49,603-Speed 3346.45 samples/sec   Loss 6.4258   LearningRate 0.0512   Epoch: 5   Global Step: 70650   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:48:52,711-Speed 3296.60 samples/sec   Loss 6.1784   LearningRate 0.0512   Epoch: 5   Global Step: 70660   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:48:55,760-Speed 3359.37 samples/sec   Loss 6.2886   LearningRate 0.0512   Epoch: 5   Global Step: 70670   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:48:58,814-Speed 3353.78 samples/sec   Loss 6.3862   LearningRate 0.0512   Epoch: 5   Global Step: 70680   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:49:01,900-Speed 3319.34 samples/sec   Loss 6.2785   LearningRate 0.0512   Epoch: 5   Global Step: 70690   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:49:04,992-Speed 3312.98 samples/sec   Loss 6.2264   LearningRate 0.0512   Epoch: 5   Global Step: 70700   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:49:08,064-Speed 3334.44 samples/sec   Loss 6.3462   LearningRate 0.0512   Epoch: 5   Global Step: 70710   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:49:11,145-Speed 3324.36 samples/sec   Loss 6.3920   LearningRate 0.0512   Epoch: 5   Global Step: 70720   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:49:14,201-Speed 3352.37 samples/sec   Loss 6.3978   LearningRate 0.0512   Epoch: 5   Global Step: 70730   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:49:17,259-Speed 3348.92 samples/sec   Loss 6.3390   LearningRate 0.0512   Epoch: 5   Global Step: 70740   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:49:20,341-Speed 3324.30 samples/sec   Loss 6.2726   LearningRate 0.0512   Epoch: 5   Global Step: 70750   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:49:23,416-Speed 3331.05 samples/sec   Loss 6.2496   LearningRate 0.0511   Epoch: 5   Global Step: 70760   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:49:26,474-Speed 3350.25 samples/sec   Loss 6.3607   LearningRate 0.0511   Epoch: 5   Global Step: 70770   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:49:29,534-Speed 3346.77 samples/sec   Loss 6.3295   LearningRate 0.0511   Epoch: 5   Global Step: 70780   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:49:32,590-Speed 3352.04 samples/sec   Loss 6.2617   LearningRate 0.0511   Epoch: 5   Global Step: 70790   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:49:35,613-Speed 3388.62 samples/sec   Loss 6.2282   LearningRate 0.0511   Epoch: 5   Global Step: 70800   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:49:38,719-Speed 3298.12 samples/sec   Loss 6.3500   LearningRate 0.0511   Epoch: 5   Global Step: 70810   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:49:41,816-Speed 3306.78 samples/sec   Loss 6.2885   LearningRate 0.0511   Epoch: 5   Global Step: 70820   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:49:44,865-Speed 3360.16 samples/sec   Loss 6.3708   LearningRate 0.0511   Epoch: 5   Global Step: 70830   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:49:47,917-Speed 3356.07 samples/sec   Loss 6.3294   LearningRate 0.0511   Epoch: 5   Global Step: 70840   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:49:51,043-Speed 3276.64 samples/sec   Loss 6.3596   LearningRate 0.0511   Epoch: 5   Global Step: 70850   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:49:54,133-Speed 3315.16 samples/sec   Loss 6.3393   LearningRate 0.0511   Epoch: 5   Global Step: 70860   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:49:57,165-Speed 3378.17 samples/sec   Loss 6.3263   LearningRate 0.0511   Epoch: 5   Global Step: 70870   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:50:00,222-Speed 3350.78 samples/sec   Loss 6.3204   LearningRate 0.0511   Epoch: 5   Global Step: 70880   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:50:03,329-Speed 3297.01 samples/sec   Loss 6.2463   LearningRate 0.0511   Epoch: 5   Global Step: 70890   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:50:06,400-Speed 3335.13 samples/sec   Loss 6.3844   LearningRate 0.0511   Epoch: 5   Global Step: 70900   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:50:09,461-Speed 3346.78 samples/sec   Loss 6.3194   LearningRate 0.0511   Epoch: 5   Global Step: 70910   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:50:12,555-Speed 3310.80 samples/sec   Loss 6.3162   LearningRate 0.0511   Epoch: 5   Global Step: 70920   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:50:15,675-Speed 3283.21 samples/sec   Loss 6.1863   LearningRate 0.0510   Epoch: 5   Global Step: 70930   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:50:18,735-Speed 3346.84 samples/sec   Loss 6.2803   LearningRate 0.0510   Epoch: 5   Global Step: 70940   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:50:21,818-Speed 3322.45 samples/sec   Loss 6.4293   LearningRate 0.0510   Epoch: 5   Global Step: 70950   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:50:24,962-Speed 3258.62 samples/sec   Loss 6.3399   LearningRate 0.0510   Epoch: 5   Global Step: 70960   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:50:28,121-Speed 3241.97 samples/sec   Loss 6.3373   LearningRate 0.0510   Epoch: 5   Global Step: 70970   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:50:31,248-Speed 3275.56 samples/sec   Loss 6.4230   LearningRate 0.0510   Epoch: 5   Global Step: 70980   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:50:34,301-Speed 3355.13 samples/sec   Loss 6.3316   LearningRate 0.0510   Epoch: 5   Global Step: 70990   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:50:37,351-Speed 3358.45 samples/sec   Loss 6.2592   LearningRate 0.0510   Epoch: 5   Global Step: 71000   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:50:40,448-Speed 3307.29 samples/sec   Loss 6.3283   LearningRate 0.0510   Epoch: 5   Global Step: 71010   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:50:43,514-Speed 3341.11 samples/sec   Loss 6.2097   LearningRate 0.0510   Epoch: 5   Global Step: 71020   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:50:46,551-Speed 3373.05 samples/sec   Loss 6.2753   LearningRate 0.0510   Epoch: 5   Global Step: 71030   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:50:49,713-Speed 3238.96 samples/sec   Loss 6.3130   LearningRate 0.0510   Epoch: 5   Global Step: 71040   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:50:52,801-Speed 3317.12 samples/sec   Loss 6.3361   LearningRate 0.0510   Epoch: 5   Global Step: 71050   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:50:55,918-Speed 3286.63 samples/sec   Loss 6.3311   LearningRate 0.0510   Epoch: 5   Global Step: 71060   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:50:58,958-Speed 3369.04 samples/sec   Loss 6.3752   LearningRate 0.0510   Epoch: 5   Global Step: 71070   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:51:02,091-Speed 3269.57 samples/sec   Loss 6.3832   LearningRate 0.0510   Epoch: 5   Global Step: 71080   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:51:05,197-Speed 3299.56 samples/sec   Loss 6.2516   LearningRate 0.0510   Epoch: 5   Global Step: 71090   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:51:08,285-Speed 3316.53 samples/sec   Loss 6.3393   LearningRate 0.0509   Epoch: 5   Global Step: 71100   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:51:11,390-Speed 3299.02 samples/sec   Loss 6.3372   LearningRate 0.0509   Epoch: 5   Global Step: 71110   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:51:14,520-Speed 3273.39 samples/sec   Loss 6.3266   LearningRate 0.0509   Epoch: 5   Global Step: 71120   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:51:17,569-Speed 3359.08 samples/sec   Loss 6.2834   LearningRate 0.0509   Epoch: 5   Global Step: 71130   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:51:20,675-Speed 3298.12 samples/sec   Loss 6.3149   LearningRate 0.0509   Epoch: 5   Global Step: 71140   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:51:23,712-Speed 3372.71 samples/sec   Loss 6.4149   LearningRate 0.0509   Epoch: 5   Global Step: 71150   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:51:26,792-Speed 3325.69 samples/sec   Loss 6.3451   LearningRate 0.0509   Epoch: 5   Global Step: 71160   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:51:29,877-Speed 3319.86 samples/sec   Loss 6.4122   LearningRate 0.0509   Epoch: 5   Global Step: 71170   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:51:32,969-Speed 3312.99 samples/sec   Loss 6.2915   LearningRate 0.0509   Epoch: 5   Global Step: 71180   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:51:36,125-Speed 3245.82 samples/sec   Loss 6.3256   LearningRate 0.0509   Epoch: 5   Global Step: 71190   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:51:39,263-Speed 3264.11 samples/sec   Loss 6.2986   LearningRate 0.0509   Epoch: 5   Global Step: 71200   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:51:42,353-Speed 3315.29 samples/sec   Loss 6.3080   LearningRate 0.0509   Epoch: 5   Global Step: 71210   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:51:45,468-Speed 3288.31 samples/sec   Loss 6.2455   LearningRate 0.0509   Epoch: 5   Global Step: 71220   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:51:48,644-Speed 3225.00 samples/sec   Loss 6.3497   LearningRate 0.0509   Epoch: 5   Global Step: 71230   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:51:51,845-Speed 3199.63 samples/sec   Loss 6.3398   LearningRate 0.0509   Epoch: 5   Global Step: 71240   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:51:55,023-Speed 3223.59 samples/sec   Loss 6.3982   LearningRate 0.0509   Epoch: 5   Global Step: 71250   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:51:58,093-Speed 3336.25 samples/sec   Loss 6.4150   LearningRate 0.0509   Epoch: 5   Global Step: 71260   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:52:01,192-Speed 3305.25 samples/sec   Loss 6.2611   LearningRate 0.0509   Epoch: 5   Global Step: 71270   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:52:04,302-Speed 3294.35 samples/sec   Loss 6.3350   LearningRate 0.0508   Epoch: 5   Global Step: 71280   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:52:07,383-Speed 3324.98 samples/sec   Loss 6.2541   LearningRate 0.0508   Epoch: 5   Global Step: 71290   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:52:10,426-Speed 3365.55 samples/sec   Loss 6.4616   LearningRate 0.0508   Epoch: 5   Global Step: 71300   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:52:13,486-Speed 3347.24 samples/sec   Loss 6.3031   LearningRate 0.0508   Epoch: 5   Global Step: 71310   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:52:16,609-Speed 3279.94 samples/sec   Loss 6.4573   LearningRate 0.0508   Epoch: 5   Global Step: 71320   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:52:19,670-Speed 3346.48 samples/sec   Loss 6.3007   LearningRate 0.0508   Epoch: 5   Global Step: 71330   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:52:22,736-Speed 3341.18 samples/sec   Loss 6.4501   LearningRate 0.0508   Epoch: 5   Global Step: 71340   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:52:25,855-Speed 3283.78 samples/sec   Loss 6.3387   LearningRate 0.0508   Epoch: 5   Global Step: 71350   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:52:28,973-Speed 3285.73 samples/sec   Loss 6.3404   LearningRate 0.0508   Epoch: 5   Global Step: 71360   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:52:32,104-Speed 3271.57 samples/sec   Loss 6.2265   LearningRate 0.0508   Epoch: 5   Global Step: 71370   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:52:35,152-Speed 3360.58 samples/sec   Loss 6.3480   LearningRate 0.0508   Epoch: 5   Global Step: 71380   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:52:38,272-Speed 3282.34 samples/sec   Loss 6.2969   LearningRate 0.0508   Epoch: 5   Global Step: 71390   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:52:41,482-Speed 3191.11 samples/sec   Loss 6.3485   LearningRate 0.0508   Epoch: 5   Global Step: 71400   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:52:44,580-Speed 3306.69 samples/sec   Loss 6.2434   LearningRate 0.0508   Epoch: 5   Global Step: 71410   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:52:47,715-Speed 3267.06 samples/sec   Loss 6.4279   LearningRate 0.0508   Epoch: 5   Global Step: 71420   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:52:50,825-Speed 3294.55 samples/sec   Loss 6.2345   LearningRate 0.0508   Epoch: 5   Global Step: 71430   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:52:53,884-Speed 3347.63 samples/sec   Loss 6.3931   LearningRate 0.0508   Epoch: 5   Global Step: 71440   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:52:56,978-Speed 3311.32 samples/sec   Loss 6.4046   LearningRate 0.0507   Epoch: 5   Global Step: 71450   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:53:00,035-Speed 3350.94 samples/sec   Loss 6.3182   LearningRate 0.0507   Epoch: 5   Global Step: 71460   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:53:03,134-Speed 3304.60 samples/sec   Loss 6.4443   LearningRate 0.0507   Epoch: 5   Global Step: 71470   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:53:06,214-Speed 3326.13 samples/sec   Loss 6.2700   LearningRate 0.0507   Epoch: 5   Global Step: 71480   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:53:09,215-Speed 3413.31 samples/sec   Loss 6.2995   LearningRate 0.0507   Epoch: 5   Global Step: 71490   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:53:12,342-Speed 3275.03 samples/sec   Loss 6.2938   LearningRate 0.0507   Epoch: 5   Global Step: 71500   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:53:15,467-Speed 3278.64 samples/sec   Loss 6.3036   LearningRate 0.0507   Epoch: 5   Global Step: 71510   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:53:18,536-Speed 3337.12 samples/sec   Loss 6.3691   LearningRate 0.0507   Epoch: 5   Global Step: 71520   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:53:21,590-Speed 3353.66 samples/sec   Loss 6.2682   LearningRate 0.0507   Epoch: 5   Global Step: 71530   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:53:24,690-Speed 3304.79 samples/sec   Loss 6.2530   LearningRate 0.0507   Epoch: 5   Global Step: 71540   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:53:27,767-Speed 3329.31 samples/sec   Loss 6.3906   LearningRate 0.0507   Epoch: 5   Global Step: 71550   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:53:30,906-Speed 3262.48 samples/sec   Loss 6.3562   LearningRate 0.0507   Epoch: 5   Global Step: 71560   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:53:33,961-Speed 3353.68 samples/sec   Loss 6.3272   LearningRate 0.0507   Epoch: 5   Global Step: 71570   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:53:37,064-Speed 3301.20 samples/sec   Loss 6.3495   LearningRate 0.0507   Epoch: 5   Global Step: 71580   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:53:40,173-Speed 3294.27 samples/sec   Loss 6.4176   LearningRate 0.0507   Epoch: 5   Global Step: 71590   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:53:43,271-Speed 3306.47 samples/sec   Loss 6.2815   LearningRate 0.0507   Epoch: 5   Global Step: 71600   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:53:46,342-Speed 3335.03 samples/sec   Loss 6.3148   LearningRate 0.0507   Epoch: 5   Global Step: 71610   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:53:49,487-Speed 3257.18 samples/sec   Loss 6.1617   LearningRate 0.0507   Epoch: 5   Global Step: 71620   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:53:52,610-Speed 3279.76 samples/sec   Loss 6.2961   LearningRate 0.0506   Epoch: 5   Global Step: 71630   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:53:55,719-Speed 3295.01 samples/sec   Loss 6.2376   LearningRate 0.0506   Epoch: 5   Global Step: 71640   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:53:58,779-Speed 3347.31 samples/sec   Loss 6.4514   LearningRate 0.0506   Epoch: 5   Global Step: 71650   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:54:01,879-Speed 3303.95 samples/sec   Loss 6.2452   LearningRate 0.0506   Epoch: 5   Global Step: 71660   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:54:05,048-Speed 3232.83 samples/sec   Loss 6.3308   LearningRate 0.0506   Epoch: 5   Global Step: 71670   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:54:08,125-Speed 3328.90 samples/sec   Loss 6.2047   LearningRate 0.0506   Epoch: 5   Global Step: 71680   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:54:11,257-Speed 3270.26 samples/sec   Loss 6.4266   LearningRate 0.0506   Epoch: 5   Global Step: 71690   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:54:14,352-Speed 3310.09 samples/sec   Loss 6.2739   LearningRate 0.0506   Epoch: 5   Global Step: 71700   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:54:17,527-Speed 3226.23 samples/sec   Loss 6.3337   LearningRate 0.0506   Epoch: 5   Global Step: 71710   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:54:20,624-Speed 3307.02 samples/sec   Loss 6.3120   LearningRate 0.0506   Epoch: 5   Global Step: 71720   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:54:23,701-Speed 3329.35 samples/sec   Loss 6.2235   LearningRate 0.0506   Epoch: 5   Global Step: 71730   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:54:26,840-Speed 3263.65 samples/sec   Loss 6.2666   LearningRate 0.0506   Epoch: 5   Global Step: 71740   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:54:29,932-Speed 3312.52 samples/sec   Loss 6.1813   LearningRate 0.0506   Epoch: 5   Global Step: 71750   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:54:32,974-Speed 3367.43 samples/sec   Loss 6.2980   LearningRate 0.0506   Epoch: 5   Global Step: 71760   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:54:36,039-Speed 3342.01 samples/sec   Loss 6.3193   LearningRate 0.0506   Epoch: 5   Global Step: 71770   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:54:39,175-Speed 3266.10 samples/sec   Loss 6.2392   LearningRate 0.0506   Epoch: 5   Global Step: 71780   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:54:42,294-Speed 3284.02 samples/sec   Loss 6.4058   LearningRate 0.0506   Epoch: 5   Global Step: 71790   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:54:45,325-Speed 3379.26 samples/sec   Loss 6.1296   LearningRate 0.0505   Epoch: 5   Global Step: 71800   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:54:48,393-Speed 3338.87 samples/sec   Loss 6.1958   LearningRate 0.0505   Epoch: 5   Global Step: 71810   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:54:51,505-Speed 3292.08 samples/sec   Loss 6.3196   LearningRate 0.0505   Epoch: 5   Global Step: 71820   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:54:54,614-Speed 3294.80 samples/sec   Loss 6.3076   LearningRate 0.0505   Epoch: 5   Global Step: 71830   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:54:57,675-Speed 3345.73 samples/sec   Loss 6.2678   LearningRate 0.0505   Epoch: 5   Global Step: 71840   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:55:00,784-Speed 3294.72 samples/sec   Loss 6.2466   LearningRate 0.0505   Epoch: 5   Global Step: 71850   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:55:03,876-Speed 3313.55 samples/sec   Loss 6.3136   LearningRate 0.0505   Epoch: 5   Global Step: 71860   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:55:06,962-Speed 3318.83 samples/sec   Loss 6.2742   LearningRate 0.0505   Epoch: 5   Global Step: 71870   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:55:10,029-Speed 3340.66 samples/sec   Loss 6.2142   LearningRate 0.0505   Epoch: 5   Global Step: 71880   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:55:13,106-Speed 3328.32 samples/sec   Loss 6.3531   LearningRate 0.0505   Epoch: 5   Global Step: 71890   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:55:16,182-Speed 3330.77 samples/sec   Loss 6.3577   LearningRate 0.0505   Epoch: 5   Global Step: 71900   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:55:19,248-Speed 3340.61 samples/sec   Loss 6.3313   LearningRate 0.0505   Epoch: 5   Global Step: 71910   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:55:22,295-Speed 3362.36 samples/sec   Loss 6.2683   LearningRate 0.0505   Epoch: 5   Global Step: 71920   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:55:25,380-Speed 3320.07 samples/sec   Loss 6.3461   LearningRate 0.0505   Epoch: 5   Global Step: 71930   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:55:28,412-Speed 3378.86 samples/sec   Loss 6.2236   LearningRate 0.0505   Epoch: 5   Global Step: 71940   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:55:31,462-Speed 3359.01 samples/sec   Loss 6.3236   LearningRate 0.0505   Epoch: 5   Global Step: 71950   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:55:34,516-Speed 3353.79 samples/sec   Loss 6.3273   LearningRate 0.0505   Epoch: 5   Global Step: 71960   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:55:37,604-Speed 3317.56 samples/sec   Loss 6.3423   LearningRate 0.0505   Epoch: 5   Global Step: 71970   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:55:40,670-Speed 3340.25 samples/sec   Loss 6.2393   LearningRate 0.0504   Epoch: 5   Global Step: 71980   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:55:43,759-Speed 3315.47 samples/sec   Loss 6.3654   LearningRate 0.0504   Epoch: 5   Global Step: 71990   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:55:46,806-Speed 3362.32 samples/sec   Loss 6.2894   LearningRate 0.0504   Epoch: 5   Global Step: 72000   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:55:49,931-Speed 3277.65 samples/sec   Loss 6.4130   LearningRate 0.0504   Epoch: 5   Global Step: 72010   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:55:53,041-Speed 3293.73 samples/sec   Loss 6.4319   LearningRate 0.0504   Epoch: 5   Global Step: 72020   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:55:56,141-Speed 3306.42 samples/sec   Loss 6.4728   LearningRate 0.0504   Epoch: 5   Global Step: 72030   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:55:59,225-Speed 3321.00 samples/sec   Loss 6.3150   LearningRate 0.0504   Epoch: 5   Global Step: 72040   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:56:02,312-Speed 3318.71 samples/sec   Loss 6.2894   LearningRate 0.0504   Epoch: 5   Global Step: 72050   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:56:05,425-Speed 3290.37 samples/sec   Loss 6.4020   LearningRate 0.0504   Epoch: 5   Global Step: 72060   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:56:08,494-Speed 3337.84 samples/sec   Loss 6.2806   LearningRate 0.0504   Epoch: 5   Global Step: 72070   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:56:11,624-Speed 3272.24 samples/sec   Loss 6.2552   LearningRate 0.0504   Epoch: 5   Global Step: 72080   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:56:14,780-Speed 3245.01 samples/sec   Loss 6.4009   LearningRate 0.0504   Epoch: 5   Global Step: 72090   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:56:17,965-Speed 3216.57 samples/sec   Loss 6.2258   LearningRate 0.0504   Epoch: 5   Global Step: 72100   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:56:21,028-Speed 3344.14 samples/sec   Loss 6.3147   LearningRate 0.0504   Epoch: 5   Global Step: 72110   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:56:24,110-Speed 3323.81 samples/sec   Loss 6.2886   LearningRate 0.0504   Epoch: 5   Global Step: 72120   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:56:27,225-Speed 3288.83 samples/sec   Loss 6.3037   LearningRate 0.0504   Epoch: 5   Global Step: 72130   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:56:30,286-Speed 3346.00 samples/sec   Loss 6.3011   LearningRate 0.0504   Epoch: 5   Global Step: 72140   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:56:33,333-Speed 3361.73 samples/sec   Loss 6.3302   LearningRate 0.0503   Epoch: 5   Global Step: 72150   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:56:36,398-Speed 3341.64 samples/sec   Loss 6.3550   LearningRate 0.0503   Epoch: 5   Global Step: 72160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:56:39,511-Speed 3290.65 samples/sec   Loss 6.2910   LearningRate 0.0503   Epoch: 5   Global Step: 72170   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:56:42,637-Speed 3277.00 samples/sec   Loss 6.2859   LearningRate 0.0503   Epoch: 5   Global Step: 72180   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:56:45,700-Speed 3343.96 samples/sec   Loss 6.3600   LearningRate 0.0503   Epoch: 5   Global Step: 72190   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:56:48,773-Speed 3333.82 samples/sec   Loss 6.2625   LearningRate 0.0503   Epoch: 5   Global Step: 72200   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:56:51,884-Speed 3291.55 samples/sec   Loss 6.3771   LearningRate 0.0503   Epoch: 5   Global Step: 72210   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:56:55,079-Speed 3207.22 samples/sec   Loss 6.2553   LearningRate 0.0503   Epoch: 5   Global Step: 72220   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:56:58,136-Speed 3350.30 samples/sec   Loss 6.2683   LearningRate 0.0503   Epoch: 5   Global Step: 72230   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:57:01,183-Speed 3361.50 samples/sec   Loss 6.2306   LearningRate 0.0503   Epoch: 5   Global Step: 72240   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:57:04,325-Speed 3259.61 samples/sec   Loss 6.3706   LearningRate 0.0503   Epoch: 5   Global Step: 72250   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:57:07,408-Speed 3323.11 samples/sec   Loss 6.2289   LearningRate 0.0503   Epoch: 5   Global Step: 72260   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:57:10,462-Speed 3354.10 samples/sec   Loss 6.3350   LearningRate 0.0503   Epoch: 5   Global Step: 72270   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:57:13,530-Speed 3337.65 samples/sec   Loss 6.2897   LearningRate 0.0503   Epoch: 5   Global Step: 72280   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:57:16,666-Speed 3266.41 samples/sec   Loss 6.3339   LearningRate 0.0503   Epoch: 5   Global Step: 72290   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 07:57:19,800-Speed 3268.68 samples/sec   Loss 6.2252   LearningRate 0.0503   Epoch: 5   Global Step: 72300   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:57:22,906-Speed 3298.26 samples/sec   Loss 6.2570   LearningRate 0.0503   Epoch: 5   Global Step: 72310   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:57:25,969-Speed 3343.81 samples/sec   Loss 6.2935   LearningRate 0.0503   Epoch: 5   Global Step: 72320   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:57:29,069-Speed 3304.84 samples/sec   Loss 6.2629   LearningRate 0.0502   Epoch: 5   Global Step: 72330   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:57:32,140-Speed 3334.92 samples/sec   Loss 6.2972   LearningRate 0.0502   Epoch: 5   Global Step: 72340   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:57:35,211-Speed 3335.54 samples/sec   Loss 6.2868   LearningRate 0.0502   Epoch: 5   Global Step: 72350   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:57:38,284-Speed 3333.00 samples/sec   Loss 6.2205   LearningRate 0.0502   Epoch: 5   Global Step: 72360   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:57:41,356-Speed 3334.41 samples/sec   Loss 6.3646   LearningRate 0.0502   Epoch: 5   Global Step: 72370   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:57:44,424-Speed 3339.19 samples/sec   Loss 6.3295   LearningRate 0.0502   Epoch: 5   Global Step: 72380   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:57:47,530-Speed 3297.71 samples/sec   Loss 6.2792   LearningRate 0.0502   Epoch: 5   Global Step: 72390   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:57:50,623-Speed 3312.08 samples/sec   Loss 6.3407   LearningRate 0.0502   Epoch: 5   Global Step: 72400   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:57:53,697-Speed 3332.16 samples/sec   Loss 6.3138   LearningRate 0.0502   Epoch: 5   Global Step: 72410   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:57:56,776-Speed 3326.53 samples/sec   Loss 6.2695   LearningRate 0.0502   Epoch: 5   Global Step: 72420   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:57:59,929-Speed 3248.50 samples/sec   Loss 6.2568   LearningRate 0.0502   Epoch: 5   Global Step: 72430   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:58:03,090-Speed 3240.36 samples/sec   Loss 6.2650   LearningRate 0.0502   Epoch: 5   Global Step: 72440   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:58:06,192-Speed 3301.83 samples/sec   Loss 6.4949   LearningRate 0.0502   Epoch: 5   Global Step: 72450   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:58:09,309-Speed 3286.71 samples/sec   Loss 6.2381   LearningRate 0.0502   Epoch: 5   Global Step: 72460   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:58:12,382-Speed 3332.60 samples/sec   Loss 6.3514   LearningRate 0.0502   Epoch: 5   Global Step: 72470   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:58:15,586-Speed 3197.18 samples/sec   Loss 6.2856   LearningRate 0.0502   Epoch: 5   Global Step: 72480   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:58:18,674-Speed 3316.76 samples/sec   Loss 6.2849   LearningRate 0.0502   Epoch: 5   Global Step: 72490   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:58:21,760-Speed 3319.55 samples/sec   Loss 6.3438   LearningRate 0.0501   Epoch: 5   Global Step: 72500   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:58:24,880-Speed 3282.94 samples/sec   Loss 6.2794   LearningRate 0.0501   Epoch: 5   Global Step: 72510   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:58:28,017-Speed 3265.25 samples/sec   Loss 6.3387   LearningRate 0.0501   Epoch: 5   Global Step: 72520   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:58:31,187-Speed 3230.94 samples/sec   Loss 6.3165   LearningRate 0.0501   Epoch: 5   Global Step: 72530   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:58:34,240-Speed 3355.93 samples/sec   Loss 6.2784   LearningRate 0.0501   Epoch: 5   Global Step: 72540   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:58:37,408-Speed 3232.76 samples/sec   Loss 6.2534   LearningRate 0.0501   Epoch: 5   Global Step: 72550   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:58:40,492-Speed 3321.46 samples/sec   Loss 6.2200   LearningRate 0.0501   Epoch: 5   Global Step: 72560   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:58:43,592-Speed 3304.22 samples/sec   Loss 6.1647   LearningRate 0.0501   Epoch: 5   Global Step: 72570   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:58:46,676-Speed 3321.66 samples/sec   Loss 6.3135   LearningRate 0.0501   Epoch: 5   Global Step: 72580   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:58:49,780-Speed 3299.51 samples/sec   Loss 6.3593   LearningRate 0.0501   Epoch: 5   Global Step: 72590   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:58:52,888-Speed 3295.43 samples/sec   Loss 6.3034   LearningRate 0.0501   Epoch: 5   Global Step: 72600   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:58:55,947-Speed 3349.08 samples/sec   Loss 6.2809   LearningRate 0.0501   Epoch: 5   Global Step: 72610   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:58:59,029-Speed 3323.84 samples/sec   Loss 6.2639   LearningRate 0.0501   Epoch: 5   Global Step: 72620   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:59:02,220-Speed 3209.74 samples/sec   Loss 6.3119   LearningRate 0.0501   Epoch: 5   Global Step: 72630   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:59:05,335-Speed 3288.81 samples/sec   Loss 6.3411   LearningRate 0.0501   Epoch: 5   Global Step: 72640   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:59:08,417-Speed 3323.72 samples/sec   Loss 6.3416   LearningRate 0.0501   Epoch: 5   Global Step: 72650   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:59:11,499-Speed 3323.53 samples/sec   Loss 6.2724   LearningRate 0.0501   Epoch: 5   Global Step: 72660   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:59:14,617-Speed 3284.77 samples/sec   Loss 6.2588   LearningRate 0.0501   Epoch: 5   Global Step: 72670   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:59:17,791-Speed 3228.06 samples/sec   Loss 6.2634   LearningRate 0.0500   Epoch: 5   Global Step: 72680   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:59:20,859-Speed 3337.92 samples/sec   Loss 6.2916   LearningRate 0.0500   Epoch: 5   Global Step: 72690   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:59:23,986-Speed 3276.36 samples/sec   Loss 6.3505   LearningRate 0.0500   Epoch: 5   Global Step: 72700   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 07:59:27,153-Speed 3234.22 samples/sec   Loss 6.4173   LearningRate 0.0500   Epoch: 5   Global Step: 72710   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:59:30,244-Speed 3313.71 samples/sec   Loss 6.3127   LearningRate 0.0500   Epoch: 5   Global Step: 72720   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:59:33,327-Speed 3321.75 samples/sec   Loss 6.3922   LearningRate 0.0500   Epoch: 5   Global Step: 72730   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:59:36,376-Speed 3360.69 samples/sec   Loss 6.3293   LearningRate 0.0500   Epoch: 5   Global Step: 72740   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:59:39,482-Speed 3297.27 samples/sec   Loss 6.3645   LearningRate 0.0500   Epoch: 5   Global Step: 72750   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:59:42,589-Speed 3296.28 samples/sec   Loss 6.3261   LearningRate 0.0500   Epoch: 5   Global Step: 72760   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:59:45,659-Speed 3337.01 samples/sec   Loss 6.3679   LearningRate 0.0500   Epoch: 5   Global Step: 72770   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:59:48,790-Speed 3271.26 samples/sec   Loss 6.2195   LearningRate 0.0500   Epoch: 5   Global Step: 72780   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:59:51,954-Speed 3237.19 samples/sec   Loss 6.3138   LearningRate 0.0500   Epoch: 5   Global Step: 72790   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:59:55,053-Speed 3305.76 samples/sec   Loss 6.2513   LearningRate 0.0500   Epoch: 5   Global Step: 72800   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 07:59:58,090-Speed 3372.46 samples/sec   Loss 6.1813   LearningRate 0.0500   Epoch: 5   Global Step: 72810   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:00:01,140-Speed 3358.69 samples/sec   Loss 6.2094   LearningRate 0.0500   Epoch: 5   Global Step: 72820   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:00:04,208-Speed 3338.41 samples/sec   Loss 6.2608   LearningRate 0.0500   Epoch: 5   Global Step: 72830   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:00:07,269-Speed 3347.49 samples/sec   Loss 6.3219   LearningRate 0.0500   Epoch: 5   Global Step: 72840   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:00:10,327-Speed 3349.65 samples/sec   Loss 6.3747   LearningRate 0.0499   Epoch: 5   Global Step: 72850   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:00:13,437-Speed 3292.52 samples/sec   Loss 6.2975   LearningRate 0.0499   Epoch: 5   Global Step: 72860   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:00:16,507-Speed 3336.34 samples/sec   Loss 6.2618   LearningRate 0.0499   Epoch: 5   Global Step: 72870   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:00:19,585-Speed 3328.21 samples/sec   Loss 6.2139   LearningRate 0.0499   Epoch: 5   Global Step: 72880   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:00:22,677-Speed 3312.72 samples/sec   Loss 6.2852   LearningRate 0.0499   Epoch: 5   Global Step: 72890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:00:25,794-Speed 3286.47 samples/sec   Loss 6.3636   LearningRate 0.0499   Epoch: 5   Global Step: 72900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:00:28,856-Speed 3345.50 samples/sec   Loss 6.2480   LearningRate 0.0499   Epoch: 5   Global Step: 72910   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 08:00:31,980-Speed 3279.10 samples/sec   Loss 6.1731   LearningRate 0.0499   Epoch: 5   Global Step: 72920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:00:35,169-Speed 3212.08 samples/sec   Loss 6.3054   LearningRate 0.0499   Epoch: 5   Global Step: 72930   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:00:38,269-Speed 3304.84 samples/sec   Loss 6.3081   LearningRate 0.0499   Epoch: 5   Global Step: 72940   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:00:41,397-Speed 3274.23 samples/sec   Loss 6.2684   LearningRate 0.0499   Epoch: 5   Global Step: 72950   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:00:44,461-Speed 3343.08 samples/sec   Loss 6.2990   LearningRate 0.0499   Epoch: 5   Global Step: 72960   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:00:47,533-Speed 3334.19 samples/sec   Loss 6.1646   LearningRate 0.0499   Epoch: 5   Global Step: 72970   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:00:50,641-Speed 3295.78 samples/sec   Loss 6.3666   LearningRate 0.0499   Epoch: 5   Global Step: 72980   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:00:53,763-Speed 3281.75 samples/sec   Loss 6.3265   LearningRate 0.0499   Epoch: 5   Global Step: 72990   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:00:56,823-Speed 3347.51 samples/sec   Loss 6.2656   LearningRate 0.0499   Epoch: 5   Global Step: 73000   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:00:59,891-Speed 3338.83 samples/sec   Loss 6.1653   LearningRate 0.0499   Epoch: 5   Global Step: 73010   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:01:03,157-Speed 3136.56 samples/sec   Loss 6.4417   LearningRate 0.0499   Epoch: 5   Global Step: 73020   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:01:06,243-Speed 3318.77 samples/sec   Loss 6.3665   LearningRate 0.0498   Epoch: 5   Global Step: 73030   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:01:09,298-Speed 3353.17 samples/sec   Loss 6.2862   LearningRate 0.0498   Epoch: 5   Global Step: 73040   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:01:12,474-Speed 3225.24 samples/sec   Loss 6.3002   LearningRate 0.0498   Epoch: 5   Global Step: 73050   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:01:15,587-Speed 3290.28 samples/sec   Loss 6.3379   LearningRate 0.0498   Epoch: 5   Global Step: 73060   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:01:18,685-Speed 3306.72 samples/sec   Loss 6.2308   LearningRate 0.0498   Epoch: 5   Global Step: 73070   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:01:21,781-Speed 3308.50 samples/sec   Loss 6.3083   LearningRate 0.0498   Epoch: 5   Global Step: 73080   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:01:24,923-Speed 3260.12 samples/sec   Loss 6.3065   LearningRate 0.0498   Epoch: 5   Global Step: 73090   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:01:28,007-Speed 3321.17 samples/sec   Loss 6.3973   LearningRate 0.0498   Epoch: 5   Global Step: 73100   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:01:31,082-Speed 3331.88 samples/sec   Loss 6.2530   LearningRate 0.0498   Epoch: 5   Global Step: 73110   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:01:34,181-Speed 3305.33 samples/sec   Loss 6.3576   LearningRate 0.0498   Epoch: 5   Global Step: 73120   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:01:37,336-Speed 3246.98 samples/sec   Loss 6.2302   LearningRate 0.0498   Epoch: 5   Global Step: 73130   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:01:40,449-Speed 3290.33 samples/sec   Loss 6.2738   LearningRate 0.0498   Epoch: 5   Global Step: 73140   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:01:43,599-Speed 3251.30 samples/sec   Loss 6.2988   LearningRate 0.0498   Epoch: 5   Global Step: 73150   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:01:46,666-Speed 3339.91 samples/sec   Loss 6.0998   LearningRate 0.0498   Epoch: 5   Global Step: 73160   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:01:49,736-Speed 3337.14 samples/sec   Loss 6.2387   LearningRate 0.0498   Epoch: 5   Global Step: 73170   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:01:52,832-Speed 3308.52 samples/sec   Loss 6.2364   LearningRate 0.0498   Epoch: 5   Global Step: 73180   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:01:55,951-Speed 3284.45 samples/sec   Loss 6.2026   LearningRate 0.0498   Epoch: 5   Global Step: 73190   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:01:59,033-Speed 3322.64 samples/sec   Loss 6.3697   LearningRate 0.0498   Epoch: 5   Global Step: 73200   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:02:02,237-Speed 3196.97 samples/sec   Loss 6.2902   LearningRate 0.0497   Epoch: 5   Global Step: 73210   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:02:05,369-Speed 3270.96 samples/sec   Loss 6.2131   LearningRate 0.0497   Epoch: 5   Global Step: 73220   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:02:08,459-Speed 3314.37 samples/sec   Loss 6.1600   LearningRate 0.0497   Epoch: 5   Global Step: 73230   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:02:11,548-Speed 3317.57 samples/sec   Loss 6.2844   LearningRate 0.0497   Epoch: 5   Global Step: 73240   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:02:14,730-Speed 3218.34 samples/sec   Loss 6.3148   LearningRate 0.0497   Epoch: 5   Global Step: 73250   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:02:17,837-Speed 3296.40 samples/sec   Loss 6.3010   LearningRate 0.0497   Epoch: 5   Global Step: 73260   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:02:20,944-Speed 3296.73 samples/sec   Loss 6.3671   LearningRate 0.0497   Epoch: 5   Global Step: 73270   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:02:24,007-Speed 3344.60 samples/sec   Loss 6.2645   LearningRate 0.0497   Epoch: 5   Global Step: 73280   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:02:27,101-Speed 3310.97 samples/sec   Loss 6.2652   LearningRate 0.0497   Epoch: 5   Global Step: 73290   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:02:30,202-Speed 3303.51 samples/sec   Loss 6.3802   LearningRate 0.0497   Epoch: 5   Global Step: 73300   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:02:33,310-Speed 3295.30 samples/sec   Loss 6.3517   LearningRate 0.0497   Epoch: 5   Global Step: 73310   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:02:36,421-Speed 3292.54 samples/sec   Loss 6.2593   LearningRate 0.0497   Epoch: 5   Global Step: 73320   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:02:39,513-Speed 3312.05 samples/sec   Loss 6.3278   LearningRate 0.0497   Epoch: 5   Global Step: 73330   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:02:42,630-Speed 3286.05 samples/sec   Loss 6.2773   LearningRate 0.0497   Epoch: 5   Global Step: 73340   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:02:45,698-Speed 3339.59 samples/sec   Loss 6.2582   LearningRate 0.0497   Epoch: 5   Global Step: 73350   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:02:48,778-Speed 3325.08 samples/sec   Loss 6.2736   LearningRate 0.0497   Epoch: 5   Global Step: 73360   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:02:51,934-Speed 3245.77 samples/sec   Loss 6.2241   LearningRate 0.0497   Epoch: 5   Global Step: 73370   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:02:54,997-Speed 3344.06 samples/sec   Loss 6.2900   LearningRate 0.0496   Epoch: 5   Global Step: 73380   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:02:58,067-Speed 3336.69 samples/sec   Loss 6.2165   LearningRate 0.0496   Epoch: 5   Global Step: 73390   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:03:01,214-Speed 3255.24 samples/sec   Loss 6.4148   LearningRate 0.0496   Epoch: 5   Global Step: 73400   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:03:04,324-Speed 3293.33 samples/sec   Loss 6.2760   LearningRate 0.0496   Epoch: 5   Global Step: 73410   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:03:07,476-Speed 3249.55 samples/sec   Loss 6.2413   LearningRate 0.0496   Epoch: 5   Global Step: 73420   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:03:10,533-Speed 3350.20 samples/sec   Loss 6.2680   LearningRate 0.0496   Epoch: 5   Global Step: 73430   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:03:13,626-Speed 3311.78 samples/sec   Loss 6.1916   LearningRate 0.0496   Epoch: 5   Global Step: 73440   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:03:16,718-Speed 3313.21 samples/sec   Loss 6.1366   LearningRate 0.0496   Epoch: 5   Global Step: 73450   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:03:19,836-Speed 3285.30 samples/sec   Loss 6.2282   LearningRate 0.0496   Epoch: 5   Global Step: 73460   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:03:22,936-Speed 3304.25 samples/sec   Loss 6.1743   LearningRate 0.0496   Epoch: 5   Global Step: 73470   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:03:26,048-Speed 3290.62 samples/sec   Loss 6.2259   LearningRate 0.0496   Epoch: 5   Global Step: 73480   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:03:29,146-Speed 3307.20 samples/sec   Loss 6.2915   LearningRate 0.0496   Epoch: 5   Global Step: 73490   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:03:32,207-Speed 3346.57 samples/sec   Loss 6.1911   LearningRate 0.0496   Epoch: 5   Global Step: 73500   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:03:35,279-Speed 3334.07 samples/sec   Loss 6.3540   LearningRate 0.0496   Epoch: 5   Global Step: 73510   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:03:38,421-Speed 3260.11 samples/sec   Loss 6.3235   LearningRate 0.0496   Epoch: 5   Global Step: 73520   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:03:41,569-Speed 3253.14 samples/sec   Loss 6.2505   LearningRate 0.0496   Epoch: 5   Global Step: 73530   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:03:44,691-Speed 3280.67 samples/sec   Loss 6.3697   LearningRate 0.0496   Epoch: 5   Global Step: 73540   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:03:47,882-Speed 3210.93 samples/sec   Loss 6.2254   LearningRate 0.0496   Epoch: 5   Global Step: 73550   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:03:51,020-Speed 3263.54 samples/sec   Loss 6.2147   LearningRate 0.0495   Epoch: 5   Global Step: 73560   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:03:54,137-Speed 3286.94 samples/sec   Loss 6.3621   LearningRate 0.0495   Epoch: 5   Global Step: 73570   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:03:57,241-Speed 3300.13 samples/sec   Loss 6.2853   LearningRate 0.0495   Epoch: 5   Global Step: 73580   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:04:00,376-Speed 3267.27 samples/sec   Loss 6.3387   LearningRate 0.0495   Epoch: 5   Global Step: 73590   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:04:03,440-Speed 3343.04 samples/sec   Loss 6.2433   LearningRate 0.0495   Epoch: 5   Global Step: 73600   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:04:06,617-Speed 3224.79 samples/sec   Loss 6.2322   LearningRate 0.0495   Epoch: 5   Global Step: 73610   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:04:09,704-Speed 3318.04 samples/sec   Loss 6.1974   LearningRate 0.0495   Epoch: 5   Global Step: 73620   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:04:12,843-Speed 3263.02 samples/sec   Loss 6.2891   LearningRate 0.0495   Epoch: 5   Global Step: 73630   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:04:16,033-Speed 3210.91 samples/sec   Loss 6.3169   LearningRate 0.0495   Epoch: 5   Global Step: 73640   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:04:19,145-Speed 3292.02 samples/sec   Loss 6.2347   LearningRate 0.0495   Epoch: 5   Global Step: 73650   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:04:22,211-Speed 3341.00 samples/sec   Loss 6.2239   LearningRate 0.0495   Epoch: 5   Global Step: 73660   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:04:25,288-Speed 3328.68 samples/sec   Loss 6.2423   LearningRate 0.0495   Epoch: 5   Global Step: 73670   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:04:28,473-Speed 3216.71 samples/sec   Loss 6.3270   LearningRate 0.0495   Epoch: 5   Global Step: 73680   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:04:31,568-Speed 3309.69 samples/sec   Loss 6.2402   LearningRate 0.0495   Epoch: 5   Global Step: 73690   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:04:34,622-Speed 3353.92 samples/sec   Loss 6.3277   LearningRate 0.0495   Epoch: 5   Global Step: 73700   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:04:37,731-Speed 3293.99 samples/sec   Loss 6.3248   LearningRate 0.0495   Epoch: 5   Global Step: 73710   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:04:40,822-Speed 3314.10 samples/sec   Loss 6.2115   LearningRate 0.0495   Epoch: 5   Global Step: 73720   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:04:43,890-Speed 3339.30 samples/sec   Loss 6.2974   LearningRate 0.0494   Epoch: 5   Global Step: 73730   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:04:46,995-Speed 3299.01 samples/sec   Loss 6.2707   LearningRate 0.0494   Epoch: 5   Global Step: 73740   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:04:50,078-Speed 3322.35 samples/sec   Loss 6.1733   LearningRate 0.0494   Epoch: 5   Global Step: 73750   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:04:53,147-Speed 3338.30 samples/sec   Loss 6.2295   LearningRate 0.0494   Epoch: 5   Global Step: 73760   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:04:56,229-Speed 3323.61 samples/sec   Loss 6.1850   LearningRate 0.0494   Epoch: 5   Global Step: 73770   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:04:59,264-Speed 3374.98 samples/sec   Loss 6.2945   LearningRate 0.0494   Epoch: 5   Global Step: 73780   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:05:02,352-Speed 3317.22 samples/sec   Loss 6.2407   LearningRate 0.0494   Epoch: 5   Global Step: 73790   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:05:05,441-Speed 3316.26 samples/sec   Loss 6.3133   LearningRate 0.0494   Epoch: 5   Global Step: 73800   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:05:08,519-Speed 3327.70 samples/sec   Loss 6.2973   LearningRate 0.0494   Epoch: 5   Global Step: 73810   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:05:11,636-Speed 3286.16 samples/sec   Loss 6.3923   LearningRate 0.0494   Epoch: 5   Global Step: 73820   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:05:14,726-Speed 3314.87 samples/sec   Loss 6.3285   LearningRate 0.0494   Epoch: 5   Global Step: 73830   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:05:17,815-Speed 3316.48 samples/sec   Loss 6.3933   LearningRate 0.0494   Epoch: 5   Global Step: 73840   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:05:20,882-Speed 3339.19 samples/sec   Loss 6.3440   LearningRate 0.0494   Epoch: 5   Global Step: 73850   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:05:23,942-Speed 3347.72 samples/sec   Loss 6.3097   LearningRate 0.0494   Epoch: 5   Global Step: 73860   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:05:27,024-Speed 3323.83 samples/sec   Loss 6.2182   LearningRate 0.0494   Epoch: 5   Global Step: 73870   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:05:30,146-Speed 3281.12 samples/sec   Loss 6.3199   LearningRate 0.0494   Epoch: 5   Global Step: 73880   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:05:33,203-Speed 3350.51 samples/sec   Loss 6.2989   LearningRate 0.0494   Epoch: 5   Global Step: 73890   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:05:36,305-Speed 3301.83 samples/sec   Loss 6.0992   LearningRate 0.0494   Epoch: 5   Global Step: 73900   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:05:39,407-Speed 3302.26 samples/sec   Loss 6.2055   LearningRate 0.0493   Epoch: 5   Global Step: 73910   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:05:42,495-Speed 3317.16 samples/sec   Loss 6.1223   LearningRate 0.0493   Epoch: 5   Global Step: 73920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:05:45,541-Speed 3363.20 samples/sec   Loss 6.2764   LearningRate 0.0493   Epoch: 5   Global Step: 73930   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:05:48,610-Speed 3337.61 samples/sec   Loss 6.2793   LearningRate 0.0493   Epoch: 5   Global Step: 73940   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:05:51,726-Speed 3287.91 samples/sec   Loss 6.3196   LearningRate 0.0493   Epoch: 5   Global Step: 73950   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:05:54,789-Speed 3344.03 samples/sec   Loss 6.2065   LearningRate 0.0493   Epoch: 5   Global Step: 73960   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:05:57,875-Speed 3318.63 samples/sec   Loss 6.3523   LearningRate 0.0493   Epoch: 5   Global Step: 73970   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:06:01,001-Speed 3277.61 samples/sec   Loss 6.3104   LearningRate 0.0493   Epoch: 5   Global Step: 73980   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:06:04,087-Speed 3318.86 samples/sec   Loss 6.2612   LearningRate 0.0493   Epoch: 5   Global Step: 73990   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:06:07,209-Speed 3281.78 samples/sec   Loss 6.2482   LearningRate 0.0493   Epoch: 5   Global Step: 74000   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:06:10,280-Speed 3335.20 samples/sec   Loss 6.2392   LearningRate 0.0493   Epoch: 5   Global Step: 74010   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:06:13,378-Speed 3306.74 samples/sec   Loss 6.2243   LearningRate 0.0493   Epoch: 5   Global Step: 74020   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:06:16,431-Speed 3355.49 samples/sec   Loss 6.2284   LearningRate 0.0493   Epoch: 5   Global Step: 74030   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:06:19,493-Speed 3345.49 samples/sec   Loss 6.2381   LearningRate 0.0493   Epoch: 5   Global Step: 74040   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:06:22,580-Speed 3317.19 samples/sec   Loss 6.3222   LearningRate 0.0493   Epoch: 5   Global Step: 74050   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:06:25,690-Speed 3294.12 samples/sec   Loss 6.2339   LearningRate 0.0493   Epoch: 5   Global Step: 74060   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:06:28,908-Speed 3182.67 samples/sec   Loss 6.1948   LearningRate 0.0493   Epoch: 5   Global Step: 74070   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:06:32,044-Speed 3266.32 samples/sec   Loss 6.3048   LearningRate 0.0493   Epoch: 5   Global Step: 74080   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:06:35,152-Speed 3295.99 samples/sec   Loss 6.3242   LearningRate 0.0492   Epoch: 5   Global Step: 74090   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:06:38,225-Speed 3332.97 samples/sec   Loss 6.2989   LearningRate 0.0492   Epoch: 5   Global Step: 74100   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:06:41,369-Speed 3258.98 samples/sec   Loss 6.2322   LearningRate 0.0492   Epoch: 5   Global Step: 74110   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:06:44,499-Speed 3272.20 samples/sec   Loss 6.2936   LearningRate 0.0492   Epoch: 5   Global Step: 74120   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:06:47,615-Speed 3287.22 samples/sec   Loss 6.2197   LearningRate 0.0492   Epoch: 5   Global Step: 74130   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:06:50,674-Speed 3349.02 samples/sec   Loss 6.2669   LearningRate 0.0492   Epoch: 5   Global Step: 74140   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:06:53,795-Speed 3281.87 samples/sec   Loss 6.2111   LearningRate 0.0492   Epoch: 5   Global Step: 74150   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:06:56,856-Speed 3345.46 samples/sec   Loss 6.2244   LearningRate 0.0492   Epoch: 5   Global Step: 74160   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:06:59,944-Speed 3318.32 samples/sec   Loss 6.1487   LearningRate 0.0492   Epoch: 5   Global Step: 74170   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:07:03,065-Speed 3281.36 samples/sec   Loss 6.2687   LearningRate 0.0492   Epoch: 5   Global Step: 74180   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:07:06,253-Speed 3213.50 samples/sec   Loss 6.2501   LearningRate 0.0492   Epoch: 5   Global Step: 74190   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:07:09,312-Speed 3348.76 samples/sec   Loss 6.3486   LearningRate 0.0492   Epoch: 5   Global Step: 74200   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:07:12,474-Speed 3239.06 samples/sec   Loss 6.1878   LearningRate 0.0492   Epoch: 5   Global Step: 74210   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:07:15,521-Speed 3362.19 samples/sec   Loss 6.1210   LearningRate 0.0492   Epoch: 5   Global Step: 74220   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:07:18,689-Speed 3233.07 samples/sec   Loss 6.2695   LearningRate 0.0492   Epoch: 5   Global Step: 74230   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:07:21,773-Speed 3321.88 samples/sec   Loss 6.2484   LearningRate 0.0492   Epoch: 5   Global Step: 74240   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:07:24,831-Speed 3349.38 samples/sec   Loss 6.2515   LearningRate 0.0492   Epoch: 5   Global Step: 74250   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:07:27,896-Speed 3342.69 samples/sec   Loss 6.1832   LearningRate 0.0492   Epoch: 5   Global Step: 74260   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:07:31,038-Speed 3259.83 samples/sec   Loss 6.2729   LearningRate 0.0491   Epoch: 5   Global Step: 74270   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:07:34,135-Speed 3307.69 samples/sec   Loss 6.3415   LearningRate 0.0491   Epoch: 5   Global Step: 74280   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:07:37,244-Speed 3294.82 samples/sec   Loss 6.3540   LearningRate 0.0491   Epoch: 5   Global Step: 74290   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:07:40,307-Speed 3343.67 samples/sec   Loss 6.2442   LearningRate 0.0491   Epoch: 5   Global Step: 74300   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:07:43,450-Speed 3259.37 samples/sec   Loss 6.2128   LearningRate 0.0491   Epoch: 5   Global Step: 74310   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:07:46,507-Speed 3351.53 samples/sec   Loss 6.2052   LearningRate 0.0491   Epoch: 5   Global Step: 74320   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:07:49,634-Speed 3274.90 samples/sec   Loss 6.2251   LearningRate 0.0491   Epoch: 5   Global Step: 74330   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:07:52,772-Speed 3264.05 samples/sec   Loss 6.2853   LearningRate 0.0491   Epoch: 5   Global Step: 74340   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:07:55,887-Speed 3288.73 samples/sec   Loss 6.2365   LearningRate 0.0491   Epoch: 5   Global Step: 74350   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:07:58,964-Speed 3329.43 samples/sec   Loss 6.2190   LearningRate 0.0491   Epoch: 5   Global Step: 74360   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:08:02,099-Speed 3266.76 samples/sec   Loss 6.3035   LearningRate 0.0491   Epoch: 5   Global Step: 74370   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:08:05,286-Speed 3214.79 samples/sec   Loss 6.2790   LearningRate 0.0491   Epoch: 5   Global Step: 74380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:08:08,364-Speed 3327.05 samples/sec   Loss 6.1136   LearningRate 0.0491   Epoch: 5   Global Step: 74390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:08:11,432-Speed 3339.75 samples/sec   Loss 6.1470   LearningRate 0.0491   Epoch: 5   Global Step: 74400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:08:14,511-Speed 3326.74 samples/sec   Loss 6.2829   LearningRate 0.0491   Epoch: 5   Global Step: 74410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:08:17,543-Speed 3378.26 samples/sec   Loss 6.2237   LearningRate 0.0491   Epoch: 5   Global Step: 74420   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:08:20,628-Speed 3320.50 samples/sec   Loss 6.2684   LearningRate 0.0491   Epoch: 5   Global Step: 74430   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:08:23,780-Speed 3249.16 samples/sec   Loss 6.2244   LearningRate 0.0490   Epoch: 5   Global Step: 74440   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:08:26,897-Speed 3285.77 samples/sec   Loss 6.2075   LearningRate 0.0490   Epoch: 5   Global Step: 74450   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:08:30,031-Speed 3268.74 samples/sec   Loss 6.1391   LearningRate 0.0490   Epoch: 5   Global Step: 74460   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:08:33,164-Speed 3269.71 samples/sec   Loss 6.2403   LearningRate 0.0490   Epoch: 5   Global Step: 74470   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:08:36,350-Speed 3215.17 samples/sec   Loss 6.1697   LearningRate 0.0490   Epoch: 5   Global Step: 74480   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:08:39,458-Speed 3296.30 samples/sec   Loss 6.2027   LearningRate 0.0490   Epoch: 5   Global Step: 74490   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:08:42,570-Speed 3291.75 samples/sec   Loss 6.2742   LearningRate 0.0490   Epoch: 5   Global Step: 74500   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:08:45,625-Speed 3352.65 samples/sec   Loss 6.1386   LearningRate 0.0490   Epoch: 5   Global Step: 74510   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:08:48,917-Speed 3111.30 samples/sec   Loss 6.3432   LearningRate 0.0490   Epoch: 5   Global Step: 74520   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:09:20,724-Speed 321.97 samples/sec   Loss 5.6774   LearningRate 0.0490   Epoch: 6   Global Step: 74530   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:09:24,368-Speed 2810.78 samples/sec   Loss 4.9113   LearningRate 0.0490   Epoch: 6   Global Step: 74540   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:09:27,511-Speed 3259.13 samples/sec   Loss 4.8665   LearningRate 0.0490   Epoch: 6   Global Step: 74550   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:09:30,549-Speed 3373.01 samples/sec   Loss 4.8183   LearningRate 0.0490   Epoch: 6   Global Step: 74560   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:09:33,556-Speed 3405.59 samples/sec   Loss 4.7501   LearningRate 0.0490   Epoch: 6   Global Step: 74570   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:09:36,594-Speed 3371.62 samples/sec   Loss 4.7698   LearningRate 0.0490   Epoch: 6   Global Step: 74580   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:09:39,622-Speed 3382.96 samples/sec   Loss 4.7712   LearningRate 0.0490   Epoch: 6   Global Step: 74590   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:09:42,699-Speed 3329.00 samples/sec   Loss 4.6857   LearningRate 0.0490   Epoch: 6   Global Step: 74600   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:09:45,691-Speed 3423.78 samples/sec   Loss 4.7413   LearningRate 0.0490   Epoch: 6   Global Step: 74610   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:09:48,785-Speed 3310.22 samples/sec   Loss 4.7189   LearningRate 0.0489   Epoch: 6   Global Step: 74620   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:09:51,829-Speed 3365.85 samples/sec   Loss 4.7313   LearningRate 0.0489   Epoch: 6   Global Step: 74630   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:09:54,934-Speed 3298.97 samples/sec   Loss 4.8303   LearningRate 0.0489   Epoch: 6   Global Step: 74640   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:09:57,982-Speed 3361.04 samples/sec   Loss 4.7515   LearningRate 0.0489   Epoch: 6   Global Step: 74650   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:10:01,055-Speed 3333.47 samples/sec   Loss 4.7981   LearningRate 0.0489   Epoch: 6   Global Step: 74660   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:10:04,142-Speed 3317.56 samples/sec   Loss 4.9135   LearningRate 0.0489   Epoch: 6   Global Step: 74670   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:10:07,221-Speed 3327.55 samples/sec   Loss 4.8174   LearningRate 0.0489   Epoch: 6   Global Step: 74680   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:10:10,257-Speed 3373.50 samples/sec   Loss 4.8762   LearningRate 0.0489   Epoch: 6   Global Step: 74690   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:10:13,299-Speed 3367.40 samples/sec   Loss 4.8226   LearningRate 0.0489   Epoch: 6   Global Step: 74700   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:10:16,329-Speed 3380.15 samples/sec   Loss 4.7504   LearningRate 0.0489   Epoch: 6   Global Step: 74710   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:10:19,811-Speed 2942.16 samples/sec   Loss 4.8493   LearningRate 0.0489   Epoch: 6   Global Step: 74720   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:10:22,872-Speed 3345.82 samples/sec   Loss 4.8081   LearningRate 0.0489   Epoch: 6   Global Step: 74730   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:10:25,916-Speed 3364.40 samples/sec   Loss 4.7263   LearningRate 0.0489   Epoch: 6   Global Step: 74740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:10:28,995-Speed 3327.05 samples/sec   Loss 4.9118   LearningRate 0.0489   Epoch: 6   Global Step: 74750   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:10:32,032-Speed 3373.89 samples/sec   Loss 4.8410   LearningRate 0.0489   Epoch: 6   Global Step: 74760   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:10:35,072-Speed 3369.67 samples/sec   Loss 4.7766   LearningRate 0.0489   Epoch: 6   Global Step: 74770   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:10:38,171-Speed 3304.81 samples/sec   Loss 4.8567   LearningRate 0.0489   Epoch: 6   Global Step: 74780   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:10:41,259-Speed 3317.48 samples/sec   Loss 4.8317   LearningRate 0.0489   Epoch: 6   Global Step: 74790   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:10:44,346-Speed 3318.18 samples/sec   Loss 4.7810   LearningRate 0.0488   Epoch: 6   Global Step: 74800   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:10:47,523-Speed 3223.63 samples/sec   Loss 4.8855   LearningRate 0.0488   Epoch: 6   Global Step: 74810   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:10:50,600-Speed 3329.00 samples/sec   Loss 4.9118   LearningRate 0.0488   Epoch: 6   Global Step: 74820   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:10:53,700-Speed 3304.00 samples/sec   Loss 4.8462   LearningRate 0.0488   Epoch: 6   Global Step: 74830   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:10:57,480-Speed 2709.87 samples/sec   Loss 4.8710   LearningRate 0.0488   Epoch: 6   Global Step: 74840   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:11:00,503-Speed 3389.05 samples/sec   Loss 5.0124   LearningRate 0.0488   Epoch: 6   Global Step: 74850   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:11:03,566-Speed 3343.43 samples/sec   Loss 4.8551   LearningRate 0.0488   Epoch: 6   Global Step: 74860   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:11:06,633-Speed 3340.93 samples/sec   Loss 4.8696   LearningRate 0.0488   Epoch: 6   Global Step: 74870   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:11:09,721-Speed 3316.74 samples/sec   Loss 4.8790   LearningRate 0.0488   Epoch: 6   Global Step: 74880   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:11:12,769-Speed 3360.48 samples/sec   Loss 4.8224   LearningRate 0.0488   Epoch: 6   Global Step: 74890   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:11:15,861-Speed 3312.67 samples/sec   Loss 4.8497   LearningRate 0.0488   Epoch: 6   Global Step: 74900   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:11:18,934-Speed 3332.95 samples/sec   Loss 4.8016   LearningRate 0.0488   Epoch: 6   Global Step: 74910   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:11:21,967-Speed 3377.38 samples/sec   Loss 4.8402   LearningRate 0.0488   Epoch: 6   Global Step: 74920   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:11:24,994-Speed 3384.70 samples/sec   Loss 4.9777   LearningRate 0.0488   Epoch: 6   Global Step: 74930   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:11:28,117-Speed 3279.26 samples/sec   Loss 4.9048   LearningRate 0.0488   Epoch: 6   Global Step: 74940   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:11:31,139-Speed 3390.02 samples/sec   Loss 4.8849   LearningRate 0.0488   Epoch: 6   Global Step: 74950   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:11:34,214-Speed 3330.53 samples/sec   Loss 4.7797   LearningRate 0.0488   Epoch: 6   Global Step: 74960   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:11:37,227-Speed 3400.75 samples/sec   Loss 4.9418   LearningRate 0.0488   Epoch: 6   Global Step: 74970   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:11:40,344-Speed 3285.64 samples/sec   Loss 4.9276   LearningRate 0.0487   Epoch: 6   Global Step: 74980   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:11:43,479-Speed 3266.95 samples/sec   Loss 4.9339   LearningRate 0.0487   Epoch: 6   Global Step: 74990   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:11:46,582-Speed 3301.13 samples/sec   Loss 4.9476   LearningRate 0.0487   Epoch: 6   Global Step: 75000   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:11:49,625-Speed 3366.29 samples/sec   Loss 4.9324   LearningRate 0.0487   Epoch: 6   Global Step: 75010   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:11:52,764-Speed 3262.98 samples/sec   Loss 4.9094   LearningRate 0.0487   Epoch: 6   Global Step: 75020   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:11:55,832-Speed 3339.68 samples/sec   Loss 4.9472   LearningRate 0.0487   Epoch: 6   Global Step: 75030   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:11:58,887-Speed 3352.46 samples/sec   Loss 4.9657   LearningRate 0.0487   Epoch: 6   Global Step: 75040   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:12:02,002-Speed 3288.34 samples/sec   Loss 4.9753   LearningRate 0.0487   Epoch: 6   Global Step: 75050   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:12:05,085-Speed 3322.76 samples/sec   Loss 4.8835   LearningRate 0.0487   Epoch: 6   Global Step: 75060   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:12:08,125-Speed 3369.96 samples/sec   Loss 4.9279   LearningRate 0.0487   Epoch: 6   Global Step: 75070   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:12:11,148-Speed 3387.83 samples/sec   Loss 4.9680   LearningRate 0.0487   Epoch: 6   Global Step: 75080   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:12:14,245-Speed 3307.60 samples/sec   Loss 4.8848   LearningRate 0.0487   Epoch: 6   Global Step: 75090   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:12:17,395-Speed 3252.26 samples/sec   Loss 4.9831   LearningRate 0.0487   Epoch: 6   Global Step: 75100   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:12:20,440-Speed 3363.93 samples/sec   Loss 4.9848   LearningRate 0.0487   Epoch: 6   Global Step: 75110   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:12:23,446-Speed 3407.55 samples/sec   Loss 4.9771   LearningRate 0.0487   Epoch: 6   Global Step: 75120   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:12:26,490-Speed 3365.00 samples/sec   Loss 4.9712   LearningRate 0.0487   Epoch: 6   Global Step: 75130   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:12:29,556-Speed 3340.42 samples/sec   Loss 4.9268   LearningRate 0.0487   Epoch: 6   Global Step: 75140   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:12:32,593-Speed 3373.72 samples/sec   Loss 5.0148   LearningRate 0.0486   Epoch: 6   Global Step: 75150   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:12:35,652-Speed 3348.72 samples/sec   Loss 4.9900   LearningRate 0.0486   Epoch: 6   Global Step: 75160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:12:38,708-Speed 3351.87 samples/sec   Loss 4.9824   LearningRate 0.0486   Epoch: 6   Global Step: 75170   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:12:41,774-Speed 3340.15 samples/sec   Loss 4.8776   LearningRate 0.0486   Epoch: 6   Global Step: 75180   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:12:44,859-Speed 3321.38 samples/sec   Loss 5.1187   LearningRate 0.0486   Epoch: 6   Global Step: 75190   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:12:47,912-Speed 3354.87 samples/sec   Loss 4.9585   LearningRate 0.0486   Epoch: 6   Global Step: 75200   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 08:12:50,922-Speed 3402.62 samples/sec   Loss 4.9637   LearningRate 0.0486   Epoch: 6   Global Step: 75210   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:12:53,967-Speed 3364.91 samples/sec   Loss 4.9949   LearningRate 0.0486   Epoch: 6   Global Step: 75220   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:12:57,004-Speed 3372.43 samples/sec   Loss 5.0211   LearningRate 0.0486   Epoch: 6   Global Step: 75230   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:13:00,023-Speed 3392.35 samples/sec   Loss 5.0195   LearningRate 0.0486   Epoch: 6   Global Step: 75240   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:13:03,097-Speed 3332.53 samples/sec   Loss 5.0141   LearningRate 0.0486   Epoch: 6   Global Step: 75250   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:13:06,133-Speed 3374.18 samples/sec   Loss 5.0119   LearningRate 0.0486   Epoch: 6   Global Step: 75260   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:13:09,143-Speed 3402.59 samples/sec   Loss 4.9926   LearningRate 0.0486   Epoch: 6   Global Step: 75270   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:13:12,174-Speed 3379.49 samples/sec   Loss 4.9861   LearningRate 0.0486   Epoch: 6   Global Step: 75280   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:13:15,252-Speed 3328.53 samples/sec   Loss 5.0014   LearningRate 0.0486   Epoch: 6   Global Step: 75290   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:13:18,280-Speed 3381.86 samples/sec   Loss 4.9913   LearningRate 0.0486   Epoch: 6   Global Step: 75300   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:13:21,300-Speed 3392.10 samples/sec   Loss 4.9769   LearningRate 0.0486   Epoch: 6   Global Step: 75310   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:13:24,336-Speed 3374.62 samples/sec   Loss 5.0233   LearningRate 0.0486   Epoch: 6   Global Step: 75320   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:13:27,499-Speed 3238.27 samples/sec   Loss 5.0693   LearningRate 0.0485   Epoch: 6   Global Step: 75330   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:13:30,539-Speed 3369.96 samples/sec   Loss 4.9641   LearningRate 0.0485   Epoch: 6   Global Step: 75340   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:13:33,557-Speed 3393.31 samples/sec   Loss 5.0859   LearningRate 0.0485   Epoch: 6   Global Step: 75350   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:13:36,602-Speed 3364.67 samples/sec   Loss 5.0762   LearningRate 0.0485   Epoch: 6   Global Step: 75360   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:13:39,635-Speed 3377.60 samples/sec   Loss 5.0289   LearningRate 0.0485   Epoch: 6   Global Step: 75370   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:13:42,713-Speed 3328.11 samples/sec   Loss 4.9976   LearningRate 0.0485   Epoch: 6   Global Step: 75380   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:13:45,734-Speed 3391.24 samples/sec   Loss 5.1118   LearningRate 0.0485   Epoch: 6   Global Step: 75390   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:13:48,874-Speed 3261.50 samples/sec   Loss 5.0239   LearningRate 0.0485   Epoch: 6   Global Step: 75400   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:13:51,930-Speed 3351.93 samples/sec   Loss 5.0028   LearningRate 0.0485   Epoch: 6   Global Step: 75410   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:13:54,980-Speed 3358.45 samples/sec   Loss 5.0882   LearningRate 0.0485   Epoch: 6   Global Step: 75420   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:13:58,009-Speed 3382.54 samples/sec   Loss 5.0234   LearningRate 0.0485   Epoch: 6   Global Step: 75430   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:14:01,072-Speed 3344.39 samples/sec   Loss 5.0254   LearningRate 0.0485   Epoch: 6   Global Step: 75440   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:14:04,125-Speed 3354.56 samples/sec   Loss 5.0234   LearningRate 0.0485   Epoch: 6   Global Step: 75450   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:14:07,209-Speed 3321.26 samples/sec   Loss 4.9764   LearningRate 0.0485   Epoch: 6   Global Step: 75460   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:14:10,239-Speed 3380.42 samples/sec   Loss 5.0700   LearningRate 0.0485   Epoch: 6   Global Step: 75470   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:14:13,308-Speed 3338.84 samples/sec   Loss 5.0868   LearningRate 0.0485   Epoch: 6   Global Step: 75480   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:14:16,334-Speed 3384.53 samples/sec   Loss 4.9865   LearningRate 0.0485   Epoch: 6   Global Step: 75490   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:14:19,359-Speed 3386.59 samples/sec   Loss 5.0977   LearningRate 0.0485   Epoch: 6   Global Step: 75500   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:14:22,456-Speed 3306.71 samples/sec   Loss 5.0956   LearningRate 0.0484   Epoch: 6   Global Step: 75510   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:14:25,495-Speed 3371.44 samples/sec   Loss 5.0326   LearningRate 0.0484   Epoch: 6   Global Step: 75520   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:14:28,570-Speed 3330.98 samples/sec   Loss 5.0850   LearningRate 0.0484   Epoch: 6   Global Step: 75530   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:14:31,692-Speed 3280.82 samples/sec   Loss 5.1611   LearningRate 0.0484   Epoch: 6   Global Step: 75540   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:14:34,705-Speed 3399.68 samples/sec   Loss 5.0852   LearningRate 0.0484   Epoch: 6   Global Step: 75550   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:14:37,793-Speed 3317.06 samples/sec   Loss 5.1250   LearningRate 0.0484   Epoch: 6   Global Step: 75560   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:14:40,925-Speed 3270.25 samples/sec   Loss 5.1669   LearningRate 0.0484   Epoch: 6   Global Step: 75570   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:14:43,994-Speed 3337.97 samples/sec   Loss 5.0753   LearningRate 0.0484   Epoch: 6   Global Step: 75580   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:14:47,022-Speed 3382.53 samples/sec   Loss 5.0601   LearningRate 0.0484   Epoch: 6   Global Step: 75590   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:14:50,124-Speed 3302.14 samples/sec   Loss 5.1475   LearningRate 0.0484   Epoch: 6   Global Step: 75600   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:14:53,232-Speed 3296.36 samples/sec   Loss 5.1892   LearningRate 0.0484   Epoch: 6   Global Step: 75610   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:14:56,294-Speed 3345.37 samples/sec   Loss 5.0592   LearningRate 0.0484   Epoch: 6   Global Step: 75620   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:14:59,400-Speed 3297.40 samples/sec   Loss 5.2112   LearningRate 0.0484   Epoch: 6   Global Step: 75630   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:15:02,458-Speed 3350.10 samples/sec   Loss 5.1221   LearningRate 0.0484   Epoch: 6   Global Step: 75640   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:15:05,582-Speed 3278.55 samples/sec   Loss 5.2115   LearningRate 0.0484   Epoch: 6   Global Step: 75650   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:15:08,633-Speed 3357.14 samples/sec   Loss 5.1072   LearningRate 0.0484   Epoch: 6   Global Step: 75660   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:15:11,693-Speed 3348.17 samples/sec   Loss 5.1723   LearningRate 0.0484   Epoch: 6   Global Step: 75670   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:15:14,846-Speed 3248.36 samples/sec   Loss 5.1980   LearningRate 0.0484   Epoch: 6   Global Step: 75680   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:15:17,998-Speed 3249.71 samples/sec   Loss 5.2321   LearningRate 0.0483   Epoch: 6   Global Step: 75690   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:15:21,040-Speed 3367.24 samples/sec   Loss 5.2417   LearningRate 0.0483   Epoch: 6   Global Step: 75700   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:15:24,146-Speed 3297.66 samples/sec   Loss 5.1490   LearningRate 0.0483   Epoch: 6   Global Step: 75710   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:15:27,199-Speed 3355.63 samples/sec   Loss 5.1945   LearningRate 0.0483   Epoch: 6   Global Step: 75720   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:15:30,285-Speed 3318.63 samples/sec   Loss 5.0760   LearningRate 0.0483   Epoch: 6   Global Step: 75730   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:15:33,313-Speed 3383.63 samples/sec   Loss 5.1508   LearningRate 0.0483   Epoch: 6   Global Step: 75740   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:15:36,339-Speed 3385.10 samples/sec   Loss 5.1207   LearningRate 0.0483   Epoch: 6   Global Step: 75750   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:15:39,384-Speed 3363.68 samples/sec   Loss 5.1998   LearningRate 0.0483   Epoch: 6   Global Step: 75760   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:15:42,470-Speed 3319.47 samples/sec   Loss 5.1357   LearningRate 0.0483   Epoch: 6   Global Step: 75770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:15:45,532-Speed 3344.86 samples/sec   Loss 5.1718   LearningRate 0.0483   Epoch: 6   Global Step: 75780   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:15:48,585-Speed 3355.91 samples/sec   Loss 5.2406   LearningRate 0.0483   Epoch: 6   Global Step: 75790   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:15:51,637-Speed 3355.60 samples/sec   Loss 5.2187   LearningRate 0.0483   Epoch: 6   Global Step: 75800   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:15:54,714-Speed 3329.63 samples/sec   Loss 5.1670   LearningRate 0.0483   Epoch: 6   Global Step: 75810   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:15:57,750-Speed 3373.02 samples/sec   Loss 5.2259   LearningRate 0.0483   Epoch: 6   Global Step: 75820   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:16:00,806-Speed 3351.65 samples/sec   Loss 5.1339   LearningRate 0.0483   Epoch: 6   Global Step: 75830   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:16:03,854-Speed 3360.34 samples/sec   Loss 5.1925   LearningRate 0.0483   Epoch: 6   Global Step: 75840   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:16:06,960-Speed 3298.91 samples/sec   Loss 5.1500   LearningRate 0.0483   Epoch: 6   Global Step: 75850   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:16:10,015-Speed 3352.80 samples/sec   Loss 5.1809   LearningRate 0.0483   Epoch: 6   Global Step: 75860   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:16:13,062-Speed 3361.85 samples/sec   Loss 5.1658   LearningRate 0.0482   Epoch: 6   Global Step: 75870   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 08:16:16,164-Speed 3302.34 samples/sec   Loss 5.2730   LearningRate 0.0482   Epoch: 6   Global Step: 75880   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 08:16:19,228-Speed 3343.76 samples/sec   Loss 5.1876   LearningRate 0.0482   Epoch: 6   Global Step: 75890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:16:22,291-Speed 3343.94 samples/sec   Loss 5.1758   LearningRate 0.0482   Epoch: 6   Global Step: 75900   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:16:25,383-Speed 3312.69 samples/sec   Loss 5.2691   LearningRate 0.0482   Epoch: 6   Global Step: 75910   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:16:28,454-Speed 3335.72 samples/sec   Loss 5.1382   LearningRate 0.0482   Epoch: 6   Global Step: 75920   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:16:31,539-Speed 3320.82 samples/sec   Loss 5.0889   LearningRate 0.0482   Epoch: 6   Global Step: 75930   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:16:34,585-Speed 3362.36 samples/sec   Loss 5.1764   LearningRate 0.0482   Epoch: 6   Global Step: 75940   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:16:37,640-Speed 3353.57 samples/sec   Loss 5.0488   LearningRate 0.0482   Epoch: 6   Global Step: 75950   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:16:40,756-Speed 3286.35 samples/sec   Loss 5.1298   LearningRate 0.0482   Epoch: 6   Global Step: 75960   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:16:43,867-Speed 3292.49 samples/sec   Loss 5.1748   LearningRate 0.0482   Epoch: 6   Global Step: 75970   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:16:46,978-Speed 3293.57 samples/sec   Loss 5.2369   LearningRate 0.0482   Epoch: 6   Global Step: 75980   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:16:49,998-Speed 3391.55 samples/sec   Loss 5.1949   LearningRate 0.0482   Epoch: 6   Global Step: 75990   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:16:53,062-Speed 3342.93 samples/sec   Loss 5.2824   LearningRate 0.0482   Epoch: 6   Global Step: 76000   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:16:56,099-Speed 3372.99 samples/sec   Loss 5.2904   LearningRate 0.0482   Epoch: 6   Global Step: 76010   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:16:59,137-Speed 3371.18 samples/sec   Loss 5.2340   LearningRate 0.0482   Epoch: 6   Global Step: 76020   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:17:02,186-Speed 3360.35 samples/sec   Loss 5.2043   LearningRate 0.0482   Epoch: 6   Global Step: 76030   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:17:05,294-Speed 3295.61 samples/sec   Loss 5.2266   LearningRate 0.0482   Epoch: 6   Global Step: 76040   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:17:08,343-Speed 3358.78 samples/sec   Loss 5.2248   LearningRate 0.0481   Epoch: 6   Global Step: 76050   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:17:11,371-Speed 3383.52 samples/sec   Loss 5.2359   LearningRate 0.0481   Epoch: 6   Global Step: 76060   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:17:14,434-Speed 3343.47 samples/sec   Loss 5.1466   LearningRate 0.0481   Epoch: 6   Global Step: 76070   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:17:17,506-Speed 3334.56 samples/sec   Loss 5.1356   LearningRate 0.0481   Epoch: 6   Global Step: 76080   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:17:20,580-Speed 3333.11 samples/sec   Loss 5.2995   LearningRate 0.0481   Epoch: 6   Global Step: 76090   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:17:23,643-Speed 3343.42 samples/sec   Loss 5.2193   LearningRate 0.0481   Epoch: 6   Global Step: 76100   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:17:26,724-Speed 3325.23 samples/sec   Loss 5.2841   LearningRate 0.0481   Epoch: 6   Global Step: 76110   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:17:29,818-Speed 3310.36 samples/sec   Loss 5.3471   LearningRate 0.0481   Epoch: 6   Global Step: 76120   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:17:32,883-Speed 3342.61 samples/sec   Loss 5.1891   LearningRate 0.0481   Epoch: 6   Global Step: 76130   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:17:35,926-Speed 3366.20 samples/sec   Loss 5.2335   LearningRate 0.0481   Epoch: 6   Global Step: 76140   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:17:39,049-Speed 3279.77 samples/sec   Loss 5.1403   LearningRate 0.0481   Epoch: 6   Global Step: 76150   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:17:42,165-Speed 3286.72 samples/sec   Loss 5.2880   LearningRate 0.0481   Epoch: 6   Global Step: 76160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:17:45,240-Speed 3332.33 samples/sec   Loss 5.3248   LearningRate 0.0481   Epoch: 6   Global Step: 76170   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:17:48,328-Speed 3316.16 samples/sec   Loss 5.2287   LearningRate 0.0481   Epoch: 6   Global Step: 76180   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:17:51,421-Speed 3311.84 samples/sec   Loss 5.2686   LearningRate 0.0481   Epoch: 6   Global Step: 76190   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:17:54,477-Speed 3352.39 samples/sec   Loss 5.1943   LearningRate 0.0481   Epoch: 6   Global Step: 76200   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:17:57,546-Speed 3338.08 samples/sec   Loss 5.2356   LearningRate 0.0481   Epoch: 6   Global Step: 76210   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:18:00,607-Speed 3345.92 samples/sec   Loss 5.1522   LearningRate 0.0480   Epoch: 6   Global Step: 76220   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:18:03,639-Speed 3378.41 samples/sec   Loss 5.2266   LearningRate 0.0480   Epoch: 6   Global Step: 76230   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:18:06,766-Speed 3275.80 samples/sec   Loss 5.2503   LearningRate 0.0480   Epoch: 6   Global Step: 76240   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:18:09,800-Speed 3375.99 samples/sec   Loss 5.2485   LearningRate 0.0480   Epoch: 6   Global Step: 76250   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 08:18:12,877-Speed 3329.49 samples/sec   Loss 5.2665   LearningRate 0.0480   Epoch: 6   Global Step: 76260   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:18:15,932-Speed 3352.32 samples/sec   Loss 5.2763   LearningRate 0.0480   Epoch: 6   Global Step: 76270   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:18:18,983-Speed 3357.12 samples/sec   Loss 5.2776   LearningRate 0.0480   Epoch: 6   Global Step: 76280   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:18:22,024-Speed 3368.77 samples/sec   Loss 5.4557   LearningRate 0.0480   Epoch: 6   Global Step: 76290   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:18:25,078-Speed 3355.21 samples/sec   Loss 5.2734   LearningRate 0.0480   Epoch: 6   Global Step: 76300   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:18:28,214-Speed 3266.04 samples/sec   Loss 5.3511   LearningRate 0.0480   Epoch: 6   Global Step: 76310   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:18:31,362-Speed 3254.94 samples/sec   Loss 5.2537   LearningRate 0.0480   Epoch: 6   Global Step: 76320   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:18:34,398-Speed 3373.64 samples/sec   Loss 5.2358   LearningRate 0.0480   Epoch: 6   Global Step: 76330   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:18:37,482-Speed 3321.07 samples/sec   Loss 5.3362   LearningRate 0.0480   Epoch: 6   Global Step: 76340   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:18:40,594-Speed 3291.48 samples/sec   Loss 5.3537   LearningRate 0.0480   Epoch: 6   Global Step: 76350   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:18:43,638-Speed 3365.35 samples/sec   Loss 5.2717   LearningRate 0.0480   Epoch: 6   Global Step: 76360   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 08:18:46,674-Speed 3374.36 samples/sec   Loss 5.3335   LearningRate 0.0480   Epoch: 6   Global Step: 76370   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 08:18:49,797-Speed 3279.93 samples/sec   Loss 5.3273   LearningRate 0.0480   Epoch: 6   Global Step: 76380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:18:52,877-Speed 3324.91 samples/sec   Loss 5.2664   LearningRate 0.0480   Epoch: 6   Global Step: 76390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:18:55,921-Speed 3365.41 samples/sec   Loss 5.3513   LearningRate 0.0479   Epoch: 6   Global Step: 76400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:18:59,045-Speed 3278.48 samples/sec   Loss 5.3969   LearningRate 0.0479   Epoch: 6   Global Step: 76410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:19:02,073-Speed 3382.77 samples/sec   Loss 5.3042   LearningRate 0.0479   Epoch: 6   Global Step: 76420   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:19:05,184-Speed 3293.59 samples/sec   Loss 5.3133   LearningRate 0.0479   Epoch: 6   Global Step: 76430   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:19:08,278-Speed 3309.74 samples/sec   Loss 5.3582   LearningRate 0.0479   Epoch: 6   Global Step: 76440   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:19:11,341-Speed 3344.76 samples/sec   Loss 5.2769   LearningRate 0.0479   Epoch: 6   Global Step: 76450   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:19:14,435-Speed 3310.47 samples/sec   Loss 5.3687   LearningRate 0.0479   Epoch: 6   Global Step: 76460   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:19:17,515-Speed 3325.75 samples/sec   Loss 5.3558   LearningRate 0.0479   Epoch: 6   Global Step: 76470   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:19:20,551-Speed 3373.57 samples/sec   Loss 5.3185   LearningRate 0.0479   Epoch: 6   Global Step: 76480   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:19:23,734-Speed 3218.20 samples/sec   Loss 5.3610   LearningRate 0.0479   Epoch: 6   Global Step: 76490   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:19:26,887-Speed 3249.63 samples/sec   Loss 5.2880   LearningRate 0.0479   Epoch: 6   Global Step: 76500   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:19:30,045-Speed 3243.33 samples/sec   Loss 5.3829   LearningRate 0.0479   Epoch: 6   Global Step: 76510   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:19:33,119-Speed 3332.43 samples/sec   Loss 5.3390   LearningRate 0.0479   Epoch: 6   Global Step: 76520   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:19:36,203-Speed 3321.43 samples/sec   Loss 5.2862   LearningRate 0.0479   Epoch: 6   Global Step: 76530   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:19:39,320-Speed 3286.49 samples/sec   Loss 5.4550   LearningRate 0.0479   Epoch: 6   Global Step: 76540   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:19:42,377-Speed 3350.84 samples/sec   Loss 5.3905   LearningRate 0.0479   Epoch: 6   Global Step: 76550   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:19:45,440-Speed 3344.12 samples/sec   Loss 5.4095   LearningRate 0.0479   Epoch: 6   Global Step: 76560   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:19:48,527-Speed 3318.86 samples/sec   Loss 5.4039   LearningRate 0.0479   Epoch: 6   Global Step: 76570   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:19:51,636-Speed 3294.72 samples/sec   Loss 5.3502   LearningRate 0.0478   Epoch: 6   Global Step: 76580   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:19:54,697-Speed 3346.13 samples/sec   Loss 5.3959   LearningRate 0.0478   Epoch: 6   Global Step: 76590   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:19:57,732-Speed 3375.53 samples/sec   Loss 5.4502   LearningRate 0.0478   Epoch: 6   Global Step: 76600   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:20:00,804-Speed 3333.85 samples/sec   Loss 5.3998   LearningRate 0.0478   Epoch: 6   Global Step: 76610   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:20:03,921-Speed 3286.50 samples/sec   Loss 5.3547   LearningRate 0.0478   Epoch: 6   Global Step: 76620   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:20:07,018-Speed 3307.75 samples/sec   Loss 5.3750   LearningRate 0.0478   Epoch: 6   Global Step: 76630   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:20:10,103-Speed 3320.66 samples/sec   Loss 5.4835   LearningRate 0.0478   Epoch: 6   Global Step: 76640   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:20:13,192-Speed 3316.08 samples/sec   Loss 5.3442   LearningRate 0.0478   Epoch: 6   Global Step: 76650   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:20:16,297-Speed 3298.95 samples/sec   Loss 5.3424   LearningRate 0.0478   Epoch: 6   Global Step: 76660   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:20:19,391-Speed 3310.69 samples/sec   Loss 5.3876   LearningRate 0.0478   Epoch: 6   Global Step: 76670   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:20:22,449-Speed 3349.89 samples/sec   Loss 5.3281   LearningRate 0.0478   Epoch: 6   Global Step: 76680   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:20:25,613-Speed 3237.72 samples/sec   Loss 5.4168   LearningRate 0.0478   Epoch: 6   Global Step: 76690   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:20:28,725-Speed 3291.28 samples/sec   Loss 5.2822   LearningRate 0.0478   Epoch: 6   Global Step: 76700   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:20:31,815-Speed 3315.73 samples/sec   Loss 5.3335   LearningRate 0.0478   Epoch: 6   Global Step: 76710   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:20:34,876-Speed 3345.59 samples/sec   Loss 5.2969   LearningRate 0.0478   Epoch: 6   Global Step: 76720   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:20:38,050-Speed 3227.28 samples/sec   Loss 5.3501   LearningRate 0.0478   Epoch: 6   Global Step: 76730   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 08:20:41,152-Speed 3303.17 samples/sec   Loss 5.4308   LearningRate 0.0478   Epoch: 6   Global Step: 76740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:20:44,210-Speed 3349.76 samples/sec   Loss 5.3273   LearningRate 0.0478   Epoch: 6   Global Step: 76750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:20:47,295-Speed 3319.69 samples/sec   Loss 5.4312   LearningRate 0.0477   Epoch: 6   Global Step: 76760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:20:50,447-Speed 3250.56 samples/sec   Loss 5.3119   LearningRate 0.0477   Epoch: 6   Global Step: 76770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:20:53,565-Speed 3284.73 samples/sec   Loss 5.4035   LearningRate 0.0477   Epoch: 6   Global Step: 76780   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:20:56,649-Speed 3321.80 samples/sec   Loss 5.2767   LearningRate 0.0477   Epoch: 6   Global Step: 76790   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:20:59,744-Speed 3309.51 samples/sec   Loss 5.3539   LearningRate 0.0477   Epoch: 6   Global Step: 76800   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:21:02,836-Speed 3312.45 samples/sec   Loss 5.2793   LearningRate 0.0477   Epoch: 6   Global Step: 76810   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:21:05,898-Speed 3345.47 samples/sec   Loss 5.3194   LearningRate 0.0477   Epoch: 6   Global Step: 76820   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:21:08,955-Speed 3350.78 samples/sec   Loss 5.3087   LearningRate 0.0477   Epoch: 6   Global Step: 76830   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:21:12,035-Speed 3325.59 samples/sec   Loss 5.2792   LearningRate 0.0477   Epoch: 6   Global Step: 76840   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:21:15,124-Speed 3316.64 samples/sec   Loss 5.3557   LearningRate 0.0477   Epoch: 6   Global Step: 76850   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:21:18,264-Speed 3262.23 samples/sec   Loss 5.4066   LearningRate 0.0477   Epoch: 6   Global Step: 76860   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:21:21,323-Speed 3348.58 samples/sec   Loss 5.4089   LearningRate 0.0477   Epoch: 6   Global Step: 76870   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:21:24,409-Speed 3319.40 samples/sec   Loss 5.3958   LearningRate 0.0477   Epoch: 6   Global Step: 76880   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:21:27,500-Speed 3313.87 samples/sec   Loss 5.3874   LearningRate 0.0477   Epoch: 6   Global Step: 76890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:21:30,587-Speed 3318.18 samples/sec   Loss 5.4487   LearningRate 0.0477   Epoch: 6   Global Step: 76900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:21:33,642-Speed 3352.62 samples/sec   Loss 5.3655   LearningRate 0.0477   Epoch: 6   Global Step: 76910   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:21:36,706-Speed 3343.23 samples/sec   Loss 5.3529   LearningRate 0.0477   Epoch: 6   Global Step: 76920   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:21:39,788-Speed 3324.15 samples/sec   Loss 5.3577   LearningRate 0.0477   Epoch: 6   Global Step: 76930   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:21:42,890-Speed 3302.29 samples/sec   Loss 5.3680   LearningRate 0.0476   Epoch: 6   Global Step: 76940   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:21:45,946-Speed 3351.35 samples/sec   Loss 5.4940   LearningRate 0.0476   Epoch: 6   Global Step: 76950   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:21:49,060-Speed 3289.76 samples/sec   Loss 5.3652   LearningRate 0.0476   Epoch: 6   Global Step: 76960   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:21:52,132-Speed 3334.80 samples/sec   Loss 5.3834   LearningRate 0.0476   Epoch: 6   Global Step: 76970   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:21:55,298-Speed 3234.85 samples/sec   Loss 5.5264   LearningRate 0.0476   Epoch: 6   Global Step: 76980   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:21:58,326-Speed 3383.37 samples/sec   Loss 5.4155   LearningRate 0.0476   Epoch: 6   Global Step: 76990   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:22:01,395-Speed 3338.30 samples/sec   Loss 5.4310   LearningRate 0.0476   Epoch: 6   Global Step: 77000   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:22:04,543-Speed 3253.64 samples/sec   Loss 5.3508   LearningRate 0.0476   Epoch: 6   Global Step: 77010   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:22:07,671-Speed 3274.35 samples/sec   Loss 5.4325   LearningRate 0.0476   Epoch: 6   Global Step: 77020   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:22:10,721-Speed 3358.42 samples/sec   Loss 5.4194   LearningRate 0.0476   Epoch: 6   Global Step: 77030   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:22:13,847-Speed 3277.47 samples/sec   Loss 5.3813   LearningRate 0.0476   Epoch: 6   Global Step: 77040   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:22:16,977-Speed 3271.91 samples/sec   Loss 5.4467   LearningRate 0.0476   Epoch: 6   Global Step: 77050   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:22:20,076-Speed 3305.94 samples/sec   Loss 5.3650   LearningRate 0.0476   Epoch: 6   Global Step: 77060   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:22:23,164-Speed 3317.33 samples/sec   Loss 5.5105   LearningRate 0.0476   Epoch: 6   Global Step: 77070   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:22:26,243-Speed 3326.68 samples/sec   Loss 5.3321   LearningRate 0.0476   Epoch: 6   Global Step: 77080   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:22:29,332-Speed 3315.96 samples/sec   Loss 5.3975   LearningRate 0.0476   Epoch: 6   Global Step: 77090   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:22:32,385-Speed 3354.88 samples/sec   Loss 5.4762   LearningRate 0.0476   Epoch: 6   Global Step: 77100   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:22:35,473-Speed 3317.34 samples/sec   Loss 5.3388   LearningRate 0.0476   Epoch: 6   Global Step: 77110   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:22:38,606-Speed 3269.69 samples/sec   Loss 5.5537   LearningRate 0.0475   Epoch: 6   Global Step: 77120   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:22:41,730-Speed 3279.19 samples/sec   Loss 5.3971   LearningRate 0.0475   Epoch: 6   Global Step: 77130   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:22:44,789-Speed 3348.71 samples/sec   Loss 5.5282   LearningRate 0.0475   Epoch: 6   Global Step: 77140   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:22:47,895-Speed 3297.39 samples/sec   Loss 5.4556   LearningRate 0.0475   Epoch: 6   Global Step: 77150   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:22:50,974-Speed 3327.21 samples/sec   Loss 5.4759   LearningRate 0.0475   Epoch: 6   Global Step: 77160   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:22:54,066-Speed 3313.21 samples/sec   Loss 5.4843   LearningRate 0.0475   Epoch: 6   Global Step: 77170   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:22:57,107-Speed 3368.08 samples/sec   Loss 5.4190   LearningRate 0.0475   Epoch: 6   Global Step: 77180   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:23:00,236-Speed 3274.73 samples/sec   Loss 5.5282   LearningRate 0.0475   Epoch: 6   Global Step: 77190   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:23:03,322-Speed 3319.23 samples/sec   Loss 5.4699   LearningRate 0.0475   Epoch: 6   Global Step: 77200   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:23:06,436-Speed 3289.47 samples/sec   Loss 5.4941   LearningRate 0.0475   Epoch: 6   Global Step: 77210   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:23:09,527-Speed 3313.33 samples/sec   Loss 5.4576   LearningRate 0.0475   Epoch: 6   Global Step: 77220   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:23:12,623-Speed 3309.21 samples/sec   Loss 5.4832   LearningRate 0.0475   Epoch: 6   Global Step: 77230   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:23:15,705-Speed 3323.65 samples/sec   Loss 5.3385   LearningRate 0.0475   Epoch: 6   Global Step: 77240   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:23:18,806-Speed 3302.83 samples/sec   Loss 5.4002   LearningRate 0.0475   Epoch: 6   Global Step: 77250   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:23:21,870-Speed 3344.07 samples/sec   Loss 5.3775   LearningRate 0.0475   Epoch: 6   Global Step: 77260   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:23:24,963-Speed 3311.10 samples/sec   Loss 5.5144   LearningRate 0.0475   Epoch: 6   Global Step: 77270   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:23:28,090-Speed 3276.09 samples/sec   Loss 5.4621   LearningRate 0.0475   Epoch: 6   Global Step: 77280   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:23:31,176-Speed 3318.57 samples/sec   Loss 5.3980   LearningRate 0.0475   Epoch: 6   Global Step: 77290   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:23:34,253-Speed 3329.41 samples/sec   Loss 5.5243   LearningRate 0.0474   Epoch: 6   Global Step: 77300   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:23:37,333-Speed 3326.12 samples/sec   Loss 5.4815   LearningRate 0.0474   Epoch: 6   Global Step: 77310   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:23:40,431-Speed 3306.04 samples/sec   Loss 5.3886   LearningRate 0.0474   Epoch: 6   Global Step: 77320   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:23:43,527-Speed 3308.87 samples/sec   Loss 5.4316   LearningRate 0.0474   Epoch: 6   Global Step: 77330   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:23:46,574-Speed 3361.35 samples/sec   Loss 5.4769   LearningRate 0.0474   Epoch: 6   Global Step: 77340   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:23:49,615-Speed 3368.45 samples/sec   Loss 5.5475   LearningRate 0.0474   Epoch: 6   Global Step: 77350   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:23:52,677-Speed 3344.95 samples/sec   Loss 5.4513   LearningRate 0.0474   Epoch: 6   Global Step: 77360   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:23:55,749-Speed 3335.04 samples/sec   Loss 5.4129   LearningRate 0.0474   Epoch: 6   Global Step: 77370   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:23:58,812-Speed 3343.73 samples/sec   Loss 5.4373   LearningRate 0.0474   Epoch: 6   Global Step: 77380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:24:01,933-Speed 3282.40 samples/sec   Loss 5.4257   LearningRate 0.0474   Epoch: 6   Global Step: 77390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:24:05,082-Speed 3252.37 samples/sec   Loss 5.4721   LearningRate 0.0474   Epoch: 6   Global Step: 77400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:24:08,153-Speed 3335.74 samples/sec   Loss 5.5921   LearningRate 0.0474   Epoch: 6   Global Step: 77410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:24:11,219-Speed 3341.17 samples/sec   Loss 5.5352   LearningRate 0.0474   Epoch: 6   Global Step: 77420   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:24:14,340-Speed 3281.93 samples/sec   Loss 5.5432   LearningRate 0.0474   Epoch: 6   Global Step: 77430   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:24:17,404-Speed 3343.54 samples/sec   Loss 5.4314   LearningRate 0.0474   Epoch: 6   Global Step: 77440   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:24:20,475-Speed 3335.72 samples/sec   Loss 5.4492   LearningRate 0.0474   Epoch: 6   Global Step: 77450   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:24:23,580-Speed 3298.69 samples/sec   Loss 5.5936   LearningRate 0.0474   Epoch: 6   Global Step: 77460   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:24:26,645-Speed 3341.32 samples/sec   Loss 5.4344   LearningRate 0.0474   Epoch: 6   Global Step: 77470   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:24:29,754-Speed 3294.67 samples/sec   Loss 5.4787   LearningRate 0.0473   Epoch: 6   Global Step: 77480   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:24:32,790-Speed 3374.59 samples/sec   Loss 5.5633   LearningRate 0.0473   Epoch: 6   Global Step: 77490   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:24:35,894-Speed 3299.84 samples/sec   Loss 5.4381   LearningRate 0.0473   Epoch: 6   Global Step: 77500   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:24:39,015-Speed 3282.56 samples/sec   Loss 5.5224   LearningRate 0.0473   Epoch: 6   Global Step: 77510   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:24:42,106-Speed 3313.18 samples/sec   Loss 5.6054   LearningRate 0.0473   Epoch: 6   Global Step: 77520   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:24:45,171-Speed 3342.23 samples/sec   Loss 5.5270   LearningRate 0.0473   Epoch: 6   Global Step: 77530   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:24:48,251-Speed 3326.37 samples/sec   Loss 5.4757   LearningRate 0.0473   Epoch: 6   Global Step: 77540   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:24:51,315-Speed 3342.82 samples/sec   Loss 5.6294   LearningRate 0.0473   Epoch: 6   Global Step: 77550   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:24:54,417-Speed 3302.22 samples/sec   Loss 5.4601   LearningRate 0.0473   Epoch: 6   Global Step: 77560   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:24:57,464-Speed 3361.80 samples/sec   Loss 5.3458   LearningRate 0.0473   Epoch: 6   Global Step: 77570   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:25:00,503-Speed 3370.18 samples/sec   Loss 5.4251   LearningRate 0.0473   Epoch: 6   Global Step: 77580   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:25:03,584-Speed 3324.50 samples/sec   Loss 5.5015   LearningRate 0.0473   Epoch: 6   Global Step: 77590   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:25:06,697-Speed 3291.49 samples/sec   Loss 5.3798   LearningRate 0.0473   Epoch: 6   Global Step: 77600   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:25:09,747-Speed 3357.72 samples/sec   Loss 5.5380   LearningRate 0.0473   Epoch: 6   Global Step: 77610   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:25:12,826-Speed 3327.01 samples/sec   Loss 5.5250   LearningRate 0.0473   Epoch: 6   Global Step: 77620   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:25:16,019-Speed 3208.10 samples/sec   Loss 5.5410   LearningRate 0.0473   Epoch: 6   Global Step: 77630   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:25:19,080-Speed 3347.35 samples/sec   Loss 5.5828   LearningRate 0.0473   Epoch: 6   Global Step: 77640   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:25:22,167-Speed 3317.78 samples/sec   Loss 5.4179   LearningRate 0.0473   Epoch: 6   Global Step: 77650   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:25:25,222-Speed 3353.08 samples/sec   Loss 5.5501   LearningRate 0.0472   Epoch: 6   Global Step: 77660   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:25:28,312-Speed 3314.52 samples/sec   Loss 5.4523   LearningRate 0.0472   Epoch: 6   Global Step: 77670   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:25:31,415-Speed 3301.83 samples/sec   Loss 5.4447   LearningRate 0.0472   Epoch: 6   Global Step: 77680   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:25:34,508-Speed 3311.26 samples/sec   Loss 5.4936   LearningRate 0.0472   Epoch: 6   Global Step: 77690   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:25:37,657-Speed 3252.76 samples/sec   Loss 5.5209   LearningRate 0.0472   Epoch: 6   Global Step: 77700   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:25:40,874-Speed 3184.36 samples/sec   Loss 5.6091   LearningRate 0.0472   Epoch: 6   Global Step: 77710   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:25:43,991-Speed 3285.21 samples/sec   Loss 5.5608   LearningRate 0.0472   Epoch: 6   Global Step: 77720   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:25:47,168-Speed 3224.03 samples/sec   Loss 5.4946   LearningRate 0.0472   Epoch: 6   Global Step: 77730   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:25:50,279-Speed 3292.81 samples/sec   Loss 5.5674   LearningRate 0.0472   Epoch: 6   Global Step: 77740   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-04-27 08:25:53,401-Speed 3281.85 samples/sec   Loss 5.5783   LearningRate 0.0472   Epoch: 6   Global Step: 77750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:25:56,472-Speed 3334.67 samples/sec   Loss 5.6238   LearningRate 0.0472   Epoch: 6   Global Step: 77760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:25:59,546-Speed 3333.02 samples/sec   Loss 5.5321   LearningRate 0.0472   Epoch: 6   Global Step: 77770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:26:02,614-Speed 3338.28 samples/sec   Loss 5.5750   LearningRate 0.0472   Epoch: 6   Global Step: 77780   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:26:05,676-Speed 3345.19 samples/sec   Loss 5.5270   LearningRate 0.0472   Epoch: 6   Global Step: 77790   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:26:08,737-Speed 3346.71 samples/sec   Loss 5.5007   LearningRate 0.0472   Epoch: 6   Global Step: 77800   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:26:11,830-Speed 3312.00 samples/sec   Loss 5.5370   LearningRate 0.0472   Epoch: 6   Global Step: 77810   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:26:15,052-Speed 3179.14 samples/sec   Loss 5.5574   LearningRate 0.0472   Epoch: 6   Global Step: 77820   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:26:18,144-Speed 3312.86 samples/sec   Loss 5.5188   LearningRate 0.0472   Epoch: 6   Global Step: 77830   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:26:21,200-Speed 3352.23 samples/sec   Loss 5.5693   LearningRate 0.0472   Epoch: 6   Global Step: 77840   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:26:24,263-Speed 3344.23 samples/sec   Loss 5.5047   LearningRate 0.0471   Epoch: 6   Global Step: 77850   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:26:27,337-Speed 3331.55 samples/sec   Loss 5.4865   LearningRate 0.0471   Epoch: 6   Global Step: 77860   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:26:30,510-Speed 3228.14 samples/sec   Loss 5.4817   LearningRate 0.0471   Epoch: 6   Global Step: 77870   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:26:33,609-Speed 3305.43 samples/sec   Loss 5.5977   LearningRate 0.0471   Epoch: 6   Global Step: 77880   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:26:36,767-Speed 3243.44 samples/sec   Loss 5.6062   LearningRate 0.0471   Epoch: 6   Global Step: 77890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:26:39,869-Speed 3302.32 samples/sec   Loss 5.5833   LearningRate 0.0471   Epoch: 6   Global Step: 77900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:26:43,020-Speed 3251.44 samples/sec   Loss 5.5448   LearningRate 0.0471   Epoch: 6   Global Step: 77910   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:26:46,083-Speed 3343.96 samples/sec   Loss 5.5578   LearningRate 0.0471   Epoch: 6   Global Step: 77920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:26:49,195-Speed 3291.38 samples/sec   Loss 5.5085   LearningRate 0.0471   Epoch: 6   Global Step: 77930   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:26:52,316-Speed 3282.91 samples/sec   Loss 5.4967   LearningRate 0.0471   Epoch: 6   Global Step: 77940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:26:55,433-Speed 3286.25 samples/sec   Loss 5.6302   LearningRate 0.0471   Epoch: 6   Global Step: 77950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:26:58,549-Speed 3287.30 samples/sec   Loss 5.5819   LearningRate 0.0471   Epoch: 6   Global Step: 77960   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:27:01,648-Speed 3304.86 samples/sec   Loss 5.4418   LearningRate 0.0471   Epoch: 6   Global Step: 77970   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:27:04,758-Speed 3293.78 samples/sec   Loss 5.6176   LearningRate 0.0471   Epoch: 6   Global Step: 77980   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:27:07,855-Speed 3307.33 samples/sec   Loss 5.5639   LearningRate 0.0471   Epoch: 6   Global Step: 77990   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:27:10,913-Speed 3350.00 samples/sec   Loss 5.5061   LearningRate 0.0471   Epoch: 6   Global Step: 78000   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:27:14,074-Speed 3240.62 samples/sec   Loss 5.5279   LearningRate 0.0471   Epoch: 6   Global Step: 78010   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:27:17,168-Speed 3310.48 samples/sec   Loss 5.5704   LearningRate 0.0471   Epoch: 6   Global Step: 78020   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:27:20,255-Speed 3317.24 samples/sec   Loss 5.6311   LearningRate 0.0470   Epoch: 6   Global Step: 78030   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:27:23,396-Speed 3261.07 samples/sec   Loss 5.5369   LearningRate 0.0470   Epoch: 6   Global Step: 78040   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:27:26,533-Speed 3265.77 samples/sec   Loss 5.6350   LearningRate 0.0470   Epoch: 6   Global Step: 78050   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:27:29,658-Speed 3277.33 samples/sec   Loss 5.4820   LearningRate 0.0470   Epoch: 6   Global Step: 78060   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:27:32,734-Speed 3330.22 samples/sec   Loss 5.5625   LearningRate 0.0470   Epoch: 6   Global Step: 78070   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:27:35,877-Speed 3258.96 samples/sec   Loss 5.4674   LearningRate 0.0470   Epoch: 6   Global Step: 78080   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:27:39,017-Speed 3262.18 samples/sec   Loss 5.4850   LearningRate 0.0470   Epoch: 6   Global Step: 78090   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:27:42,106-Speed 3316.27 samples/sec   Loss 5.5392   LearningRate 0.0470   Epoch: 6   Global Step: 78100   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:27:45,153-Speed 3361.99 samples/sec   Loss 5.5622   LearningRate 0.0470   Epoch: 6   Global Step: 78110   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:27:48,305-Speed 3249.51 samples/sec   Loss 5.5719   LearningRate 0.0470   Epoch: 6   Global Step: 78120   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:27:51,459-Speed 3247.73 samples/sec   Loss 5.5404   LearningRate 0.0470   Epoch: 6   Global Step: 78130   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:27:54,550-Speed 3313.39 samples/sec   Loss 5.5538   LearningRate 0.0470   Epoch: 6   Global Step: 78140   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:27:57,581-Speed 3379.67 samples/sec   Loss 5.4793   LearningRate 0.0470   Epoch: 6   Global Step: 78150   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:28:00,635-Speed 3354.14 samples/sec   Loss 5.6062   LearningRate 0.0470   Epoch: 6   Global Step: 78160   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:28:03,717-Speed 3324.08 samples/sec   Loss 5.5881   LearningRate 0.0470   Epoch: 6   Global Step: 78170   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:28:06,811-Speed 3310.81 samples/sec   Loss 5.6292   LearningRate 0.0470   Epoch: 6   Global Step: 78180   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:28:09,940-Speed 3272.96 samples/sec   Loss 5.6443   LearningRate 0.0470   Epoch: 6   Global Step: 78190   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:28:13,016-Speed 3330.45 samples/sec   Loss 5.6076   LearningRate 0.0470   Epoch: 6   Global Step: 78200   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:28:16,161-Speed 3256.93 samples/sec   Loss 5.6418   LearningRate 0.0469   Epoch: 6   Global Step: 78210   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:28:19,287-Speed 3276.86 samples/sec   Loss 5.5269   LearningRate 0.0469   Epoch: 6   Global Step: 78220   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:28:22,351-Speed 3343.00 samples/sec   Loss 5.6074   LearningRate 0.0469   Epoch: 6   Global Step: 78230   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:28:25,471-Speed 3283.16 samples/sec   Loss 5.5915   LearningRate 0.0469   Epoch: 6   Global Step: 78240   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:28:28,558-Speed 3317.71 samples/sec   Loss 5.5214   LearningRate 0.0469   Epoch: 6   Global Step: 78250   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:28:31,704-Speed 3255.65 samples/sec   Loss 5.6774   LearningRate 0.0469   Epoch: 6   Global Step: 78260   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:28:34,803-Speed 3305.23 samples/sec   Loss 5.5418   LearningRate 0.0469   Epoch: 6   Global Step: 78270   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:28:37,966-Speed 3238.28 samples/sec   Loss 5.6638   LearningRate 0.0469   Epoch: 6   Global Step: 78280   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:28:41,097-Speed 3272.55 samples/sec   Loss 5.6027   LearningRate 0.0469   Epoch: 6   Global Step: 78290   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:28:44,178-Speed 3323.71 samples/sec   Loss 5.6129   LearningRate 0.0469   Epoch: 6   Global Step: 78300   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:28:47,270-Speed 3312.79 samples/sec   Loss 5.6073   LearningRate 0.0469   Epoch: 6   Global Step: 78310   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:28:50,339-Speed 3338.04 samples/sec   Loss 5.6285   LearningRate 0.0469   Epoch: 6   Global Step: 78320   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:28:53,542-Speed 3198.15 samples/sec   Loss 5.5382   LearningRate 0.0469   Epoch: 6   Global Step: 78330   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:28:56,640-Speed 3306.32 samples/sec   Loss 5.6352   LearningRate 0.0469   Epoch: 6   Global Step: 78340   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:28:59,696-Speed 3352.32 samples/sec   Loss 5.6679   LearningRate 0.0469   Epoch: 6   Global Step: 78350   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:29:02,841-Speed 3256.87 samples/sec   Loss 5.5967   LearningRate 0.0469   Epoch: 6   Global Step: 78360   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:29:05,940-Speed 3305.18 samples/sec   Loss 5.4606   LearningRate 0.0469   Epoch: 6   Global Step: 78370   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:29:09,034-Speed 3310.03 samples/sec   Loss 5.5617   LearningRate 0.0469   Epoch: 6   Global Step: 78380   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:29:12,102-Speed 3338.84 samples/sec   Loss 5.6265   LearningRate 0.0468   Epoch: 6   Global Step: 78390   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:29:15,256-Speed 3248.26 samples/sec   Loss 5.5223   LearningRate 0.0468   Epoch: 6   Global Step: 78400   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:29:18,404-Speed 3253.81 samples/sec   Loss 5.5235   LearningRate 0.0468   Epoch: 6   Global Step: 78410   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:29:21,437-Speed 3376.87 samples/sec   Loss 5.5569   LearningRate 0.0468   Epoch: 6   Global Step: 78420   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:29:24,559-Speed 3280.84 samples/sec   Loss 5.5431   LearningRate 0.0468   Epoch: 6   Global Step: 78430   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:29:27,608-Speed 3359.84 samples/sec   Loss 5.6078   LearningRate 0.0468   Epoch: 6   Global Step: 78440   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:29:30,687-Speed 3327.57 samples/sec   Loss 5.6653   LearningRate 0.0468   Epoch: 6   Global Step: 78450   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:29:33,758-Speed 3334.91 samples/sec   Loss 5.6032   LearningRate 0.0468   Epoch: 6   Global Step: 78460   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:29:36,823-Speed 3341.54 samples/sec   Loss 5.6040   LearningRate 0.0468   Epoch: 6   Global Step: 78470   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:29:39,888-Speed 3342.64 samples/sec   Loss 5.5987   LearningRate 0.0468   Epoch: 6   Global Step: 78480   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:29:42,976-Speed 3317.20 samples/sec   Loss 5.7325   LearningRate 0.0468   Epoch: 6   Global Step: 78490   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:29:46,036-Speed 3346.97 samples/sec   Loss 5.6198   LearningRate 0.0468   Epoch: 6   Global Step: 78500   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:29:49,144-Speed 3295.90 samples/sec   Loss 5.6226   LearningRate 0.0468   Epoch: 6   Global Step: 78510   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:29:52,208-Speed 3343.45 samples/sec   Loss 5.5812   LearningRate 0.0468   Epoch: 6   Global Step: 78520   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:29:55,332-Speed 3279.54 samples/sec   Loss 5.6279   LearningRate 0.0468   Epoch: 6   Global Step: 78530   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:29:58,414-Speed 3323.64 samples/sec   Loss 5.5432   LearningRate 0.0468   Epoch: 6   Global Step: 78540   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:30:01,619-Speed 3195.18 samples/sec   Loss 5.6241   LearningRate 0.0468   Epoch: 6   Global Step: 78550   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:30:04,712-Speed 3311.43 samples/sec   Loss 5.5813   LearningRate 0.0468   Epoch: 6   Global Step: 78560   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:30:07,797-Speed 3320.83 samples/sec   Loss 5.5960   LearningRate 0.0467   Epoch: 6   Global Step: 78570   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-04-27 08:30:10,888-Speed 3314.51 samples/sec   Loss 5.5736   LearningRate 0.0467   Epoch: 6   Global Step: 78580   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:30:14,048-Speed 3241.38 samples/sec   Loss 5.5905   LearningRate 0.0467   Epoch: 6   Global Step: 78590   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:30:17,213-Speed 3235.52 samples/sec   Loss 5.5592   LearningRate 0.0467   Epoch: 6   Global Step: 78600   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:30:20,349-Speed 3266.93 samples/sec   Loss 5.5943   LearningRate 0.0467   Epoch: 6   Global Step: 78610   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:30:23,436-Speed 3317.74 samples/sec   Loss 5.7186   LearningRate 0.0467   Epoch: 6   Global Step: 78620   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:30:26,516-Speed 3326.57 samples/sec   Loss 5.6286   LearningRate 0.0467   Epoch: 6   Global Step: 78630   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:30:29,594-Speed 3327.64 samples/sec   Loss 5.5563   LearningRate 0.0467   Epoch: 6   Global Step: 78640   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:30:32,693-Speed 3304.82 samples/sec   Loss 5.6154   LearningRate 0.0467   Epoch: 6   Global Step: 78650   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:30:35,859-Speed 3235.86 samples/sec   Loss 5.5606   LearningRate 0.0467   Epoch: 6   Global Step: 78660   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:30:38,968-Speed 3294.06 samples/sec   Loss 5.5502   LearningRate 0.0467   Epoch: 6   Global Step: 78670   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:30:42,079-Speed 3292.54 samples/sec   Loss 5.6490   LearningRate 0.0467   Epoch: 6   Global Step: 78680   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:30:45,137-Speed 3350.10 samples/sec   Loss 5.5558   LearningRate 0.0467   Epoch: 6   Global Step: 78690   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:30:48,232-Speed 3310.20 samples/sec   Loss 5.5197   LearningRate 0.0467   Epoch: 6   Global Step: 78700   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:30:51,306-Speed 3331.26 samples/sec   Loss 5.6832   LearningRate 0.0467   Epoch: 6   Global Step: 78710   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:30:54,388-Speed 3324.18 samples/sec   Loss 5.5815   LearningRate 0.0467   Epoch: 6   Global Step: 78720   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:30:57,470-Speed 3324.00 samples/sec   Loss 5.6216   LearningRate 0.0467   Epoch: 6   Global Step: 78730   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:31:00,570-Speed 3304.10 samples/sec   Loss 5.5728   LearningRate 0.0467   Epoch: 6   Global Step: 78740   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:31:03,721-Speed 3249.92 samples/sec   Loss 5.6735   LearningRate 0.0466   Epoch: 6   Global Step: 78750   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:31:06,836-Speed 3288.91 samples/sec   Loss 5.6544   LearningRate 0.0466   Epoch: 6   Global Step: 78760   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:31:09,902-Speed 3340.70 samples/sec   Loss 5.6967   LearningRate 0.0466   Epoch: 6   Global Step: 78770   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:31:12,961-Speed 3349.09 samples/sec   Loss 5.6840   LearningRate 0.0466   Epoch: 6   Global Step: 78780   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:31:16,019-Speed 3349.50 samples/sec   Loss 5.7265   LearningRate 0.0466   Epoch: 6   Global Step: 78790   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:31:19,098-Speed 3326.43 samples/sec   Loss 5.6670   LearningRate 0.0466   Epoch: 6   Global Step: 78800   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:31:22,167-Speed 3338.34 samples/sec   Loss 5.5912   LearningRate 0.0466   Epoch: 6   Global Step: 78810   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:31:25,270-Speed 3301.33 samples/sec   Loss 5.5593   LearningRate 0.0466   Epoch: 6   Global Step: 78820   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:31:28,374-Speed 3299.80 samples/sec   Loss 5.6596   LearningRate 0.0466   Epoch: 6   Global Step: 78830   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:31:31,442-Speed 3338.96 samples/sec   Loss 5.6502   LearningRate 0.0466   Epoch: 6   Global Step: 78840   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-04-27 08:31:34,482-Speed 3369.80 samples/sec   Loss 5.5644   LearningRate 0.0466   Epoch: 6   Global Step: 78850   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:31:37,556-Speed 3331.85 samples/sec   Loss 5.6784   LearningRate 0.0466   Epoch: 6   Global Step: 78860   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:31:40,620-Speed 3344.04 samples/sec   Loss 5.6171   LearningRate 0.0466   Epoch: 6   Global Step: 78870   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:31:43,677-Speed 3350.30 samples/sec   Loss 5.6726   LearningRate 0.0466   Epoch: 6   Global Step: 78880   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:31:46,769-Speed 3312.75 samples/sec   Loss 5.6747   LearningRate 0.0466   Epoch: 6   Global Step: 78890   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-04-27 08:31:49,841-Speed 3334.02 samples/sec   Loss 5.6569   LearningRate 0.0466   Epoch: 6   Global Step: 78900   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:31:52,918-Speed 3329.98 samples/sec   Loss 5.5416   LearningRate 0.0466   Epoch: 6   Global Step: 78910   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:31:55,978-Speed 3346.39 samples/sec   Loss 5.6043   LearningRate 0.0466   Epoch: 6   Global Step: 78920   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:31:59,075-Speed 3308.34 samples/sec   Loss 5.6578   LearningRate 0.0465   Epoch: 6   Global Step: 78930   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:32:02,207-Speed 3270.18 samples/sec   Loss 5.6879   LearningRate 0.0465   Epoch: 6   Global Step: 78940   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:32:05,303-Speed 3308.69 samples/sec   Loss 5.6071   LearningRate 0.0465   Epoch: 6   Global Step: 78950   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:32:08,363-Speed 3347.80 samples/sec   Loss 5.7144   LearningRate 0.0465   Epoch: 6   Global Step: 78960   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:32:11,450-Speed 3317.51 samples/sec   Loss 5.6537   LearningRate 0.0465   Epoch: 6   Global Step: 78970   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:32:14,550-Speed 3304.20 samples/sec   Loss 5.6292   LearningRate 0.0465   Epoch: 6   Global Step: 78980   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:32:17,616-Speed 3341.34 samples/sec   Loss 5.7396   LearningRate 0.0465   Epoch: 6   Global Step: 78990   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:32:20,714-Speed 3306.66 samples/sec   Loss 5.5658   LearningRate 0.0465   Epoch: 6   Global Step: 79000   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:32:23,814-Speed 3303.83 samples/sec   Loss 5.5639   LearningRate 0.0465   Epoch: 6   Global Step: 79010   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:32:26,900-Speed 3319.80 samples/sec   Loss 5.6291   LearningRate 0.0465   Epoch: 6   Global Step: 79020   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-27 08:32:29,995-Speed 3308.96 samples/sec   Loss 5.7939   LearningRate 0.0465   Epoch: 6   Global Step: 79030   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-27 08:32:33,087-Speed 3313.70 samples/sec   Loss 5.5624   LearningRate 0.0465   Epoch: 6   Global Step: 79040   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-27 08:32:36,135-Speed 3360.54 samples/sec   Loss 5.7365   LearningRate 0.0465   Epoch: 6   Global Step: 79050   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-27 08:32:39,199-Speed 3342.98 samples/sec   Loss 5.6821   LearningRate 0.0465   Epoch: 6   Global Step: 79060   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-27 08:32:42,287-Speed 3317.81 samples/sec   Loss 5.5100   LearningRate 0.0465   Epoch: 6   Global Step: 79070   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-27 08:32:45,333-Speed 3362.54 samples/sec   Loss 5.6233   LearningRate 0.0465   Epoch: 6   Global Step: 79080   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-27 08:32:48,429-Speed 3308.88 samples/sec   Loss 5.6612   LearningRate 0.0465   Epoch: 6   Global Step: 79090   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-27 08:32:51,590-Speed 3240.72 samples/sec   Loss 5.7307   LearningRate 0.0465   Epoch: 6   Global Step: 79100   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-27 08:32:55,471-Speed 2638.85 samples/sec   Loss 5.7701   LearningRate 0.0465   Epoch: 6   Global Step: 79110   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-27 08:32:58,525-Speed 3353.79 samples/sec   Loss 5.6808   LearningRate 0.0464   Epoch: 6   Global Step: 79120   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:33:01,602-Speed 3329.40 samples/sec   Loss 5.7852   LearningRate 0.0464   Epoch: 6   Global Step: 79130   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:33:04,708-Speed 3297.81 samples/sec   Loss 5.6803   LearningRate 0.0464   Epoch: 6   Global Step: 79140   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:33:07,786-Speed 3328.22 samples/sec   Loss 5.6747   LearningRate 0.0464   Epoch: 6   Global Step: 79150   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:33:10,878-Speed 3312.38 samples/sec   Loss 5.7684   LearningRate 0.0464   Epoch: 6   Global Step: 79160   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:33:13,949-Speed 3335.77 samples/sec   Loss 5.6301   LearningRate 0.0464   Epoch: 6   Global Step: 79170   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:33:17,042-Speed 3312.12 samples/sec   Loss 5.7325   LearningRate 0.0464   Epoch: 6   Global Step: 79180   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:33:20,175-Speed 3269.75 samples/sec   Loss 5.5323   LearningRate 0.0464   Epoch: 6   Global Step: 79190   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:33:23,266-Speed 3313.37 samples/sec   Loss 5.6842   LearningRate 0.0464   Epoch: 6   Global Step: 79200   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:33:26,365-Speed 3305.72 samples/sec   Loss 5.6414   LearningRate 0.0464   Epoch: 6   Global Step: 79210   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:33:29,413-Speed 3360.72 samples/sec   Loss 5.6238   LearningRate 0.0464   Epoch: 6   Global Step: 79220   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:33:32,529-Speed 3287.32 samples/sec   Loss 5.5980   LearningRate 0.0464   Epoch: 6   Global Step: 79230   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:33:35,608-Speed 3326.79 samples/sec   Loss 5.7063   LearningRate 0.0464   Epoch: 6   Global Step: 79240   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:33:38,847-Speed 3162.96 samples/sec   Loss 5.6660   LearningRate 0.0464   Epoch: 6   Global Step: 79250   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:33:42,027-Speed 3220.41 samples/sec   Loss 5.6566   LearningRate 0.0464   Epoch: 6   Global Step: 79260   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:33:45,107-Speed 3326.33 samples/sec   Loss 5.7086   LearningRate 0.0464   Epoch: 6   Global Step: 79270   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:33:48,222-Speed 3287.84 samples/sec   Loss 5.7611   LearningRate 0.0464   Epoch: 6   Global Step: 79280   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:33:51,281-Speed 3348.99 samples/sec   Loss 5.6292   LearningRate 0.0464   Epoch: 6   Global Step: 79290   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:33:54,394-Speed 3290.42 samples/sec   Loss 5.6948   LearningRate 0.0463   Epoch: 6   Global Step: 79300   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:33:57,464-Speed 3336.84 samples/sec   Loss 5.6216   LearningRate 0.0463   Epoch: 6   Global Step: 79310   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:34:00,551-Speed 3318.25 samples/sec   Loss 5.5460   LearningRate 0.0463   Epoch: 6   Global Step: 79320   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:34:03,646-Speed 3309.49 samples/sec   Loss 5.7050   LearningRate 0.0463   Epoch: 6   Global Step: 79330   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:34:06,751-Speed 3299.16 samples/sec   Loss 5.6754   LearningRate 0.0463   Epoch: 6   Global Step: 79340   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:34:09,799-Speed 3360.38 samples/sec   Loss 5.7000   LearningRate 0.0463   Epoch: 6   Global Step: 79350   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:34:12,854-Speed 3353.28 samples/sec   Loss 5.7245   LearningRate 0.0463   Epoch: 6   Global Step: 79360   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:34:15,934-Speed 3325.36 samples/sec   Loss 5.7624   LearningRate 0.0463   Epoch: 6   Global Step: 79370   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:34:19,068-Speed 3268.38 samples/sec   Loss 5.6062   LearningRate 0.0463   Epoch: 6   Global Step: 79380   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:34:22,132-Speed 3343.58 samples/sec   Loss 5.6917   LearningRate 0.0463   Epoch: 6   Global Step: 79390   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:34:25,224-Speed 3312.86 samples/sec   Loss 5.7602   LearningRate 0.0463   Epoch: 6   Global Step: 79400   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:34:28,334-Speed 3292.90 samples/sec   Loss 5.6536   LearningRate 0.0463   Epoch: 6   Global Step: 79410   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:34:32,691-Speed 2350.68 samples/sec   Loss 5.6729   LearningRate 0.0463   Epoch: 6   Global Step: 79420   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:34:36,920-Speed 2422.29 samples/sec   Loss 5.6854   LearningRate 0.0463   Epoch: 6   Global Step: 79430   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:34:40,057-Speed 3265.00 samples/sec   Loss 5.6358   LearningRate 0.0463   Epoch: 6   Global Step: 79440   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:34:43,145-Speed 3317.63 samples/sec   Loss 5.7253   LearningRate 0.0463   Epoch: 6   Global Step: 79450   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:34:46,224-Speed 3326.68 samples/sec   Loss 5.7650   LearningRate 0.0463   Epoch: 6   Global Step: 79460   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:34:49,287-Speed 3344.29 samples/sec   Loss 5.7606   LearningRate 0.0463   Epoch: 6   Global Step: 79470   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:34:52,423-Speed 3266.33 samples/sec   Loss 5.6244   LearningRate 0.0462   Epoch: 6   Global Step: 79480   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:34:55,494-Speed 3335.81 samples/sec   Loss 5.6231   LearningRate 0.0462   Epoch: 6   Global Step: 79490   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:34:58,567-Speed 3333.33 samples/sec   Loss 5.6800   LearningRate 0.0462   Epoch: 6   Global Step: 79500   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:35:01,720-Speed 3248.15 samples/sec   Loss 5.7302   LearningRate 0.0462   Epoch: 6   Global Step: 79510   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:35:04,801-Speed 3325.13 samples/sec   Loss 5.5766   LearningRate 0.0462   Epoch: 6   Global Step: 79520   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:35:07,872-Speed 3335.38 samples/sec   Loss 5.7133   LearningRate 0.0462   Epoch: 6   Global Step: 79530   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:35:10,956-Speed 3321.51 samples/sec   Loss 5.6954   LearningRate 0.0462   Epoch: 6   Global Step: 79540   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:35:14,029-Speed 3333.30 samples/sec   Loss 5.7743   LearningRate 0.0462   Epoch: 6   Global Step: 79550   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:35:17,107-Speed 3327.43 samples/sec   Loss 5.6884   LearningRate 0.0462   Epoch: 6   Global Step: 79560   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:35:20,185-Speed 3328.27 samples/sec   Loss 5.6395   LearningRate 0.0462   Epoch: 6   Global Step: 79570   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:35:23,307-Speed 3280.66 samples/sec   Loss 5.8140   LearningRate 0.0462   Epoch: 6   Global Step: 79580   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:35:26,356-Speed 3359.67 samples/sec   Loss 5.6804   LearningRate 0.0462   Epoch: 6   Global Step: 79590   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:35:29,438-Speed 3323.29 samples/sec   Loss 5.7495   LearningRate 0.0462   Epoch: 6   Global Step: 79600   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:35:32,508-Speed 3336.54 samples/sec   Loss 5.6500   LearningRate 0.0462   Epoch: 6   Global Step: 79610   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:35:35,636-Speed 3274.92 samples/sec   Loss 5.7470   LearningRate 0.0462   Epoch: 6   Global Step: 79620   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:35:38,732-Speed 3308.97 samples/sec   Loss 5.6890   LearningRate 0.0462   Epoch: 6   Global Step: 79630   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:35:41,820-Speed 3317.04 samples/sec   Loss 5.6222   LearningRate 0.0462   Epoch: 6   Global Step: 79640   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:35:44,896-Speed 3329.95 samples/sec   Loss 5.7324   LearningRate 0.0462   Epoch: 6   Global Step: 79650   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:35:47,942-Speed 3363.26 samples/sec   Loss 5.6142   LearningRate 0.0461   Epoch: 6   Global Step: 79660   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:35:51,009-Speed 3339.02 samples/sec   Loss 5.7467   LearningRate 0.0461   Epoch: 6   Global Step: 79670   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:35:54,112-Speed 3300.66 samples/sec   Loss 5.7576   LearningRate 0.0461   Epoch: 6   Global Step: 79680   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:35:57,189-Speed 3329.67 samples/sec   Loss 5.6665   LearningRate 0.0461   Epoch: 6   Global Step: 79690   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:36:00,282-Speed 3311.18 samples/sec   Loss 5.6770   LearningRate 0.0461   Epoch: 6   Global Step: 79700   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:36:03,362-Speed 3326.30 samples/sec   Loss 5.7122   LearningRate 0.0461   Epoch: 6   Global Step: 79710   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:36:06,423-Speed 3346.57 samples/sec   Loss 5.6550   LearningRate 0.0461   Epoch: 6   Global Step: 79720   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:36:09,497-Speed 3331.74 samples/sec   Loss 5.7949   LearningRate 0.0461   Epoch: 6   Global Step: 79730   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:36:12,621-Speed 3279.66 samples/sec   Loss 5.6488   LearningRate 0.0461   Epoch: 6   Global Step: 79740   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:36:15,751-Speed 3272.01 samples/sec   Loss 5.6485   LearningRate 0.0461   Epoch: 6   Global Step: 79750   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:36:18,807-Speed 3352.57 samples/sec   Loss 5.7193   LearningRate 0.0461   Epoch: 6   Global Step: 79760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:36:21,878-Speed 3335.65 samples/sec   Loss 5.7494   LearningRate 0.0461   Epoch: 6   Global Step: 79770   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:36:24,993-Speed 3287.45 samples/sec   Loss 5.8339   LearningRate 0.0461   Epoch: 6   Global Step: 79780   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:36:28,108-Speed 3288.68 samples/sec   Loss 5.8068   LearningRate 0.0461   Epoch: 6   Global Step: 79790   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:36:31,236-Speed 3274.84 samples/sec   Loss 5.6476   LearningRate 0.0461   Epoch: 6   Global Step: 79800   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:36:34,299-Speed 3344.23 samples/sec   Loss 5.5885   LearningRate 0.0461   Epoch: 6   Global Step: 79810   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:36:37,372-Speed 3332.89 samples/sec   Loss 5.6425   LearningRate 0.0461   Epoch: 6   Global Step: 79820   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:36:40,504-Speed 3270.15 samples/sec   Loss 5.6616   LearningRate 0.0461   Epoch: 6   Global Step: 79830   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:36:43,606-Speed 3302.99 samples/sec   Loss 5.6261   LearningRate 0.0461   Epoch: 6   Global Step: 79840   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:36:46,693-Speed 3318.16 samples/sec   Loss 5.6235   LearningRate 0.0460   Epoch: 6   Global Step: 79850   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:36:49,756-Speed 3344.22 samples/sec   Loss 5.6119   LearningRate 0.0460   Epoch: 6   Global Step: 79860   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:36:52,900-Speed 3257.21 samples/sec   Loss 5.7501   LearningRate 0.0460   Epoch: 6   Global Step: 79870   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:36:55,983-Speed 3322.59 samples/sec   Loss 5.7794   LearningRate 0.0460   Epoch: 6   Global Step: 79880   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:36:59,038-Speed 3353.44 samples/sec   Loss 5.6675   LearningRate 0.0460   Epoch: 6   Global Step: 79890   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:37:02,153-Speed 3288.93 samples/sec   Loss 5.7089   LearningRate 0.0460   Epoch: 6   Global Step: 79900   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:37:05,218-Speed 3342.37 samples/sec   Loss 5.6645   LearningRate 0.0460   Epoch: 6   Global Step: 79910   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:37:08,302-Speed 3320.35 samples/sec   Loss 5.7310   LearningRate 0.0460   Epoch: 6   Global Step: 79920   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:37:11,377-Speed 3330.86 samples/sec   Loss 5.7109   LearningRate 0.0460   Epoch: 6   Global Step: 79930   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:37:14,452-Speed 3332.28 samples/sec   Loss 5.8439   LearningRate 0.0460   Epoch: 6   Global Step: 79940   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:37:17,514-Speed 3345.10 samples/sec   Loss 5.7832   LearningRate 0.0460   Epoch: 6   Global Step: 79950   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:37:20,560-Speed 3362.46 samples/sec   Loss 5.6520   LearningRate 0.0460   Epoch: 6   Global Step: 79960   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:37:23,728-Speed 3233.20 samples/sec   Loss 5.7867   LearningRate 0.0460   Epoch: 6   Global Step: 79970   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:37:26,803-Speed 3331.21 samples/sec   Loss 5.6619   LearningRate 0.0460   Epoch: 6   Global Step: 79980   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:37:29,915-Speed 3291.53 samples/sec   Loss 5.6834   LearningRate 0.0460   Epoch: 6   Global Step: 79990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:37:33,008-Speed 3312.09 samples/sec   Loss 5.7391   LearningRate 0.0460   Epoch: 6   Global Step: 80000   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:37:36,150-Speed 3259.89 samples/sec   Loss 5.6290   LearningRate 0.0460   Epoch: 6   Global Step: 80010   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:37:39,241-Speed 3313.64 samples/sec   Loss 5.6763   LearningRate 0.0460   Epoch: 6   Global Step: 80020   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:37:42,344-Speed 3300.80 samples/sec   Loss 5.6940   LearningRate 0.0459   Epoch: 6   Global Step: 80030   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:37:45,412-Speed 3339.70 samples/sec   Loss 5.6797   LearningRate 0.0459   Epoch: 6   Global Step: 80040   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:37:48,471-Speed 3348.52 samples/sec   Loss 5.7992   LearningRate 0.0459   Epoch: 6   Global Step: 80050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:37:51,547-Speed 3329.90 samples/sec   Loss 5.7690   LearningRate 0.0459   Epoch: 6   Global Step: 80060   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:37:54,629-Speed 3323.27 samples/sec   Loss 5.7484   LearningRate 0.0459   Epoch: 6   Global Step: 80070   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:37:57,745-Speed 3287.25 samples/sec   Loss 5.7127   LearningRate 0.0459   Epoch: 6   Global Step: 80080   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:38:00,798-Speed 3355.64 samples/sec   Loss 5.7983   LearningRate 0.0459   Epoch: 6   Global Step: 80090   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:38:03,894-Speed 3308.13 samples/sec   Loss 5.6889   LearningRate 0.0459   Epoch: 6   Global Step: 80100   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:38:07,063-Speed 3232.57 samples/sec   Loss 5.7669   LearningRate 0.0459   Epoch: 6   Global Step: 80110   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:38:10,130-Speed 3339.85 samples/sec   Loss 5.7563   LearningRate 0.0459   Epoch: 6   Global Step: 80120   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:38:13,253-Speed 3280.09 samples/sec   Loss 5.6938   LearningRate 0.0459   Epoch: 6   Global Step: 80130   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:38:16,375-Speed 3280.68 samples/sec   Loss 5.7202   LearningRate 0.0459   Epoch: 6   Global Step: 80140   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:38:19,455-Speed 3325.86 samples/sec   Loss 5.7397   LearningRate 0.0459   Epoch: 6   Global Step: 80150   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:38:22,524-Speed 3337.18 samples/sec   Loss 5.7526   LearningRate 0.0459   Epoch: 6   Global Step: 80160   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:38:25,672-Speed 3254.85 samples/sec   Loss 5.7292   LearningRate 0.0459   Epoch: 6   Global Step: 80170   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:38:28,796-Speed 3278.81 samples/sec   Loss 5.7417   LearningRate 0.0459   Epoch: 6   Global Step: 80180   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:38:31,908-Speed 3290.94 samples/sec   Loss 5.8612   LearningRate 0.0459   Epoch: 6   Global Step: 80190   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:38:34,975-Speed 3340.21 samples/sec   Loss 5.6825   LearningRate 0.0459   Epoch: 6   Global Step: 80200   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:38:38,068-Speed 3311.29 samples/sec   Loss 5.7293   LearningRate 0.0458   Epoch: 6   Global Step: 80210   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:38:41,236-Speed 3233.55 samples/sec   Loss 5.7629   LearningRate 0.0458   Epoch: 6   Global Step: 80220   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:38:44,317-Speed 3325.09 samples/sec   Loss 5.7933   LearningRate 0.0458   Epoch: 6   Global Step: 80230   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:38:47,379-Speed 3345.34 samples/sec   Loss 5.7369   LearningRate 0.0458   Epoch: 6   Global Step: 80240   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:38:50,576-Speed 3203.51 samples/sec   Loss 5.7708   LearningRate 0.0458   Epoch: 6   Global Step: 80250   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:38:53,675-Speed 3306.05 samples/sec   Loss 5.7803   LearningRate 0.0458   Epoch: 6   Global Step: 80260   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:38:56,750-Speed 3330.60 samples/sec   Loss 5.7134   LearningRate 0.0458   Epoch: 6   Global Step: 80270   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:38:59,785-Speed 3375.51 samples/sec   Loss 5.7261   LearningRate 0.0458   Epoch: 6   Global Step: 80280   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:39:02,898-Speed 3289.95 samples/sec   Loss 5.8196   LearningRate 0.0458   Epoch: 6   Global Step: 80290   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:39:05,968-Speed 3337.54 samples/sec   Loss 5.7838   LearningRate 0.0458   Epoch: 6   Global Step: 80300   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:39:09,039-Speed 3334.77 samples/sec   Loss 5.6616   LearningRate 0.0458   Epoch: 6   Global Step: 80310   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:39:12,197-Speed 3243.88 samples/sec   Loss 5.8469   LearningRate 0.0458   Epoch: 6   Global Step: 80320   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:39:15,296-Speed 3305.72 samples/sec   Loss 5.8013   LearningRate 0.0458   Epoch: 6   Global Step: 80330   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:39:18,358-Speed 3344.53 samples/sec   Loss 5.7678   LearningRate 0.0458   Epoch: 6   Global Step: 80340   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:39:21,429-Speed 3336.19 samples/sec   Loss 5.7228   LearningRate 0.0458   Epoch: 6   Global Step: 80350   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:39:24,549-Speed 3282.59 samples/sec   Loss 5.8595   LearningRate 0.0458   Epoch: 6   Global Step: 80360   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:39:27,621-Speed 3334.28 samples/sec   Loss 5.7530   LearningRate 0.0458   Epoch: 6   Global Step: 80370   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:39:30,757-Speed 3266.10 samples/sec   Loss 5.7368   LearningRate 0.0458   Epoch: 6   Global Step: 80380   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:39:33,863-Speed 3298.07 samples/sec   Loss 5.7343   LearningRate 0.0458   Epoch: 6   Global Step: 80390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:39:37,011-Speed 3254.68 samples/sec   Loss 5.9009   LearningRate 0.0457   Epoch: 6   Global Step: 80400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:39:40,168-Speed 3244.52 samples/sec   Loss 5.6890   LearningRate 0.0457   Epoch: 6   Global Step: 80410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:39:43,326-Speed 3243.48 samples/sec   Loss 5.7836   LearningRate 0.0457   Epoch: 6   Global Step: 80420   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:39:46,387-Speed 3346.91 samples/sec   Loss 5.7128   LearningRate 0.0457   Epoch: 6   Global Step: 80430   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:39:49,448-Speed 3346.12 samples/sec   Loss 5.6966   LearningRate 0.0457   Epoch: 6   Global Step: 80440   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:39:52,506-Speed 3349.11 samples/sec   Loss 5.7573   LearningRate 0.0457   Epoch: 6   Global Step: 80450   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:39:55,643-Speed 3265.63 samples/sec   Loss 5.7164   LearningRate 0.0457   Epoch: 6   Global Step: 80460   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:39:58,696-Speed 3354.94 samples/sec   Loss 5.7664   LearningRate 0.0457   Epoch: 6   Global Step: 80470   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:40:01,838-Speed 3260.50 samples/sec   Loss 5.6760   LearningRate 0.0457   Epoch: 6   Global Step: 80480   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 08:40:04,937-Speed 3305.27 samples/sec   Loss 5.6964   LearningRate 0.0457   Epoch: 6   Global Step: 80490   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:40:08,019-Speed 3323.45 samples/sec   Loss 5.6695   LearningRate 0.0457   Epoch: 6   Global Step: 80500   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:40:11,090-Speed 3335.25 samples/sec   Loss 5.8003   LearningRate 0.0457   Epoch: 6   Global Step: 80510   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:40:14,238-Speed 3254.78 samples/sec   Loss 5.8241   LearningRate 0.0457   Epoch: 6   Global Step: 80520   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:40:17,359-Speed 3281.60 samples/sec   Loss 5.7967   LearningRate 0.0457   Epoch: 6   Global Step: 80530   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:40:20,428-Speed 3337.63 samples/sec   Loss 5.8124   LearningRate 0.0457   Epoch: 6   Global Step: 80540   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:40:23,461-Speed 3377.33 samples/sec   Loss 5.6802   LearningRate 0.0457   Epoch: 6   Global Step: 80550   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:40:26,561-Speed 3304.89 samples/sec   Loss 5.8234   LearningRate 0.0457   Epoch: 6   Global Step: 80560   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-27 08:40:29,644-Speed 3322.56 samples/sec   Loss 5.7837   LearningRate 0.0457   Epoch: 6   Global Step: 80570   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-27 08:40:32,736-Speed 3312.23 samples/sec   Loss 5.7451   LearningRate 0.0456   Epoch: 6   Global Step: 80580   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-27 08:40:35,795-Speed 3349.16 samples/sec   Loss 5.9305   LearningRate 0.0456   Epoch: 6   Global Step: 80590   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-27 08:40:38,870-Speed 3330.31 samples/sec   Loss 5.7512   LearningRate 0.0456   Epoch: 6   Global Step: 80600   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-27 08:40:41,939-Speed 3337.86 samples/sec   Loss 5.6604   LearningRate 0.0456   Epoch: 6   Global Step: 80610   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-27 08:40:45,002-Speed 3345.13 samples/sec   Loss 5.8019   LearningRate 0.0456   Epoch: 6   Global Step: 80620   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-27 08:40:48,081-Speed 3325.92 samples/sec   Loss 5.7531   LearningRate 0.0456   Epoch: 6   Global Step: 80630   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-27 08:40:51,221-Speed 3262.36 samples/sec   Loss 5.7540   LearningRate 0.0456   Epoch: 6   Global Step: 80640   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-27 08:40:54,347-Speed 3277.45 samples/sec   Loss 5.7478   LearningRate 0.0456   Epoch: 6   Global Step: 80650   Fp16 Grad Scale: 8192   Required: 14 hours
Training: 2022-04-27 08:40:57,409-Speed 3344.55 samples/sec   Loss 5.7408   LearningRate 0.0456   Epoch: 6   Global Step: 80660   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:41:00,466-Speed 3350.20 samples/sec   Loss 5.6583   LearningRate 0.0456   Epoch: 6   Global Step: 80670   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:41:03,587-Speed 3282.58 samples/sec   Loss 5.6862   LearningRate 0.0456   Epoch: 6   Global Step: 80680   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:41:06,667-Speed 3325.58 samples/sec   Loss 5.6594   LearningRate 0.0456   Epoch: 6   Global Step: 80690   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:41:09,768-Speed 3303.28 samples/sec   Loss 5.6504   LearningRate 0.0456   Epoch: 6   Global Step: 80700   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:41:12,871-Speed 3300.71 samples/sec   Loss 5.7564   LearningRate 0.0456   Epoch: 6   Global Step: 80710   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:41:16,038-Speed 3235.06 samples/sec   Loss 5.8389   LearningRate 0.0456   Epoch: 6   Global Step: 80720   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:41:19,165-Speed 3275.11 samples/sec   Loss 5.7710   LearningRate 0.0456   Epoch: 6   Global Step: 80730   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:41:22,237-Speed 3334.62 samples/sec   Loss 5.7043   LearningRate 0.0456   Epoch: 6   Global Step: 80740   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:41:25,433-Speed 3205.03 samples/sec   Loss 5.9026   LearningRate 0.0456   Epoch: 6   Global Step: 80750   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:41:28,515-Speed 3323.82 samples/sec   Loss 5.7974   LearningRate 0.0455   Epoch: 6   Global Step: 80760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:41:31,644-Speed 3273.67 samples/sec   Loss 5.7782   LearningRate 0.0455   Epoch: 6   Global Step: 80770   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:41:34,722-Speed 3327.56 samples/sec   Loss 5.7794   LearningRate 0.0455   Epoch: 6   Global Step: 80780   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:41:37,801-Speed 3327.86 samples/sec   Loss 5.7887   LearningRate 0.0455   Epoch: 6   Global Step: 80790   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:41:40,882-Speed 3325.05 samples/sec   Loss 5.7628   LearningRate 0.0455   Epoch: 6   Global Step: 80800   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:41:43,953-Speed 3334.26 samples/sec   Loss 5.7575   LearningRate 0.0455   Epoch: 6   Global Step: 80810   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:41:47,098-Speed 3257.55 samples/sec   Loss 5.7628   LearningRate 0.0455   Epoch: 6   Global Step: 80820   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:41:50,218-Speed 3283.06 samples/sec   Loss 5.8736   LearningRate 0.0455   Epoch: 6   Global Step: 80830   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:41:53,400-Speed 3218.92 samples/sec   Loss 5.7356   LearningRate 0.0455   Epoch: 6   Global Step: 80840   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:41:56,483-Speed 3322.92 samples/sec   Loss 5.7147   LearningRate 0.0455   Epoch: 6   Global Step: 80850   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:41:59,561-Speed 3327.95 samples/sec   Loss 5.7495   LearningRate 0.0455   Epoch: 6   Global Step: 80860   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:42:02,638-Speed 3328.54 samples/sec   Loss 5.6467   LearningRate 0.0455   Epoch: 6   Global Step: 80870   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:42:05,698-Speed 3347.72 samples/sec   Loss 5.5813   LearningRate 0.0455   Epoch: 6   Global Step: 80880   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:42:08,748-Speed 3358.61 samples/sec   Loss 5.8311   LearningRate 0.0455   Epoch: 6   Global Step: 80890   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:42:11,879-Speed 3271.57 samples/sec   Loss 5.7636   LearningRate 0.0455   Epoch: 6   Global Step: 80900   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:42:15,025-Speed 3255.98 samples/sec   Loss 5.8243   LearningRate 0.0455   Epoch: 6   Global Step: 80910   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:42:18,176-Speed 3250.72 samples/sec   Loss 5.8053   LearningRate 0.0455   Epoch: 6   Global Step: 80920   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:42:21,258-Speed 3322.98 samples/sec   Loss 5.7784   LearningRate 0.0455   Epoch: 6   Global Step: 80930   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:42:24,349-Speed 3314.85 samples/sec   Loss 5.7745   LearningRate 0.0455   Epoch: 6   Global Step: 80940   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:42:27,408-Speed 3348.52 samples/sec   Loss 5.7906   LearningRate 0.0454   Epoch: 6   Global Step: 80950   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:42:30,544-Speed 3266.05 samples/sec   Loss 5.7290   LearningRate 0.0454   Epoch: 6   Global Step: 80960   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:42:33,661-Speed 3285.54 samples/sec   Loss 5.8477   LearningRate 0.0454   Epoch: 6   Global Step: 80970   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:42:36,763-Speed 3302.92 samples/sec   Loss 5.8760   LearningRate 0.0454   Epoch: 6   Global Step: 80980   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:42:39,929-Speed 3235.27 samples/sec   Loss 5.7688   LearningRate 0.0454   Epoch: 6   Global Step: 80990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:42:43,123-Speed 3206.22 samples/sec   Loss 5.7160   LearningRate 0.0454   Epoch: 6   Global Step: 81000   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:42:46,216-Speed 3312.46 samples/sec   Loss 5.7101   LearningRate 0.0454   Epoch: 6   Global Step: 81010   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:42:49,385-Speed 3231.74 samples/sec   Loss 5.8248   LearningRate 0.0454   Epoch: 6   Global Step: 81020   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:42:52,574-Speed 3212.36 samples/sec   Loss 5.7609   LearningRate 0.0454   Epoch: 6   Global Step: 81030   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:42:55,662-Speed 3317.66 samples/sec   Loss 5.7772   LearningRate 0.0454   Epoch: 6   Global Step: 81040   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:42:58,745-Speed 3321.88 samples/sec   Loss 5.9235   LearningRate 0.0454   Epoch: 6   Global Step: 81050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:43:01,872-Speed 3275.94 samples/sec   Loss 5.7161   LearningRate 0.0454   Epoch: 6   Global Step: 81060   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 08:43:04,980-Speed 3296.03 samples/sec   Loss 5.8775   LearningRate 0.0454   Epoch: 6   Global Step: 81070   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:43:08,075-Speed 3309.80 samples/sec   Loss 5.7085   LearningRate 0.0454   Epoch: 6   Global Step: 81080   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:43:11,143-Speed 3338.67 samples/sec   Loss 5.8521   LearningRate 0.0454   Epoch: 6   Global Step: 81090   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:43:14,250-Speed 3296.29 samples/sec   Loss 5.7816   LearningRate 0.0454   Epoch: 6   Global Step: 81100   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:43:17,389-Speed 3263.71 samples/sec   Loss 5.8076   LearningRate 0.0454   Epoch: 6   Global Step: 81110   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:43:20,445-Speed 3351.75 samples/sec   Loss 5.8608   LearningRate 0.0454   Epoch: 6   Global Step: 81120   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:43:23,538-Speed 3311.47 samples/sec   Loss 5.7840   LearningRate 0.0453   Epoch: 6   Global Step: 81130   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:43:26,607-Speed 3337.40 samples/sec   Loss 5.6938   LearningRate 0.0453   Epoch: 6   Global Step: 81140   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:43:29,822-Speed 3185.74 samples/sec   Loss 5.6285   LearningRate 0.0453   Epoch: 6   Global Step: 81150   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:43:32,925-Speed 3301.43 samples/sec   Loss 5.7772   LearningRate 0.0453   Epoch: 6   Global Step: 81160   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:43:36,070-Speed 3257.12 samples/sec   Loss 5.7540   LearningRate 0.0453   Epoch: 6   Global Step: 81170   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:43:39,203-Speed 3269.60 samples/sec   Loss 5.7439   LearningRate 0.0453   Epoch: 6   Global Step: 81180   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:43:42,331-Speed 3274.56 samples/sec   Loss 5.7774   LearningRate 0.0453   Epoch: 6   Global Step: 81190   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:43:45,414-Speed 3322.15 samples/sec   Loss 5.8375   LearningRate 0.0453   Epoch: 6   Global Step: 81200   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:43:48,495-Speed 3324.59 samples/sec   Loss 5.8886   LearningRate 0.0453   Epoch: 6   Global Step: 81210   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:43:51,602-Speed 3297.24 samples/sec   Loss 5.7786   LearningRate 0.0453   Epoch: 6   Global Step: 81220   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:43:54,726-Speed 3279.26 samples/sec   Loss 5.8951   LearningRate 0.0453   Epoch: 6   Global Step: 81230   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:43:57,831-Speed 3298.04 samples/sec   Loss 5.7889   LearningRate 0.0453   Epoch: 6   Global Step: 81240   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:44:00,912-Speed 3324.84 samples/sec   Loss 5.7453   LearningRate 0.0453   Epoch: 6   Global Step: 81250   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:44:03,957-Speed 3364.67 samples/sec   Loss 5.7977   LearningRate 0.0453   Epoch: 6   Global Step: 81260   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:44:07,095-Speed 3263.88 samples/sec   Loss 5.8486   LearningRate 0.0453   Epoch: 6   Global Step: 81270   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:44:10,160-Speed 3342.44 samples/sec   Loss 5.7974   LearningRate 0.0453   Epoch: 6   Global Step: 81280   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:44:13,297-Speed 3265.28 samples/sec   Loss 5.6878   LearningRate 0.0453   Epoch: 6   Global Step: 81290   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:44:16,491-Speed 3207.13 samples/sec   Loss 5.7781   LearningRate 0.0453   Epoch: 6   Global Step: 81300   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:44:19,572-Speed 3324.67 samples/sec   Loss 5.6598   LearningRate 0.0453   Epoch: 6   Global Step: 81310   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:44:22,627-Speed 3352.71 samples/sec   Loss 5.8162   LearningRate 0.0452   Epoch: 6   Global Step: 81320   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:44:25,716-Speed 3316.59 samples/sec   Loss 5.7815   LearningRate 0.0452   Epoch: 6   Global Step: 81330   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:44:28,824-Speed 3295.53 samples/sec   Loss 5.8364   LearningRate 0.0452   Epoch: 6   Global Step: 81340   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:44:31,885-Speed 3346.33 samples/sec   Loss 5.8118   LearningRate 0.0452   Epoch: 6   Global Step: 81350   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:44:34,964-Speed 3326.40 samples/sec   Loss 5.7722   LearningRate 0.0452   Epoch: 6   Global Step: 81360   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:44:38,083-Speed 3284.41 samples/sec   Loss 5.7888   LearningRate 0.0452   Epoch: 6   Global Step: 81370   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:44:41,229-Speed 3255.45 samples/sec   Loss 5.7876   LearningRate 0.0452   Epoch: 6   Global Step: 81380   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:44:44,331-Speed 3302.79 samples/sec   Loss 5.8830   LearningRate 0.0452   Epoch: 6   Global Step: 81390   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:44:47,459-Speed 3274.29 samples/sec   Loss 5.8546   LearningRate 0.0452   Epoch: 6   Global Step: 81400   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:44:50,568-Speed 3295.18 samples/sec   Loss 5.7080   LearningRate 0.0452   Epoch: 6   Global Step: 81410   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:44:53,624-Speed 3351.53 samples/sec   Loss 5.8878   LearningRate 0.0452   Epoch: 6   Global Step: 81420   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:44:56,714-Speed 3315.31 samples/sec   Loss 5.8247   LearningRate 0.0452   Epoch: 6   Global Step: 81430   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:44:59,852-Speed 3264.08 samples/sec   Loss 5.8606   LearningRate 0.0452   Epoch: 6   Global Step: 81440   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:45:02,933-Speed 3325.13 samples/sec   Loss 5.7354   LearningRate 0.0452   Epoch: 6   Global Step: 81450   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:45:06,049-Speed 3286.83 samples/sec   Loss 5.7807   LearningRate 0.0452   Epoch: 6   Global Step: 81460   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:45:09,118-Speed 3337.68 samples/sec   Loss 5.7470   LearningRate 0.0452   Epoch: 6   Global Step: 81470   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:45:12,173-Speed 3352.58 samples/sec   Loss 5.8233   LearningRate 0.0452   Epoch: 6   Global Step: 81480   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:45:15,262-Speed 3316.88 samples/sec   Loss 5.7801   LearningRate 0.0452   Epoch: 6   Global Step: 81490   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:45:18,314-Speed 3355.71 samples/sec   Loss 5.7194   LearningRate 0.0451   Epoch: 6   Global Step: 81500   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:45:21,387-Speed 3333.80 samples/sec   Loss 5.7496   LearningRate 0.0451   Epoch: 6   Global Step: 81510   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:45:24,480-Speed 3311.40 samples/sec   Loss 5.7973   LearningRate 0.0451   Epoch: 6   Global Step: 81520   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:45:27,580-Speed 3304.74 samples/sec   Loss 5.6993   LearningRate 0.0451   Epoch: 6   Global Step: 81530   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:45:30,707-Speed 3275.72 samples/sec   Loss 5.7618   LearningRate 0.0451   Epoch: 6   Global Step: 81540   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:45:33,838-Speed 3271.16 samples/sec   Loss 5.7960   LearningRate 0.0451   Epoch: 6   Global Step: 81550   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:45:36,946-Speed 3296.21 samples/sec   Loss 5.8473   LearningRate 0.0451   Epoch: 6   Global Step: 81560   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:45:40,113-Speed 3233.49 samples/sec   Loss 5.7771   LearningRate 0.0451   Epoch: 6   Global Step: 81570   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:45:43,193-Speed 3326.47 samples/sec   Loss 5.8259   LearningRate 0.0451   Epoch: 6   Global Step: 81580   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:45:46,323-Speed 3271.83 samples/sec   Loss 5.8194   LearningRate 0.0451   Epoch: 6   Global Step: 81590   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:45:49,524-Speed 3199.71 samples/sec   Loss 5.8521   LearningRate 0.0451   Epoch: 6   Global Step: 81600   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:45:52,657-Speed 3269.35 samples/sec   Loss 5.8378   LearningRate 0.0451   Epoch: 6   Global Step: 81610   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:45:55,718-Speed 3346.63 samples/sec   Loss 5.8337   LearningRate 0.0451   Epoch: 6   Global Step: 81620   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:45:58,871-Speed 3249.37 samples/sec   Loss 5.7351   LearningRate 0.0451   Epoch: 6   Global Step: 81630   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:46:02,017-Speed 3254.99 samples/sec   Loss 5.7938   LearningRate 0.0451   Epoch: 6   Global Step: 81640   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:46:05,158-Speed 3261.87 samples/sec   Loss 5.7414   LearningRate 0.0451   Epoch: 6   Global Step: 81650   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:46:08,271-Speed 3290.93 samples/sec   Loss 5.7307   LearningRate 0.0451   Epoch: 6   Global Step: 81660   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:46:11,346-Speed 3330.48 samples/sec   Loss 5.8595   LearningRate 0.0451   Epoch: 6   Global Step: 81670   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:46:14,454-Speed 3295.62 samples/sec   Loss 5.7524   LearningRate 0.0451   Epoch: 6   Global Step: 81680   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:46:17,540-Speed 3320.05 samples/sec   Loss 5.7594   LearningRate 0.0450   Epoch: 6   Global Step: 81690   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:46:20,591-Speed 3357.06 samples/sec   Loss 5.8660   LearningRate 0.0450   Epoch: 6   Global Step: 81700   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:46:23,674-Speed 3322.52 samples/sec   Loss 5.8181   LearningRate 0.0450   Epoch: 6   Global Step: 81710   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:46:26,793-Speed 3283.78 samples/sec   Loss 5.7695   LearningRate 0.0450   Epoch: 6   Global Step: 81720   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:46:29,936-Speed 3258.69 samples/sec   Loss 5.7786   LearningRate 0.0450   Epoch: 6   Global Step: 81730   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:46:33,037-Speed 3303.80 samples/sec   Loss 5.8558   LearningRate 0.0450   Epoch: 6   Global Step: 81740   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:46:36,115-Speed 3327.99 samples/sec   Loss 5.8015   LearningRate 0.0450   Epoch: 6   Global Step: 81750   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:46:39,207-Speed 3312.70 samples/sec   Loss 5.7903   LearningRate 0.0450   Epoch: 6   Global Step: 81760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:46:42,321-Speed 3289.45 samples/sec   Loss 5.7643   LearningRate 0.0450   Epoch: 6   Global Step: 81770   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:46:45,397-Speed 3329.08 samples/sec   Loss 5.9200   LearningRate 0.0450   Epoch: 6   Global Step: 81780   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:46:48,485-Speed 3317.38 samples/sec   Loss 5.7750   LearningRate 0.0450   Epoch: 6   Global Step: 81790   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:46:51,591-Speed 3298.00 samples/sec   Loss 5.8696   LearningRate 0.0450   Epoch: 6   Global Step: 81800   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:46:54,656-Speed 3341.57 samples/sec   Loss 5.8000   LearningRate 0.0450   Epoch: 6   Global Step: 81810   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:46:57,706-Speed 3359.50 samples/sec   Loss 5.7008   LearningRate 0.0450   Epoch: 6   Global Step: 81820   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:47:00,771-Speed 3341.78 samples/sec   Loss 5.8897   LearningRate 0.0450   Epoch: 6   Global Step: 81830   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:47:03,916-Speed 3256.89 samples/sec   Loss 5.7808   LearningRate 0.0450   Epoch: 6   Global Step: 81840   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:47:07,040-Speed 3279.15 samples/sec   Loss 5.8727   LearningRate 0.0450   Epoch: 6   Global Step: 81850   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:47:10,114-Speed 3332.58 samples/sec   Loss 5.6585   LearningRate 0.0450   Epoch: 6   Global Step: 81860   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:47:13,215-Speed 3302.99 samples/sec   Loss 5.7344   LearningRate 0.0449   Epoch: 6   Global Step: 81870   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:47:16,274-Speed 3348.75 samples/sec   Loss 5.8298   LearningRate 0.0449   Epoch: 6   Global Step: 81880   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:47:19,377-Speed 3300.62 samples/sec   Loss 5.8251   LearningRate 0.0449   Epoch: 6   Global Step: 81890   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:47:22,482-Speed 3299.62 samples/sec   Loss 5.8413   LearningRate 0.0449   Epoch: 6   Global Step: 81900   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:47:25,564-Speed 3323.68 samples/sec   Loss 5.8083   LearningRate 0.0449   Epoch: 6   Global Step: 81910   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:47:28,760-Speed 3204.38 samples/sec   Loss 5.8537   LearningRate 0.0449   Epoch: 6   Global Step: 81920   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:47:31,850-Speed 3314.96 samples/sec   Loss 5.8991   LearningRate 0.0449   Epoch: 6   Global Step: 81930   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:47:34,956-Speed 3298.88 samples/sec   Loss 5.9025   LearningRate 0.0449   Epoch: 6   Global Step: 81940   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:47:38,103-Speed 3254.84 samples/sec   Loss 5.7021   LearningRate 0.0449   Epoch: 6   Global Step: 81950   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:47:41,255-Speed 3249.73 samples/sec   Loss 5.6870   LearningRate 0.0449   Epoch: 6   Global Step: 81960   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:47:44,393-Speed 3263.38 samples/sec   Loss 5.8189   LearningRate 0.0449   Epoch: 6   Global Step: 81970   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:47:47,484-Speed 3315.20 samples/sec   Loss 5.7964   LearningRate 0.0449   Epoch: 6   Global Step: 81980   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:47:50,552-Speed 3338.48 samples/sec   Loss 5.8714   LearningRate 0.0449   Epoch: 6   Global Step: 81990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:47:53,655-Speed 3301.35 samples/sec   Loss 5.7357   LearningRate 0.0449   Epoch: 6   Global Step: 82000   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:47:56,749-Speed 3310.64 samples/sec   Loss 5.7511   LearningRate 0.0449   Epoch: 6   Global Step: 82010   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:47:59,835-Speed 3318.78 samples/sec   Loss 5.7761   LearningRate 0.0449   Epoch: 6   Global Step: 82020   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:48:02,926-Speed 3313.82 samples/sec   Loss 5.8361   LearningRate 0.0449   Epoch: 6   Global Step: 82030   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 08:48:06,064-Speed 3264.50 samples/sec   Loss 5.7670   LearningRate 0.0449   Epoch: 6   Global Step: 82040   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 08:48:09,121-Speed 3350.68 samples/sec   Loss 5.7881   LearningRate 0.0449   Epoch: 6   Global Step: 82050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:48:12,244-Speed 3279.94 samples/sec   Loss 5.8322   LearningRate 0.0448   Epoch: 6   Global Step: 82060   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:48:15,391-Speed 3255.10 samples/sec   Loss 5.7794   LearningRate 0.0448   Epoch: 6   Global Step: 82070   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:48:18,627-Speed 3165.03 samples/sec   Loss 5.8126   LearningRate 0.0448   Epoch: 6   Global Step: 82080   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:48:21,700-Speed 3333.44 samples/sec   Loss 5.7671   LearningRate 0.0448   Epoch: 6   Global Step: 82090   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:48:24,822-Speed 3281.17 samples/sec   Loss 5.8004   LearningRate 0.0448   Epoch: 6   Global Step: 82100   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:48:27,886-Speed 3343.05 samples/sec   Loss 5.8931   LearningRate 0.0448   Epoch: 6   Global Step: 82110   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:48:30,962-Speed 3330.25 samples/sec   Loss 5.8016   LearningRate 0.0448   Epoch: 6   Global Step: 82120   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:48:34,032-Speed 3336.35 samples/sec   Loss 5.7691   LearningRate 0.0448   Epoch: 6   Global Step: 82130   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:48:37,096-Speed 3342.70 samples/sec   Loss 5.9062   LearningRate 0.0448   Epoch: 6   Global Step: 82140   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:48:40,170-Speed 3332.47 samples/sec   Loss 5.7576   LearningRate 0.0448   Epoch: 6   Global Step: 82150   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:48:43,251-Speed 3324.63 samples/sec   Loss 5.8247   LearningRate 0.0448   Epoch: 6   Global Step: 82160   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:48:46,328-Speed 3328.95 samples/sec   Loss 5.7121   LearningRate 0.0448   Epoch: 6   Global Step: 82170   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:48:49,412-Speed 3321.85 samples/sec   Loss 5.7580   LearningRate 0.0448   Epoch: 6   Global Step: 82180   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:48:52,538-Speed 3276.21 samples/sec   Loss 5.7383   LearningRate 0.0448   Epoch: 6   Global Step: 82190   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:48:55,592-Speed 3354.36 samples/sec   Loss 5.8347   LearningRate 0.0448   Epoch: 6   Global Step: 82200   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:48:58,677-Speed 3319.97 samples/sec   Loss 5.7671   LearningRate 0.0448   Epoch: 6   Global Step: 82210   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:49:01,849-Speed 3229.25 samples/sec   Loss 5.8020   LearningRate 0.0448   Epoch: 6   Global Step: 82220   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:49:04,981-Speed 3270.43 samples/sec   Loss 5.7119   LearningRate 0.0448   Epoch: 6   Global Step: 82230   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:49:08,167-Speed 3215.24 samples/sec   Loss 5.8917   LearningRate 0.0447   Epoch: 6   Global Step: 82240   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:49:11,278-Speed 3293.10 samples/sec   Loss 5.7949   LearningRate 0.0447   Epoch: 6   Global Step: 82250   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:49:14,372-Speed 3310.70 samples/sec   Loss 5.7877   LearningRate 0.0447   Epoch: 6   Global Step: 82260   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:49:17,580-Speed 3193.40 samples/sec   Loss 5.7368   LearningRate 0.0447   Epoch: 6   Global Step: 82270   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:49:20,683-Speed 3301.21 samples/sec   Loss 5.9164   LearningRate 0.0447   Epoch: 6   Global Step: 82280   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:49:23,766-Speed 3321.94 samples/sec   Loss 5.9065   LearningRate 0.0447   Epoch: 6   Global Step: 82290   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:49:26,861-Speed 3309.87 samples/sec   Loss 5.7811   LearningRate 0.0447   Epoch: 6   Global Step: 82300   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:49:29,960-Speed 3305.12 samples/sec   Loss 5.8515   LearningRate 0.0447   Epoch: 6   Global Step: 82310   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:49:33,026-Speed 3340.30 samples/sec   Loss 5.7958   LearningRate 0.0447   Epoch: 6   Global Step: 82320   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:49:36,252-Speed 3175.72 samples/sec   Loss 5.7881   LearningRate 0.0447   Epoch: 6   Global Step: 82330   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:49:39,388-Speed 3266.21 samples/sec   Loss 5.8809   LearningRate 0.0447   Epoch: 6   Global Step: 82340   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:49:42,510-Speed 3281.01 samples/sec   Loss 5.7645   LearningRate 0.0447   Epoch: 6   Global Step: 82350   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:49:45,582-Speed 3334.59 samples/sec   Loss 5.8593   LearningRate 0.0447   Epoch: 6   Global Step: 82360   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:49:48,760-Speed 3223.42 samples/sec   Loss 5.6844   LearningRate 0.0447   Epoch: 6   Global Step: 82370   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:49:51,859-Speed 3304.66 samples/sec   Loss 5.8954   LearningRate 0.0447   Epoch: 6   Global Step: 82380   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:49:54,946-Speed 3318.00 samples/sec   Loss 5.7286   LearningRate 0.0447   Epoch: 6   Global Step: 82390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:49:58,036-Speed 3315.77 samples/sec   Loss 5.8767   LearningRate 0.0447   Epoch: 6   Global Step: 82400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:50:01,169-Speed 3270.51 samples/sec   Loss 5.8322   LearningRate 0.0447   Epoch: 6   Global Step: 82410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:50:04,316-Speed 3255.46 samples/sec   Loss 5.7163   LearningRate 0.0447   Epoch: 6   Global Step: 82420   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:50:07,427-Speed 3292.11 samples/sec   Loss 5.6920   LearningRate 0.0446   Epoch: 6   Global Step: 82430   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:50:10,486-Speed 3349.15 samples/sec   Loss 5.7734   LearningRate 0.0446   Epoch: 6   Global Step: 82440   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:50:13,591-Speed 3298.14 samples/sec   Loss 5.7343   LearningRate 0.0446   Epoch: 6   Global Step: 82450   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:50:16,737-Speed 3256.82 samples/sec   Loss 5.7171   LearningRate 0.0446   Epoch: 6   Global Step: 82460   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:50:19,909-Speed 3228.82 samples/sec   Loss 5.8513   LearningRate 0.0446   Epoch: 6   Global Step: 82470   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:50:23,033-Speed 3279.00 samples/sec   Loss 5.7224   LearningRate 0.0446   Epoch: 6   Global Step: 82480   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:50:26,167-Speed 3268.10 samples/sec   Loss 5.8443   LearningRate 0.0446   Epoch: 6   Global Step: 82490   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:50:29,241-Speed 3332.22 samples/sec   Loss 5.7516   LearningRate 0.0446   Epoch: 6   Global Step: 82500   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:50:32,313-Speed 3333.64 samples/sec   Loss 5.8361   LearningRate 0.0446   Epoch: 6   Global Step: 82510   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:50:35,551-Speed 3164.40 samples/sec   Loss 5.9624   LearningRate 0.0446   Epoch: 6   Global Step: 82520   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:50:38,698-Speed 3254.24 samples/sec   Loss 5.8449   LearningRate 0.0446   Epoch: 6   Global Step: 82530   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:50:41,868-Speed 3232.16 samples/sec   Loss 5.8964   LearningRate 0.0446   Epoch: 6   Global Step: 82540   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:50:44,979-Speed 3292.43 samples/sec   Loss 5.9307   LearningRate 0.0446   Epoch: 6   Global Step: 82550   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:50:48,105-Speed 3276.99 samples/sec   Loss 5.8318   LearningRate 0.0446   Epoch: 6   Global Step: 82560   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:50:51,232-Speed 3274.78 samples/sec   Loss 5.7240   LearningRate 0.0446   Epoch: 6   Global Step: 82570   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:50:54,380-Speed 3254.33 samples/sec   Loss 5.7614   LearningRate 0.0446   Epoch: 6   Global Step: 82580   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:50:57,462-Speed 3323.57 samples/sec   Loss 5.7721   LearningRate 0.0446   Epoch: 6   Global Step: 82590   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:51:00,554-Speed 3314.27 samples/sec   Loss 5.7651   LearningRate 0.0446   Epoch: 6   Global Step: 82600   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:51:03,778-Speed 3176.86 samples/sec   Loss 5.7643   LearningRate 0.0446   Epoch: 6   Global Step: 82610   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:51:06,883-Speed 3298.93 samples/sec   Loss 5.8287   LearningRate 0.0445   Epoch: 6   Global Step: 82620   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:51:09,955-Speed 3334.74 samples/sec   Loss 5.8110   LearningRate 0.0445   Epoch: 6   Global Step: 82630   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:51:13,046-Speed 3313.68 samples/sec   Loss 5.7547   LearningRate 0.0445   Epoch: 6   Global Step: 82640   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:51:16,166-Speed 3283.19 samples/sec   Loss 5.7328   LearningRate 0.0445   Epoch: 6   Global Step: 82650   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:51:19,231-Speed 3342.32 samples/sec   Loss 5.6667   LearningRate 0.0445   Epoch: 6   Global Step: 82660   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:51:22,305-Speed 3332.45 samples/sec   Loss 5.6682   LearningRate 0.0445   Epoch: 6   Global Step: 82670   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:51:25,388-Speed 3321.86 samples/sec   Loss 5.8565   LearningRate 0.0445   Epoch: 6   Global Step: 82680   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:51:28,579-Speed 3210.45 samples/sec   Loss 5.8066   LearningRate 0.0445   Epoch: 6   Global Step: 82690   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:51:31,713-Speed 3268.42 samples/sec   Loss 5.8188   LearningRate 0.0445   Epoch: 6   Global Step: 82700   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:51:34,804-Speed 3314.15 samples/sec   Loss 5.8442   LearningRate 0.0445   Epoch: 6   Global Step: 82710   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:51:37,906-Speed 3301.81 samples/sec   Loss 5.8001   LearningRate 0.0445   Epoch: 6   Global Step: 82720   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:51:41,007-Speed 3304.06 samples/sec   Loss 5.8538   LearningRate 0.0445   Epoch: 6   Global Step: 82730   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:51:44,084-Speed 3328.00 samples/sec   Loss 5.7473   LearningRate 0.0445   Epoch: 6   Global Step: 82740   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:51:47,208-Speed 3279.89 samples/sec   Loss 5.7700   LearningRate 0.0445   Epoch: 6   Global Step: 82750   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:51:50,308-Speed 3304.12 samples/sec   Loss 5.8124   LearningRate 0.0445   Epoch: 6   Global Step: 82760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:51:53,392-Speed 3321.40 samples/sec   Loss 5.7176   LearningRate 0.0445   Epoch: 6   Global Step: 82770   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:51:56,485-Speed 3310.95 samples/sec   Loss 5.8333   LearningRate 0.0445   Epoch: 6   Global Step: 82780   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:51:59,551-Speed 3341.09 samples/sec   Loss 5.8673   LearningRate 0.0445   Epoch: 6   Global Step: 82790   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:52:02,659-Speed 3296.02 samples/sec   Loss 5.7572   LearningRate 0.0444   Epoch: 6   Global Step: 82800   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:52:05,785-Speed 3276.48 samples/sec   Loss 5.8611   LearningRate 0.0444   Epoch: 6   Global Step: 82810   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:52:08,840-Speed 3353.20 samples/sec   Loss 5.7669   LearningRate 0.0444   Epoch: 6   Global Step: 82820   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:52:11,972-Speed 3270.05 samples/sec   Loss 5.7432   LearningRate 0.0444   Epoch: 6   Global Step: 82830   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:52:15,053-Speed 3325.65 samples/sec   Loss 5.6576   LearningRate 0.0444   Epoch: 6   Global Step: 82840   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:52:18,148-Speed 3309.58 samples/sec   Loss 5.8123   LearningRate 0.0444   Epoch: 6   Global Step: 82850   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:52:21,225-Speed 3328.63 samples/sec   Loss 5.8666   LearningRate 0.0444   Epoch: 6   Global Step: 82860   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:52:24,329-Speed 3299.84 samples/sec   Loss 5.7867   LearningRate 0.0444   Epoch: 6   Global Step: 82870   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:52:27,425-Speed 3308.23 samples/sec   Loss 5.7703   LearningRate 0.0444   Epoch: 6   Global Step: 82880   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:52:30,584-Speed 3242.68 samples/sec   Loss 5.7853   LearningRate 0.0444   Epoch: 6   Global Step: 82890   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:52:33,640-Speed 3352.10 samples/sec   Loss 5.7607   LearningRate 0.0444   Epoch: 6   Global Step: 82900   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:52:36,779-Speed 3263.62 samples/sec   Loss 5.7877   LearningRate 0.0444   Epoch: 6   Global Step: 82910   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:52:39,893-Speed 3289.39 samples/sec   Loss 5.7563   LearningRate 0.0444   Epoch: 6   Global Step: 82920   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:52:42,988-Speed 3309.34 samples/sec   Loss 5.8244   LearningRate 0.0444   Epoch: 6   Global Step: 82930   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:52:46,041-Speed 3355.93 samples/sec   Loss 5.9285   LearningRate 0.0444   Epoch: 6   Global Step: 82940   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:52:49,164-Speed 3279.59 samples/sec   Loss 5.8955   LearningRate 0.0444   Epoch: 6   Global Step: 82950   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:52:52,253-Speed 3315.54 samples/sec   Loss 5.9225   LearningRate 0.0444   Epoch: 6   Global Step: 82960   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:52:55,384-Speed 3272.22 samples/sec   Loss 5.8825   LearningRate 0.0444   Epoch: 6   Global Step: 82970   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:52:58,446-Speed 3344.69 samples/sec   Loss 5.7701   LearningRate 0.0444   Epoch: 6   Global Step: 82980   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:53:01,508-Speed 3346.28 samples/sec   Loss 5.8484   LearningRate 0.0443   Epoch: 6   Global Step: 82990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:53:04,588-Speed 3325.25 samples/sec   Loss 5.6881   LearningRate 0.0443   Epoch: 6   Global Step: 83000   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:53:07,667-Speed 3326.71 samples/sec   Loss 5.7804   LearningRate 0.0443   Epoch: 6   Global Step: 83010   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:53:10,728-Speed 3346.86 samples/sec   Loss 5.9176   LearningRate 0.0443   Epoch: 6   Global Step: 83020   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:53:13,843-Speed 3287.82 samples/sec   Loss 5.7801   LearningRate 0.0443   Epoch: 6   Global Step: 83030   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:53:16,924-Speed 3324.67 samples/sec   Loss 5.7476   LearningRate 0.0443   Epoch: 6   Global Step: 83040   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:53:19,997-Speed 3333.62 samples/sec   Loss 5.9139   LearningRate 0.0443   Epoch: 6   Global Step: 83050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:53:23,098-Speed 3303.30 samples/sec   Loss 5.8263   LearningRate 0.0443   Epoch: 6   Global Step: 83060   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:53:26,190-Speed 3312.26 samples/sec   Loss 5.8725   LearningRate 0.0443   Epoch: 6   Global Step: 83070   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:53:29,239-Speed 3359.69 samples/sec   Loss 5.8580   LearningRate 0.0443   Epoch: 6   Global Step: 83080   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:53:32,341-Speed 3302.19 samples/sec   Loss 5.7704   LearningRate 0.0443   Epoch: 6   Global Step: 83090   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:53:35,418-Speed 3328.88 samples/sec   Loss 5.7885   LearningRate 0.0443   Epoch: 6   Global Step: 83100   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:53:38,500-Speed 3323.47 samples/sec   Loss 5.7725   LearningRate 0.0443   Epoch: 6   Global Step: 83110   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:53:41,586-Speed 3319.51 samples/sec   Loss 5.8135   LearningRate 0.0443   Epoch: 6   Global Step: 83120   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:53:44,656-Speed 3336.53 samples/sec   Loss 5.8773   LearningRate 0.0443   Epoch: 6   Global Step: 83130   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:53:47,735-Speed 3327.55 samples/sec   Loss 5.8136   LearningRate 0.0443   Epoch: 6   Global Step: 83140   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:53:50,830-Speed 3309.18 samples/sec   Loss 5.7611   LearningRate 0.0443   Epoch: 6   Global Step: 83150   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:53:53,889-Speed 3347.84 samples/sec   Loss 5.8632   LearningRate 0.0443   Epoch: 6   Global Step: 83160   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:53:56,938-Speed 3360.00 samples/sec   Loss 5.8049   LearningRate 0.0442   Epoch: 6   Global Step: 83170   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:54:00,041-Speed 3300.74 samples/sec   Loss 5.8250   LearningRate 0.0442   Epoch: 6   Global Step: 83180   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:54:03,186-Speed 3257.30 samples/sec   Loss 5.7209   LearningRate 0.0442   Epoch: 6   Global Step: 83190   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:54:06,334-Speed 3253.88 samples/sec   Loss 5.8464   LearningRate 0.0442   Epoch: 6   Global Step: 83200   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:54:09,389-Speed 3353.57 samples/sec   Loss 5.7928   LearningRate 0.0442   Epoch: 6   Global Step: 83210   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:54:12,464-Speed 3331.04 samples/sec   Loss 5.8880   LearningRate 0.0442   Epoch: 6   Global Step: 83220   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:54:15,523-Speed 3348.92 samples/sec   Loss 5.8237   LearningRate 0.0442   Epoch: 6   Global Step: 83230   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:54:18,588-Speed 3342.20 samples/sec   Loss 5.8016   LearningRate 0.0442   Epoch: 6   Global Step: 83240   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:54:21,639-Speed 3357.13 samples/sec   Loss 5.7313   LearningRate 0.0442   Epoch: 6   Global Step: 83250   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:54:24,729-Speed 3314.11 samples/sec   Loss 5.8565   LearningRate 0.0442   Epoch: 6   Global Step: 83260   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:54:27,906-Speed 3224.79 samples/sec   Loss 5.8078   LearningRate 0.0442   Epoch: 6   Global Step: 83270   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:54:31,059-Speed 3249.20 samples/sec   Loss 5.7827   LearningRate 0.0442   Epoch: 6   Global Step: 83280   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:54:34,126-Speed 3339.62 samples/sec   Loss 5.7548   LearningRate 0.0442   Epoch: 6   Global Step: 83290   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:54:37,209-Speed 3322.39 samples/sec   Loss 5.7397   LearningRate 0.0442   Epoch: 6   Global Step: 83300   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:54:40,289-Speed 3325.93 samples/sec   Loss 5.7906   LearningRate 0.0442   Epoch: 6   Global Step: 83310   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:54:43,413-Speed 3278.61 samples/sec   Loss 5.7283   LearningRate 0.0442   Epoch: 6   Global Step: 83320   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:54:46,482-Speed 3337.99 samples/sec   Loss 5.8438   LearningRate 0.0442   Epoch: 6   Global Step: 83330   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:54:49,561-Speed 3326.23 samples/sec   Loss 5.7938   LearningRate 0.0442   Epoch: 6   Global Step: 83340   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:54:52,636-Speed 3331.00 samples/sec   Loss 5.8657   LearningRate 0.0442   Epoch: 6   Global Step: 83350   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:54:55,721-Speed 3321.16 samples/sec   Loss 5.8138   LearningRate 0.0441   Epoch: 6   Global Step: 83360   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:54:58,876-Speed 3246.46 samples/sec   Loss 5.7743   LearningRate 0.0441   Epoch: 6   Global Step: 83370   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:55:02,022-Speed 3256.65 samples/sec   Loss 5.7284   LearningRate 0.0441   Epoch: 6   Global Step: 83380   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:55:05,065-Speed 3364.95 samples/sec   Loss 5.6770   LearningRate 0.0441   Epoch: 6   Global Step: 83390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:55:08,177-Speed 3291.94 samples/sec   Loss 5.7993   LearningRate 0.0441   Epoch: 6   Global Step: 83400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:55:11,257-Speed 3325.90 samples/sec   Loss 5.8013   LearningRate 0.0441   Epoch: 6   Global Step: 83410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:55:14,409-Speed 3249.52 samples/sec   Loss 5.9635   LearningRate 0.0441   Epoch: 6   Global Step: 83420   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:55:17,518-Speed 3295.31 samples/sec   Loss 5.7764   LearningRate 0.0441   Epoch: 6   Global Step: 83430   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:55:20,591-Speed 3333.46 samples/sec   Loss 5.7210   LearningRate 0.0441   Epoch: 6   Global Step: 83440   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:55:23,674-Speed 3322.03 samples/sec   Loss 5.8915   LearningRate 0.0441   Epoch: 6   Global Step: 83450   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:55:26,734-Speed 3348.32 samples/sec   Loss 5.9030   LearningRate 0.0441   Epoch: 6   Global Step: 83460   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:55:29,835-Speed 3303.30 samples/sec   Loss 5.7949   LearningRate 0.0441   Epoch: 6   Global Step: 83470   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:55:32,951-Speed 3286.52 samples/sec   Loss 5.8992   LearningRate 0.0441   Epoch: 6   Global Step: 83480   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:55:36,066-Speed 3288.52 samples/sec   Loss 5.9511   LearningRate 0.0441   Epoch: 6   Global Step: 83490   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:55:39,226-Speed 3242.04 samples/sec   Loss 5.7530   LearningRate 0.0441   Epoch: 6   Global Step: 83500   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:55:42,301-Speed 3331.37 samples/sec   Loss 5.7686   LearningRate 0.0441   Epoch: 6   Global Step: 83510   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:55:45,399-Speed 3306.42 samples/sec   Loss 5.7075   LearningRate 0.0441   Epoch: 6   Global Step: 83520   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:55:48,479-Speed 3325.65 samples/sec   Loss 5.7806   LearningRate 0.0441   Epoch: 6   Global Step: 83530   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:55:51,580-Speed 3303.77 samples/sec   Loss 5.7624   LearningRate 0.0441   Epoch: 6   Global Step: 83540   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:55:54,714-Speed 3267.41 samples/sec   Loss 5.7895   LearningRate 0.0440   Epoch: 6   Global Step: 83550   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:55:57,793-Speed 3328.15 samples/sec   Loss 5.8728   LearningRate 0.0440   Epoch: 6   Global Step: 83560   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:56:00,888-Speed 3308.87 samples/sec   Loss 5.8349   LearningRate 0.0440   Epoch: 6   Global Step: 83570   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:56:04,019-Speed 3271.33 samples/sec   Loss 5.7528   LearningRate 0.0440   Epoch: 6   Global Step: 83580   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:56:07,185-Speed 3235.75 samples/sec   Loss 5.8117   LearningRate 0.0440   Epoch: 6   Global Step: 83590   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:56:10,299-Speed 3289.49 samples/sec   Loss 5.7194   LearningRate 0.0440   Epoch: 6   Global Step: 83600   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:56:13,467-Speed 3233.25 samples/sec   Loss 5.8925   LearningRate 0.0440   Epoch: 6   Global Step: 83610   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:56:16,575-Speed 3295.77 samples/sec   Loss 5.7788   LearningRate 0.0440   Epoch: 6   Global Step: 83620   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:56:19,742-Speed 3234.36 samples/sec   Loss 5.7723   LearningRate 0.0440   Epoch: 6   Global Step: 83630   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:56:22,849-Speed 3297.02 samples/sec   Loss 5.7668   LearningRate 0.0440   Epoch: 6   Global Step: 83640   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:56:25,932-Speed 3322.99 samples/sec   Loss 5.8339   LearningRate 0.0440   Epoch: 6   Global Step: 83650   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:56:29,043-Speed 3292.19 samples/sec   Loss 5.7182   LearningRate 0.0440   Epoch: 6   Global Step: 83660   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:56:32,104-Speed 3345.83 samples/sec   Loss 5.7378   LearningRate 0.0440   Epoch: 6   Global Step: 83670   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:56:35,237-Speed 3270.04 samples/sec   Loss 5.8392   LearningRate 0.0440   Epoch: 6   Global Step: 83680   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:56:38,355-Speed 3285.49 samples/sec   Loss 5.6985   LearningRate 0.0440   Epoch: 6   Global Step: 83690   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:56:41,557-Speed 3198.58 samples/sec   Loss 5.6714   LearningRate 0.0440   Epoch: 6   Global Step: 83700   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:56:44,649-Speed 3312.57 samples/sec   Loss 5.7844   LearningRate 0.0440   Epoch: 6   Global Step: 83710   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:56:47,797-Speed 3253.93 samples/sec   Loss 5.7766   LearningRate 0.0440   Epoch: 6   Global Step: 83720   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:56:50,938-Speed 3261.20 samples/sec   Loss 5.7215   LearningRate 0.0440   Epoch: 6   Global Step: 83730   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:56:54,084-Speed 3256.07 samples/sec   Loss 5.6461   LearningRate 0.0439   Epoch: 6   Global Step: 83740   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:56:57,141-Speed 3350.52 samples/sec   Loss 5.7781   LearningRate 0.0439   Epoch: 6   Global Step: 83750   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:57:00,260-Speed 3284.93 samples/sec   Loss 5.8634   LearningRate 0.0439   Epoch: 6   Global Step: 83760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:57:03,310-Speed 3357.39 samples/sec   Loss 5.8706   LearningRate 0.0439   Epoch: 6   Global Step: 83770   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:57:06,443-Speed 3269.65 samples/sec   Loss 5.7834   LearningRate 0.0439   Epoch: 6   Global Step: 83780   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:57:09,552-Speed 3295.56 samples/sec   Loss 5.8308   LearningRate 0.0439   Epoch: 6   Global Step: 83790   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:57:12,689-Speed 3265.21 samples/sec   Loss 5.7909   LearningRate 0.0439   Epoch: 6   Global Step: 83800   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:57:15,843-Speed 3247.14 samples/sec   Loss 5.8773   LearningRate 0.0439   Epoch: 6   Global Step: 83810   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:57:18,988-Speed 3257.15 samples/sec   Loss 5.8300   LearningRate 0.0439   Epoch: 6   Global Step: 83820   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:57:22,083-Speed 3309.88 samples/sec   Loss 5.7091   LearningRate 0.0439   Epoch: 6   Global Step: 83830   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:57:25,227-Speed 3257.28 samples/sec   Loss 5.8399   LearningRate 0.0439   Epoch: 6   Global Step: 83840   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:57:28,316-Speed 3317.09 samples/sec   Loss 5.7575   LearningRate 0.0439   Epoch: 6   Global Step: 83850   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:57:31,421-Speed 3298.96 samples/sec   Loss 5.8692   LearningRate 0.0439   Epoch: 6   Global Step: 83860   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:57:34,531-Speed 3293.43 samples/sec   Loss 5.7550   LearningRate 0.0439   Epoch: 6   Global Step: 83870   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:57:37,662-Speed 3270.93 samples/sec   Loss 5.9253   LearningRate 0.0439   Epoch: 6   Global Step: 83880   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:57:40,798-Speed 3266.80 samples/sec   Loss 5.8084   LearningRate 0.0439   Epoch: 6   Global Step: 83890   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:57:43,944-Speed 3255.18 samples/sec   Loss 5.9204   LearningRate 0.0439   Epoch: 6   Global Step: 83900   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:57:47,093-Speed 3253.12 samples/sec   Loss 5.8166   LearningRate 0.0439   Epoch: 6   Global Step: 83910   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:57:50,195-Speed 3303.01 samples/sec   Loss 5.8416   LearningRate 0.0438   Epoch: 6   Global Step: 83920   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:57:53,298-Speed 3300.67 samples/sec   Loss 5.7582   LearningRate 0.0438   Epoch: 6   Global Step: 83930   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:57:56,356-Speed 3349.85 samples/sec   Loss 5.7940   LearningRate 0.0438   Epoch: 6   Global Step: 83940   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:57:59,458-Speed 3302.07 samples/sec   Loss 5.8760   LearningRate 0.0438   Epoch: 6   Global Step: 83950   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:58:02,613-Speed 3247.15 samples/sec   Loss 5.7531   LearningRate 0.0438   Epoch: 6   Global Step: 83960   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:58:05,742-Speed 3273.65 samples/sec   Loss 5.7701   LearningRate 0.0438   Epoch: 6   Global Step: 83970   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:58:08,801-Speed 3347.99 samples/sec   Loss 5.7988   LearningRate 0.0438   Epoch: 6   Global Step: 83980   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:58:11,861-Speed 3347.49 samples/sec   Loss 5.6795   LearningRate 0.0438   Epoch: 6   Global Step: 83990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:58:14,995-Speed 3268.46 samples/sec   Loss 5.7655   LearningRate 0.0438   Epoch: 6   Global Step: 84000   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:58:18,051-Speed 3352.19 samples/sec   Loss 5.8277   LearningRate 0.0438   Epoch: 6   Global Step: 84010   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:58:21,110-Speed 3348.35 samples/sec   Loss 5.6452   LearningRate 0.0438   Epoch: 6   Global Step: 84020   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:58:24,194-Speed 3321.86 samples/sec   Loss 5.8807   LearningRate 0.0438   Epoch: 6   Global Step: 84030   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:58:27,294-Speed 3304.57 samples/sec   Loss 5.8717   LearningRate 0.0438   Epoch: 6   Global Step: 84040   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:58:30,381-Speed 3317.25 samples/sec   Loss 5.8280   LearningRate 0.0438   Epoch: 6   Global Step: 84050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:58:33,436-Speed 3353.49 samples/sec   Loss 5.6650   LearningRate 0.0438   Epoch: 6   Global Step: 84060   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 08:58:36,507-Speed 3335.56 samples/sec   Loss 5.8289   LearningRate 0.0438   Epoch: 6   Global Step: 84070   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-04-27 08:58:39,578-Speed 3335.38 samples/sec   Loss 5.7141   LearningRate 0.0438   Epoch: 6   Global Step: 84080   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:58:42,671-Speed 3311.94 samples/sec   Loss 5.8203   LearningRate 0.0438   Epoch: 6   Global Step: 84090   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:58:45,721-Speed 3358.04 samples/sec   Loss 5.7435   LearningRate 0.0438   Epoch: 6   Global Step: 84100   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:58:48,815-Speed 3310.51 samples/sec   Loss 5.7350   LearningRate 0.0437   Epoch: 6   Global Step: 84110   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:58:51,972-Speed 3245.56 samples/sec   Loss 5.8158   LearningRate 0.0437   Epoch: 6   Global Step: 84120   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:58:55,102-Speed 3272.56 samples/sec   Loss 5.8241   LearningRate 0.0437   Epoch: 6   Global Step: 84130   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:58:58,200-Speed 3305.90 samples/sec   Loss 5.8174   LearningRate 0.0437   Epoch: 6   Global Step: 84140   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:59:01,336-Speed 3266.87 samples/sec   Loss 5.7898   LearningRate 0.0437   Epoch: 6   Global Step: 84150   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:59:04,476-Speed 3262.65 samples/sec   Loss 5.8147   LearningRate 0.0437   Epoch: 6   Global Step: 84160   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:59:07,541-Speed 3341.99 samples/sec   Loss 5.7694   LearningRate 0.0437   Epoch: 6   Global Step: 84170   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:59:10,615-Speed 3332.05 samples/sec   Loss 5.8402   LearningRate 0.0437   Epoch: 6   Global Step: 84180   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:59:13,713-Speed 3306.79 samples/sec   Loss 5.8705   LearningRate 0.0437   Epoch: 6   Global Step: 84190   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:59:16,826-Speed 3290.31 samples/sec   Loss 5.7875   LearningRate 0.0437   Epoch: 6   Global Step: 84200   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:59:19,919-Speed 3311.91 samples/sec   Loss 5.8888   LearningRate 0.0437   Epoch: 6   Global Step: 84210   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:59:23,034-Speed 3288.38 samples/sec   Loss 5.7449   LearningRate 0.0437   Epoch: 6   Global Step: 84220   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:59:26,098-Speed 3342.96 samples/sec   Loss 5.7661   LearningRate 0.0437   Epoch: 6   Global Step: 84230   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:59:29,194-Speed 3308.39 samples/sec   Loss 5.7415   LearningRate 0.0437   Epoch: 6   Global Step: 84240   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:59:32,243-Speed 3359.11 samples/sec   Loss 5.8082   LearningRate 0.0437   Epoch: 6   Global Step: 84250   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:59:35,357-Speed 3289.72 samples/sec   Loss 5.8275   LearningRate 0.0437   Epoch: 6   Global Step: 84260   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 08:59:38,437-Speed 3325.48 samples/sec   Loss 5.8741   LearningRate 0.0437   Epoch: 6   Global Step: 84270   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:59:41,543-Speed 3297.88 samples/sec   Loss 5.7239   LearningRate 0.0437   Epoch: 6   Global Step: 84280   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:59:44,607-Speed 3343.42 samples/sec   Loss 5.7741   LearningRate 0.0437   Epoch: 6   Global Step: 84290   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:59:47,705-Speed 3306.35 samples/sec   Loss 5.7865   LearningRate 0.0436   Epoch: 6   Global Step: 84300   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:59:50,788-Speed 3322.56 samples/sec   Loss 5.7935   LearningRate 0.0436   Epoch: 6   Global Step: 84310   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:59:53,918-Speed 3272.15 samples/sec   Loss 5.7681   LearningRate 0.0436   Epoch: 6   Global Step: 84320   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 08:59:56,992-Speed 3332.79 samples/sec   Loss 5.7295   LearningRate 0.0436   Epoch: 6   Global Step: 84330   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:00:00,079-Speed 3317.50 samples/sec   Loss 5.7303   LearningRate 0.0436   Epoch: 6   Global Step: 84340   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:00:03,149-Speed 3337.29 samples/sec   Loss 5.8531   LearningRate 0.0436   Epoch: 6   Global Step: 84350   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:00:06,279-Speed 3272.66 samples/sec   Loss 5.8108   LearningRate 0.0436   Epoch: 6   Global Step: 84360   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:00:09,338-Speed 3348.23 samples/sec   Loss 5.8326   LearningRate 0.0436   Epoch: 6   Global Step: 84370   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:00:12,431-Speed 3311.20 samples/sec   Loss 5.7332   LearningRate 0.0436   Epoch: 6   Global Step: 84380   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:00:15,520-Speed 3317.12 samples/sec   Loss 5.8059   LearningRate 0.0436   Epoch: 6   Global Step: 84390   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:00:18,666-Speed 3255.79 samples/sec   Loss 5.7601   LearningRate 0.0436   Epoch: 6   Global Step: 84400   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:00:21,732-Speed 3340.74 samples/sec   Loss 5.8006   LearningRate 0.0436   Epoch: 6   Global Step: 84410   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:00:24,883-Speed 3250.26 samples/sec   Loss 5.8132   LearningRate 0.0436   Epoch: 6   Global Step: 84420   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:00:27,988-Speed 3298.90 samples/sec   Loss 5.7413   LearningRate 0.0436   Epoch: 6   Global Step: 84430   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:00:31,139-Speed 3251.28 samples/sec   Loss 5.7650   LearningRate 0.0436   Epoch: 6   Global Step: 84440   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:00:34,214-Speed 3331.55 samples/sec   Loss 5.7646   LearningRate 0.0436   Epoch: 6   Global Step: 84450   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:00:37,289-Speed 3330.90 samples/sec   Loss 5.7923   LearningRate 0.0436   Epoch: 6   Global Step: 84460   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:00:40,393-Speed 3299.08 samples/sec   Loss 5.8027   LearningRate 0.0436   Epoch: 6   Global Step: 84470   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:00:43,504-Speed 3293.17 samples/sec   Loss 5.8188   LearningRate 0.0436   Epoch: 6   Global Step: 84480   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:00:46,568-Speed 3342.61 samples/sec   Loss 5.9025   LearningRate 0.0435   Epoch: 6   Global Step: 84490   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:00:49,667-Speed 3305.85 samples/sec   Loss 5.7080   LearningRate 0.0435   Epoch: 6   Global Step: 84500   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:00:52,798-Speed 3271.93 samples/sec   Loss 5.7931   LearningRate 0.0435   Epoch: 6   Global Step: 84510   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:00:55,893-Speed 3309.11 samples/sec   Loss 5.7984   LearningRate 0.0435   Epoch: 6   Global Step: 84520   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:00:58,972-Speed 3327.03 samples/sec   Loss 5.8051   LearningRate 0.0435   Epoch: 6   Global Step: 84530   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:01:02,068-Speed 3309.05 samples/sec   Loss 5.7329   LearningRate 0.0435   Epoch: 6   Global Step: 84540   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:01:05,224-Speed 3244.88 samples/sec   Loss 5.8304   LearningRate 0.0435   Epoch: 6   Global Step: 84550   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:01:08,380-Speed 3246.55 samples/sec   Loss 5.7915   LearningRate 0.0435   Epoch: 6   Global Step: 84560   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:01:11,504-Speed 3277.67 samples/sec   Loss 5.9059   LearningRate 0.0435   Epoch: 6   Global Step: 84570   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:01:14,667-Speed 3239.14 samples/sec   Loss 5.8427   LearningRate 0.0435   Epoch: 6   Global Step: 84580   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:01:17,864-Speed 3203.32 samples/sec   Loss 5.8106   LearningRate 0.0435   Epoch: 6   Global Step: 84590   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:01:20,954-Speed 3315.35 samples/sec   Loss 5.8160   LearningRate 0.0435   Epoch: 6   Global Step: 84600   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:01:24,056-Speed 3302.87 samples/sec   Loss 5.8309   LearningRate 0.0435   Epoch: 6   Global Step: 84610   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:01:27,159-Speed 3300.83 samples/sec   Loss 5.7796   LearningRate 0.0435   Epoch: 6   Global Step: 84620   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:01:30,289-Speed 3271.97 samples/sec   Loss 5.8418   LearningRate 0.0435   Epoch: 6   Global Step: 84630   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:01:33,378-Speed 3316.49 samples/sec   Loss 5.8425   LearningRate 0.0435   Epoch: 6   Global Step: 84640   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:01:36,441-Speed 3344.62 samples/sec   Loss 5.6664   LearningRate 0.0435   Epoch: 6   Global Step: 84650   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:01:39,554-Speed 3289.67 samples/sec   Loss 5.8098   LearningRate 0.0435   Epoch: 6   Global Step: 84660   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:01:42,746-Speed 3209.47 samples/sec   Loss 5.7854   LearningRate 0.0434   Epoch: 6   Global Step: 84670   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:01:45,874-Speed 3274.26 samples/sec   Loss 5.8737   LearningRate 0.0434   Epoch: 6   Global Step: 84680   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:01:49,115-Speed 3160.75 samples/sec   Loss 5.7188   LearningRate 0.0434   Epoch: 6   Global Step: 84690   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:01:52,292-Speed 3224.98 samples/sec   Loss 5.7974   LearningRate 0.0434   Epoch: 6   Global Step: 84700   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:01:55,357-Speed 3341.19 samples/sec   Loss 6.0032   LearningRate 0.0434   Epoch: 6   Global Step: 84710   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:01:58,497-Speed 3261.96 samples/sec   Loss 5.7544   LearningRate 0.0434   Epoch: 6   Global Step: 84720   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:02:01,572-Speed 3331.37 samples/sec   Loss 5.8361   LearningRate 0.0434   Epoch: 6   Global Step: 84730   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:02:04,684-Speed 3291.15 samples/sec   Loss 5.8464   LearningRate 0.0434   Epoch: 6   Global Step: 84740   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:02:07,767-Speed 3322.35 samples/sec   Loss 5.8191   LearningRate 0.0434   Epoch: 6   Global Step: 84750   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:02:10,857-Speed 3315.28 samples/sec   Loss 5.8211   LearningRate 0.0434   Epoch: 6   Global Step: 84760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:02:13,985-Speed 3275.25 samples/sec   Loss 5.8850   LearningRate 0.0434   Epoch: 6   Global Step: 84770   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:02:17,067-Speed 3323.46 samples/sec   Loss 5.7636   LearningRate 0.0434   Epoch: 6   Global Step: 84780   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:02:20,148-Speed 3324.76 samples/sec   Loss 5.7469   LearningRate 0.0434   Epoch: 6   Global Step: 84790   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:02:23,257-Speed 3294.43 samples/sec   Loss 5.9097   LearningRate 0.0434   Epoch: 6   Global Step: 84800   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:02:26,422-Speed 3236.87 samples/sec   Loss 5.8461   LearningRate 0.0434   Epoch: 6   Global Step: 84810   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:02:29,613-Speed 3209.75 samples/sec   Loss 5.7350   LearningRate 0.0434   Epoch: 6   Global Step: 84820   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:02:32,694-Speed 3324.64 samples/sec   Loss 5.7499   LearningRate 0.0434   Epoch: 6   Global Step: 84830   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:02:35,792-Speed 3306.46 samples/sec   Loss 5.8704   LearningRate 0.0434   Epoch: 6   Global Step: 84840   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:02:38,878-Speed 3319.49 samples/sec   Loss 5.7963   LearningRate 0.0434   Epoch: 6   Global Step: 84850   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:02:41,952-Speed 3332.14 samples/sec   Loss 5.7123   LearningRate 0.0433   Epoch: 6   Global Step: 84860   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:02:45,008-Speed 3352.12 samples/sec   Loss 5.7193   LearningRate 0.0433   Epoch: 6   Global Step: 84870   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:02:48,056-Speed 3360.19 samples/sec   Loss 5.7746   LearningRate 0.0433   Epoch: 6   Global Step: 84880   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:02:51,117-Speed 3347.33 samples/sec   Loss 5.6684   LearningRate 0.0433   Epoch: 6   Global Step: 84890   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:02:54,181-Speed 3341.87 samples/sec   Loss 5.8083   LearningRate 0.0433   Epoch: 6   Global Step: 84900   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:02:57,244-Speed 3345.01 samples/sec   Loss 5.7144   LearningRate 0.0433   Epoch: 6   Global Step: 84910   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:03:00,472-Speed 3173.18 samples/sec   Loss 5.8357   LearningRate 0.0433   Epoch: 6   Global Step: 84920   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:03:03,571-Speed 3305.57 samples/sec   Loss 5.8139   LearningRate 0.0433   Epoch: 6   Global Step: 84930   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:03:06,674-Speed 3300.84 samples/sec   Loss 5.8220   LearningRate 0.0433   Epoch: 6   Global Step: 84940   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:03:09,750-Speed 3329.67 samples/sec   Loss 5.8235   LearningRate 0.0433   Epoch: 6   Global Step: 84950   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:03:12,846-Speed 3309.15 samples/sec   Loss 5.7966   LearningRate 0.0433   Epoch: 6   Global Step: 84960   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:03:16,019-Speed 3227.78 samples/sec   Loss 5.8138   LearningRate 0.0433   Epoch: 6   Global Step: 84970   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:03:19,095-Speed 3330.46 samples/sec   Loss 5.8608   LearningRate 0.0433   Epoch: 6   Global Step: 84980   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:03:22,172-Speed 3329.52 samples/sec   Loss 5.7496   LearningRate 0.0433   Epoch: 6   Global Step: 84990   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:03:25,308-Speed 3265.89 samples/sec   Loss 5.8313   LearningRate 0.0433   Epoch: 6   Global Step: 85000   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:03:28,379-Speed 3336.03 samples/sec   Loss 5.8731   LearningRate 0.0433   Epoch: 6   Global Step: 85010   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:03:31,487-Speed 3295.67 samples/sec   Loss 5.7952   LearningRate 0.0433   Epoch: 6   Global Step: 85020   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:03:34,601-Speed 3289.07 samples/sec   Loss 6.0219   LearningRate 0.0433   Epoch: 6   Global Step: 85030   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:03:37,678-Speed 3329.52 samples/sec   Loss 5.8690   LearningRate 0.0433   Epoch: 6   Global Step: 85040   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:03:40,769-Speed 3314.11 samples/sec   Loss 5.7858   LearningRate 0.0432   Epoch: 6   Global Step: 85050   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:03:43,926-Speed 3243.43 samples/sec   Loss 5.7127   LearningRate 0.0432   Epoch: 6   Global Step: 85060   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:03:47,016-Speed 3315.53 samples/sec   Loss 5.7701   LearningRate 0.0432   Epoch: 6   Global Step: 85070   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:03:50,094-Speed 3328.19 samples/sec   Loss 5.8091   LearningRate 0.0432   Epoch: 6   Global Step: 85080   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:03:53,279-Speed 3215.99 samples/sec   Loss 5.8460   LearningRate 0.0432   Epoch: 6   Global Step: 85090   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:03:56,368-Speed 3316.22 samples/sec   Loss 5.7803   LearningRate 0.0432   Epoch: 6   Global Step: 85100   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:03:59,509-Speed 3261.10 samples/sec   Loss 5.8213   LearningRate 0.0432   Epoch: 6   Global Step: 85110   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:04:02,610-Speed 3302.83 samples/sec   Loss 5.6804   LearningRate 0.0432   Epoch: 6   Global Step: 85120   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:04:05,694-Speed 3322.02 samples/sec   Loss 5.7576   LearningRate 0.0432   Epoch: 6   Global Step: 85130   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:04:08,736-Speed 3367.57 samples/sec   Loss 5.7169   LearningRate 0.0432   Epoch: 6   Global Step: 85140   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:04:11,815-Speed 3325.95 samples/sec   Loss 5.7983   LearningRate 0.0432   Epoch: 6   Global Step: 85150   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:04:14,870-Speed 3353.86 samples/sec   Loss 5.8733   LearningRate 0.0432   Epoch: 6   Global Step: 85160   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:04:17,935-Speed 3341.95 samples/sec   Loss 5.7756   LearningRate 0.0432   Epoch: 6   Global Step: 85170   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:04:21,049-Speed 3289.06 samples/sec   Loss 5.7351   LearningRate 0.0432   Epoch: 6   Global Step: 85180   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:04:24,141-Speed 3312.65 samples/sec   Loss 5.7875   LearningRate 0.0432   Epoch: 6   Global Step: 85190   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:04:27,194-Speed 3355.66 samples/sec   Loss 5.8062   LearningRate 0.0432   Epoch: 6   Global Step: 85200   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:04:30,257-Speed 3343.58 samples/sec   Loss 5.8174   LearningRate 0.0432   Epoch: 6   Global Step: 85210   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:04:33,346-Speed 3316.45 samples/sec   Loss 5.7972   LearningRate 0.0432   Epoch: 6   Global Step: 85220   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:04:36,471-Speed 3278.76 samples/sec   Loss 5.8480   LearningRate 0.0432   Epoch: 6   Global Step: 85230   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:04:39,539-Speed 3338.16 samples/sec   Loss 5.8963   LearningRate 0.0431   Epoch: 6   Global Step: 85240   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:04:42,609-Speed 3336.48 samples/sec   Loss 5.7139   LearningRate 0.0431   Epoch: 6   Global Step: 85250   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:04:45,663-Speed 3354.76 samples/sec   Loss 5.8226   LearningRate 0.0431   Epoch: 6   Global Step: 85260   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:04:48,730-Speed 3338.92 samples/sec   Loss 5.7165   LearningRate 0.0431   Epoch: 6   Global Step: 85270   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:04:51,930-Speed 3201.73 samples/sec   Loss 5.7827   LearningRate 0.0431   Epoch: 6   Global Step: 85280   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:04:55,091-Speed 3240.19 samples/sec   Loss 5.8278   LearningRate 0.0431   Epoch: 6   Global Step: 85290   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:04:58,158-Speed 3339.76 samples/sec   Loss 5.7640   LearningRate 0.0431   Epoch: 6   Global Step: 85300   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:05:01,256-Speed 3306.31 samples/sec   Loss 5.7826   LearningRate 0.0431   Epoch: 6   Global Step: 85310   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:05:04,382-Speed 3277.00 samples/sec   Loss 5.8890   LearningRate 0.0431   Epoch: 6   Global Step: 85320   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:05:07,498-Speed 3286.90 samples/sec   Loss 5.8777   LearningRate 0.0431   Epoch: 6   Global Step: 85330   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:05:10,604-Speed 3297.67 samples/sec   Loss 5.8205   LearningRate 0.0431   Epoch: 6   Global Step: 85340   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:05:13,697-Speed 3312.09 samples/sec   Loss 5.8285   LearningRate 0.0431   Epoch: 6   Global Step: 85350   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:05:16,820-Speed 3279.40 samples/sec   Loss 5.8091   LearningRate 0.0431   Epoch: 6   Global Step: 85360   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:05:19,897-Speed 3329.48 samples/sec   Loss 5.8178   LearningRate 0.0431   Epoch: 6   Global Step: 85370   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:05:22,989-Speed 3312.91 samples/sec   Loss 5.7080   LearningRate 0.0431   Epoch: 6   Global Step: 85380   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:05:26,202-Speed 3188.24 samples/sec   Loss 5.8235   LearningRate 0.0431   Epoch: 6   Global Step: 85390   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:05:29,319-Speed 3285.22 samples/sec   Loss 5.7639   LearningRate 0.0431   Epoch: 6   Global Step: 85400   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:05:32,380-Speed 3346.96 samples/sec   Loss 5.8234   LearningRate 0.0431   Epoch: 6   Global Step: 85410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:05:35,591-Speed 3189.60 samples/sec   Loss 5.8091   LearningRate 0.0431   Epoch: 6   Global Step: 85420   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:05:38,766-Speed 3226.39 samples/sec   Loss 5.8956   LearningRate 0.0430   Epoch: 6   Global Step: 85430   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:05:41,874-Speed 3296.18 samples/sec   Loss 5.7911   LearningRate 0.0430   Epoch: 6   Global Step: 85440   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:05:44,989-Speed 3288.34 samples/sec   Loss 5.7324   LearningRate 0.0430   Epoch: 6   Global Step: 85450   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:05:48,120-Speed 3271.71 samples/sec   Loss 5.7606   LearningRate 0.0430   Epoch: 6   Global Step: 85460   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:05:51,206-Speed 3318.84 samples/sec   Loss 5.7450   LearningRate 0.0430   Epoch: 6   Global Step: 85470   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:05:54,313-Speed 3297.44 samples/sec   Loss 5.6776   LearningRate 0.0430   Epoch: 6   Global Step: 85480   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:05:57,387-Speed 3332.41 samples/sec   Loss 5.8006   LearningRate 0.0430   Epoch: 6   Global Step: 85490   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:06:00,447-Speed 3347.93 samples/sec   Loss 5.8140   LearningRate 0.0430   Epoch: 6   Global Step: 85500   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:06:03,579-Speed 3269.84 samples/sec   Loss 5.8496   LearningRate 0.0430   Epoch: 6   Global Step: 85510   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:06:06,670-Speed 3313.91 samples/sec   Loss 5.7143   LearningRate 0.0430   Epoch: 6   Global Step: 85520   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:06:09,736-Speed 3341.07 samples/sec   Loss 5.7985   LearningRate 0.0430   Epoch: 6   Global Step: 85530   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:06:12,857-Speed 3282.03 samples/sec   Loss 5.7167   LearningRate 0.0430   Epoch: 6   Global Step: 85540   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:06:15,976-Speed 3284.22 samples/sec   Loss 5.7701   LearningRate 0.0430   Epoch: 6   Global Step: 85550   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:06:19,118-Speed 3260.34 samples/sec   Loss 5.7029   LearningRate 0.0430   Epoch: 6   Global Step: 85560   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:06:22,221-Speed 3300.49 samples/sec   Loss 5.8213   LearningRate 0.0430   Epoch: 6   Global Step: 85570   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:06:25,343-Speed 3282.09 samples/sec   Loss 5.8201   LearningRate 0.0430   Epoch: 6   Global Step: 85580   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:06:28,499-Speed 3244.84 samples/sec   Loss 5.7100   LearningRate 0.0430   Epoch: 6   Global Step: 85590   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:06:31,578-Speed 3326.87 samples/sec   Loss 5.7621   LearningRate 0.0430   Epoch: 6   Global Step: 85600   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:06:34,633-Speed 3352.81 samples/sec   Loss 5.8469   LearningRate 0.0430   Epoch: 6   Global Step: 85610   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:06:37,737-Speed 3300.76 samples/sec   Loss 5.8176   LearningRate 0.0429   Epoch: 6   Global Step: 85620   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:06:40,848-Speed 3292.39 samples/sec   Loss 5.8729   LearningRate 0.0429   Epoch: 6   Global Step: 85630   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:06:43,921-Speed 3333.11 samples/sec   Loss 5.6903   LearningRate 0.0429   Epoch: 6   Global Step: 85640   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:06:47,062-Speed 3261.15 samples/sec   Loss 5.9578   LearningRate 0.0429   Epoch: 6   Global Step: 85650   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:06:50,220-Speed 3243.50 samples/sec   Loss 5.7811   LearningRate 0.0429   Epoch: 6   Global Step: 85660   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:06:53,388-Speed 3233.50 samples/sec   Loss 5.8498   LearningRate 0.0429   Epoch: 6   Global Step: 85670   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:06:56,481-Speed 3312.10 samples/sec   Loss 5.7633   LearningRate 0.0429   Epoch: 6   Global Step: 85680   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:06:59,529-Speed 3360.47 samples/sec   Loss 5.7795   LearningRate 0.0429   Epoch: 6   Global Step: 85690   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:07:02,642-Speed 3290.21 samples/sec   Loss 5.8703   LearningRate 0.0429   Epoch: 6   Global Step: 85700   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:07:05,720-Speed 3328.10 samples/sec   Loss 5.7572   LearningRate 0.0429   Epoch: 6   Global Step: 85710   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:07:08,816-Speed 3308.77 samples/sec   Loss 5.7840   LearningRate 0.0429   Epoch: 6   Global Step: 85720   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:07:11,856-Speed 3369.77 samples/sec   Loss 5.8621   LearningRate 0.0429   Epoch: 6   Global Step: 85730   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:07:14,956-Speed 3303.85 samples/sec   Loss 5.8227   LearningRate 0.0429   Epoch: 6   Global Step: 85740   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:07:18,074-Speed 3285.63 samples/sec   Loss 5.7808   LearningRate 0.0429   Epoch: 6   Global Step: 85750   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:07:21,108-Speed 3375.41 samples/sec   Loss 5.8265   LearningRate 0.0429   Epoch: 6   Global Step: 85760   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:07:24,197-Speed 3316.59 samples/sec   Loss 5.7131   LearningRate 0.0429   Epoch: 6   Global Step: 85770   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:07:27,258-Speed 3346.32 samples/sec   Loss 5.7961   LearningRate 0.0429   Epoch: 6   Global Step: 85780   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:07:30,354-Speed 3308.41 samples/sec   Loss 5.8954   LearningRate 0.0429   Epoch: 6   Global Step: 85790   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:07:33,418-Speed 3343.50 samples/sec   Loss 5.8129   LearningRate 0.0429   Epoch: 6   Global Step: 85800   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:07:36,552-Speed 3268.76 samples/sec   Loss 5.7516   LearningRate 0.0428   Epoch: 6   Global Step: 85810   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:07:39,600-Speed 3360.32 samples/sec   Loss 5.7657   LearningRate 0.0428   Epoch: 6   Global Step: 85820   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:07:42,735-Speed 3267.55 samples/sec   Loss 5.7624   LearningRate 0.0428   Epoch: 6   Global Step: 85830   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:07:45,793-Speed 3349.09 samples/sec   Loss 5.7195   LearningRate 0.0428   Epoch: 6   Global Step: 85840   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:07:48,885-Speed 3313.77 samples/sec   Loss 5.7775   LearningRate 0.0428   Epoch: 6   Global Step: 85850   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:07:51,992-Speed 3296.33 samples/sec   Loss 5.8060   LearningRate 0.0428   Epoch: 6   Global Step: 85860   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:07:55,174-Speed 3219.45 samples/sec   Loss 5.6989   LearningRate 0.0428   Epoch: 6   Global Step: 85870   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:07:58,235-Speed 3346.52 samples/sec   Loss 5.6936   LearningRate 0.0428   Epoch: 6   Global Step: 85880   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:08:01,408-Speed 3228.46 samples/sec   Loss 5.8624   LearningRate 0.0428   Epoch: 6   Global Step: 85890   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:08:04,518-Speed 3293.54 samples/sec   Loss 5.8134   LearningRate 0.0428   Epoch: 6   Global Step: 85900   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:08:07,613-Speed 3309.80 samples/sec   Loss 5.7553   LearningRate 0.0428   Epoch: 6   Global Step: 85910   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:08:10,687-Speed 3332.03 samples/sec   Loss 5.8707   LearningRate 0.0428   Epoch: 6   Global Step: 85920   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:08:13,849-Speed 3240.18 samples/sec   Loss 5.8330   LearningRate 0.0428   Epoch: 6   Global Step: 85930   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:08:16,962-Speed 3289.67 samples/sec   Loss 5.7851   LearningRate 0.0428   Epoch: 6   Global Step: 85940   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:08:20,096-Speed 3268.25 samples/sec   Loss 5.7131   LearningRate 0.0428   Epoch: 6   Global Step: 85950   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:08:23,214-Speed 3285.36 samples/sec   Loss 5.7382   LearningRate 0.0428   Epoch: 6   Global Step: 85960   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:08:26,315-Speed 3304.08 samples/sec   Loss 5.7739   LearningRate 0.0428   Epoch: 6   Global Step: 85970   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:08:29,371-Speed 3350.77 samples/sec   Loss 5.7473   LearningRate 0.0428   Epoch: 6   Global Step: 85980   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:08:32,471-Speed 3304.40 samples/sec   Loss 5.8851   LearningRate 0.0428   Epoch: 6   Global Step: 85990   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:08:35,568-Speed 3307.62 samples/sec   Loss 5.7752   LearningRate 0.0427   Epoch: 6   Global Step: 86000   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:08:38,737-Speed 3232.05 samples/sec   Loss 5.6513   LearningRate 0.0427   Epoch: 6   Global Step: 86010   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:08:41,843-Speed 3298.32 samples/sec   Loss 5.8380   LearningRate 0.0427   Epoch: 6   Global Step: 86020   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:08:44,926-Speed 3322.22 samples/sec   Loss 5.7682   LearningRate 0.0427   Epoch: 6   Global Step: 86030   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:08:48,067-Speed 3261.33 samples/sec   Loss 5.8236   LearningRate 0.0427   Epoch: 6   Global Step: 86040   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:08:51,191-Speed 3279.64 samples/sec   Loss 5.7614   LearningRate 0.0427   Epoch: 6   Global Step: 86050   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:08:54,302-Speed 3292.79 samples/sec   Loss 5.7150   LearningRate 0.0427   Epoch: 6   Global Step: 86060   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:08:57,375-Speed 3332.84 samples/sec   Loss 5.7808   LearningRate 0.0427   Epoch: 6   Global Step: 86070   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:09:00,505-Speed 3272.65 samples/sec   Loss 5.7377   LearningRate 0.0427   Epoch: 6   Global Step: 86080   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:09:03,580-Speed 3331.78 samples/sec   Loss 5.6993   LearningRate 0.0427   Epoch: 6   Global Step: 86090   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:09:06,724-Speed 3257.63 samples/sec   Loss 5.8092   LearningRate 0.0427   Epoch: 6   Global Step: 86100   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:09:09,827-Speed 3300.60 samples/sec   Loss 5.9677   LearningRate 0.0427   Epoch: 6   Global Step: 86110   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:09:12,953-Speed 3277.60 samples/sec   Loss 5.7539   LearningRate 0.0427   Epoch: 6   Global Step: 86120   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:09:16,082-Speed 3273.54 samples/sec   Loss 5.7231   LearningRate 0.0427   Epoch: 6   Global Step: 86130   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:09:19,243-Speed 3240.32 samples/sec   Loss 5.7970   LearningRate 0.0427   Epoch: 6   Global Step: 86140   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:09:22,368-Speed 3277.92 samples/sec   Loss 5.9109   LearningRate 0.0427   Epoch: 6   Global Step: 86150   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:09:25,515-Speed 3255.17 samples/sec   Loss 5.7135   LearningRate 0.0427   Epoch: 6   Global Step: 86160   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:09:28,622-Speed 3296.86 samples/sec   Loss 5.7651   LearningRate 0.0427   Epoch: 6   Global Step: 86170   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:09:31,697-Speed 3330.49 samples/sec   Loss 5.8651   LearningRate 0.0427   Epoch: 6   Global Step: 86180   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:09:34,843-Speed 3256.15 samples/sec   Loss 5.8223   LearningRate 0.0426   Epoch: 6   Global Step: 86190   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:09:37,946-Speed 3300.62 samples/sec   Loss 5.7862   LearningRate 0.0426   Epoch: 6   Global Step: 86200   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:09:41,115-Speed 3232.94 samples/sec   Loss 5.7815   LearningRate 0.0426   Epoch: 6   Global Step: 86210   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:09:44,189-Speed 3333.79 samples/sec   Loss 5.7422   LearningRate 0.0426   Epoch: 6   Global Step: 86220   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:09:47,263-Speed 3332.31 samples/sec   Loss 5.8412   LearningRate 0.0426   Epoch: 6   Global Step: 86230   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:09:50,395-Speed 3270.48 samples/sec   Loss 5.8630   LearningRate 0.0426   Epoch: 6   Global Step: 86240   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:09:53,597-Speed 3199.59 samples/sec   Loss 5.7682   LearningRate 0.0426   Epoch: 6   Global Step: 86250   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:09:56,678-Speed 3323.95 samples/sec   Loss 5.6829   LearningRate 0.0426   Epoch: 6   Global Step: 86260   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:09:59,770-Speed 3312.23 samples/sec   Loss 5.8678   LearningRate 0.0426   Epoch: 6   Global Step: 86270   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:10:02,931-Speed 3241.31 samples/sec   Loss 5.7013   LearningRate 0.0426   Epoch: 6   Global Step: 86280   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:10:06,196-Speed 3137.20 samples/sec   Loss 5.7435   LearningRate 0.0426   Epoch: 6   Global Step: 86290   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:10:09,260-Speed 3342.35 samples/sec   Loss 5.8393   LearningRate 0.0426   Epoch: 6   Global Step: 86300   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:10:12,338-Speed 3328.36 samples/sec   Loss 5.6712   LearningRate 0.0426   Epoch: 6   Global Step: 86310   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:10:15,432-Speed 3311.34 samples/sec   Loss 5.7986   LearningRate 0.0426   Epoch: 6   Global Step: 86320   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:10:18,510-Speed 3326.68 samples/sec   Loss 5.6791   LearningRate 0.0426   Epoch: 6   Global Step: 86330   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:10:21,588-Speed 3328.95 samples/sec   Loss 5.7370   LearningRate 0.0426   Epoch: 6   Global Step: 86340   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:10:24,676-Speed 3317.08 samples/sec   Loss 5.8337   LearningRate 0.0426   Epoch: 6   Global Step: 86350   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:10:27,789-Speed 3290.18 samples/sec   Loss 5.7407   LearningRate 0.0426   Epoch: 6   Global Step: 86360   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:10:30,931-Speed 3260.39 samples/sec   Loss 5.7200   LearningRate 0.0426   Epoch: 6   Global Step: 86370   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:10:34,070-Speed 3264.16 samples/sec   Loss 5.8391   LearningRate 0.0425   Epoch: 6   Global Step: 86380   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:10:37,224-Speed 3247.23 samples/sec   Loss 5.8297   LearningRate 0.0425   Epoch: 6   Global Step: 86390   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:10:40,297-Speed 3333.88 samples/sec   Loss 5.8484   LearningRate 0.0425   Epoch: 6   Global Step: 86400   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:10:43,441-Speed 3257.35 samples/sec   Loss 5.8048   LearningRate 0.0425   Epoch: 6   Global Step: 86410   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:10:46,526-Speed 3320.98 samples/sec   Loss 5.7290   LearningRate 0.0425   Epoch: 6   Global Step: 86420   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:10:49,617-Speed 3312.92 samples/sec   Loss 5.6827   LearningRate 0.0425   Epoch: 6   Global Step: 86430   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:10:52,737-Speed 3283.67 samples/sec   Loss 5.7312   LearningRate 0.0425   Epoch: 6   Global Step: 86440   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:10:55,888-Speed 3250.68 samples/sec   Loss 5.8410   LearningRate 0.0425   Epoch: 6   Global Step: 86450   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:10:58,982-Speed 3310.68 samples/sec   Loss 5.7571   LearningRate 0.0425   Epoch: 6   Global Step: 86460   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:11:02,065-Speed 3322.98 samples/sec   Loss 5.7126   LearningRate 0.0425   Epoch: 6   Global Step: 86470   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:11:05,145-Speed 3325.48 samples/sec   Loss 5.7687   LearningRate 0.0425   Epoch: 6   Global Step: 86480   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:11:08,225-Speed 3325.18 samples/sec   Loss 5.8115   LearningRate 0.0425   Epoch: 6   Global Step: 86490   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:11:11,325-Speed 3304.61 samples/sec   Loss 5.8377   LearningRate 0.0425   Epoch: 6   Global Step: 86500   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:11:14,395-Speed 3336.86 samples/sec   Loss 5.7287   LearningRate 0.0425   Epoch: 6   Global Step: 86510   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:11:17,502-Speed 3296.20 samples/sec   Loss 5.6845   LearningRate 0.0425   Epoch: 6   Global Step: 86520   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:11:20,631-Speed 3274.07 samples/sec   Loss 5.8453   LearningRate 0.0425   Epoch: 6   Global Step: 86530   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:11:23,758-Speed 3276.16 samples/sec   Loss 5.7855   LearningRate 0.0425   Epoch: 6   Global Step: 86540   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:11:26,887-Speed 3273.47 samples/sec   Loss 5.7582   LearningRate 0.0425   Epoch: 6   Global Step: 86550   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:11:30,005-Speed 3285.06 samples/sec   Loss 5.7622   LearningRate 0.0425   Epoch: 6   Global Step: 86560   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:11:33,113-Speed 3296.11 samples/sec   Loss 5.7980   LearningRate 0.0424   Epoch: 6   Global Step: 86570   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:11:36,249-Speed 3266.19 samples/sec   Loss 5.6654   LearningRate 0.0424   Epoch: 6   Global Step: 86580   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:11:39,358-Speed 3294.40 samples/sec   Loss 5.6181   LearningRate 0.0424   Epoch: 6   Global Step: 86590   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:11:42,460-Speed 3302.26 samples/sec   Loss 5.7490   LearningRate 0.0424   Epoch: 6   Global Step: 86600   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:11:45,527-Speed 3339.61 samples/sec   Loss 5.7812   LearningRate 0.0424   Epoch: 6   Global Step: 86610   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:11:48,625-Speed 3306.01 samples/sec   Loss 5.8283   LearningRate 0.0424   Epoch: 6   Global Step: 86620   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:11:51,742-Speed 3286.32 samples/sec   Loss 5.8394   LearningRate 0.0424   Epoch: 6   Global Step: 86630   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:11:54,805-Speed 3344.49 samples/sec   Loss 5.6546   LearningRate 0.0424   Epoch: 6   Global Step: 86640   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:11:57,864-Speed 3347.93 samples/sec   Loss 5.7878   LearningRate 0.0424   Epoch: 6   Global Step: 86650   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:12:00,935-Speed 3335.99 samples/sec   Loss 5.7639   LearningRate 0.0424   Epoch: 6   Global Step: 86660   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:12:04,068-Speed 3268.70 samples/sec   Loss 5.8357   LearningRate 0.0424   Epoch: 6   Global Step: 86670   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:12:07,157-Speed 3316.62 samples/sec   Loss 5.7423   LearningRate 0.0424   Epoch: 6   Global Step: 86680   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:12:10,247-Speed 3314.29 samples/sec   Loss 5.7832   LearningRate 0.0424   Epoch: 6   Global Step: 86690   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:12:13,459-Speed 3189.46 samples/sec   Loss 5.7762   LearningRate 0.0424   Epoch: 6   Global Step: 86700   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:12:16,600-Speed 3260.66 samples/sec   Loss 5.7241   LearningRate 0.0424   Epoch: 6   Global Step: 86710   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:12:19,661-Speed 3345.89 samples/sec   Loss 5.7138   LearningRate 0.0424   Epoch: 6   Global Step: 86720   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:12:22,774-Speed 3291.33 samples/sec   Loss 5.8409   LearningRate 0.0424   Epoch: 6   Global Step: 86730   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:12:25,882-Speed 3295.51 samples/sec   Loss 5.8498   LearningRate 0.0424   Epoch: 6   Global Step: 86740   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:12:29,017-Speed 3267.86 samples/sec   Loss 5.7532   LearningRate 0.0424   Epoch: 6   Global Step: 86750   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:12:32,105-Speed 3316.37 samples/sec   Loss 5.9050   LearningRate 0.0423   Epoch: 6   Global Step: 86760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:12:35,237-Speed 3270.81 samples/sec   Loss 5.7570   LearningRate 0.0423   Epoch: 6   Global Step: 86770   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:12:38,375-Speed 3264.71 samples/sec   Loss 5.7519   LearningRate 0.0423   Epoch: 6   Global Step: 86780   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:12:41,489-Speed 3289.03 samples/sec   Loss 5.7319   LearningRate 0.0423   Epoch: 6   Global Step: 86790   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:12:44,609-Speed 3283.30 samples/sec   Loss 5.8310   LearningRate 0.0423   Epoch: 6   Global Step: 86800   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:12:47,699-Speed 3314.96 samples/sec   Loss 5.7243   LearningRate 0.0423   Epoch: 6   Global Step: 86810   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:12:50,829-Speed 3272.21 samples/sec   Loss 5.7941   LearningRate 0.0423   Epoch: 6   Global Step: 86820   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:12:54,020-Speed 3210.31 samples/sec   Loss 5.7630   LearningRate 0.0423   Epoch: 6   Global Step: 86830   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:12:57,118-Speed 3306.53 samples/sec   Loss 5.6779   LearningRate 0.0423   Epoch: 6   Global Step: 86840   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:13:00,230-Speed 3291.76 samples/sec   Loss 5.7656   LearningRate 0.0423   Epoch: 6   Global Step: 86850   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:13:03,339-Speed 3295.14 samples/sec   Loss 5.6836   LearningRate 0.0423   Epoch: 6   Global Step: 86860   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:13:06,495-Speed 3245.36 samples/sec   Loss 5.7642   LearningRate 0.0423   Epoch: 6   Global Step: 86870   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:13:09,587-Speed 3312.10 samples/sec   Loss 5.7892   LearningRate 0.0423   Epoch: 6   Global Step: 86880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:13:12,697-Speed 3294.55 samples/sec   Loss 5.7076   LearningRate 0.0423   Epoch: 6   Global Step: 86890   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:13:15,795-Speed 3305.89 samples/sec   Loss 5.7216   LearningRate 0.0423   Epoch: 6   Global Step: 86900   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:13:18,919-Speed 3279.23 samples/sec   Loss 5.7782   LearningRate 0.0423   Epoch: 6   Global Step: 86910   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:13:21,974-Speed 3352.97 samples/sec   Loss 5.6891   LearningRate 0.0423   Epoch: 6   Global Step: 86920   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:13:25,088-Speed 3289.22 samples/sec   Loss 5.6916   LearningRate 0.0423   Epoch: 6   Global Step: 86930   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:13:28,428-Speed 3067.15 samples/sec   Loss 5.8552   LearningRate 0.0423   Epoch: 6   Global Step: 86940   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:13:59,893-Speed 325.45 samples/sec   Loss 5.2976   LearningRate 0.0422   Epoch: 7   Global Step: 86950   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:14:03,133-Speed 3162.25 samples/sec   Loss 4.3257   LearningRate 0.0422   Epoch: 7   Global Step: 86960   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:14:06,356-Speed 3178.31 samples/sec   Loss 4.3542   LearningRate 0.0422   Epoch: 7   Global Step: 86970   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:14:09,399-Speed 3366.11 samples/sec   Loss 4.3194   LearningRate 0.0422   Epoch: 7   Global Step: 86980   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:14:12,529-Speed 3272.21 samples/sec   Loss 4.2421   LearningRate 0.0422   Epoch: 7   Global Step: 86990   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:14:15,643-Speed 3289.40 samples/sec   Loss 4.3839   LearningRate 0.0422   Epoch: 7   Global Step: 87000   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:14:18,782-Speed 3262.81 samples/sec   Loss 4.3706   LearningRate 0.0422   Epoch: 7   Global Step: 87010   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:14:21,877-Speed 3310.42 samples/sec   Loss 4.3842   LearningRate 0.0422   Epoch: 7   Global Step: 87020   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:14:25,080-Speed 3197.40 samples/sec   Loss 4.3791   LearningRate 0.0422   Epoch: 7   Global Step: 87030   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:14:28,178-Speed 3306.71 samples/sec   Loss 4.3726   LearningRate 0.0422   Epoch: 7   Global Step: 87040   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:14:31,305-Speed 3275.57 samples/sec   Loss 4.3333   LearningRate 0.0422   Epoch: 7   Global Step: 87050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:14:34,477-Speed 3228.76 samples/sec   Loss 4.3455   LearningRate 0.0422   Epoch: 7   Global Step: 87060   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:14:37,620-Speed 3259.13 samples/sec   Loss 4.3667   LearningRate 0.0422   Epoch: 7   Global Step: 87070   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:14:40,760-Speed 3262.78 samples/sec   Loss 4.4366   LearningRate 0.0422   Epoch: 7   Global Step: 87080   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:14:43,858-Speed 3306.71 samples/sec   Loss 4.4819   LearningRate 0.0422   Epoch: 7   Global Step: 87090   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:14:47,055-Speed 3203.56 samples/sec   Loss 4.4163   LearningRate 0.0422   Epoch: 7   Global Step: 87100   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:14:50,133-Speed 3327.58 samples/sec   Loss 4.5705   LearningRate 0.0422   Epoch: 7   Global Step: 87110   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:14:53,283-Speed 3252.24 samples/sec   Loss 4.4497   LearningRate 0.0422   Epoch: 7   Global Step: 87120   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:14:56,382-Speed 3305.74 samples/sec   Loss 4.3970   LearningRate 0.0422   Epoch: 7   Global Step: 87130   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:14:59,450-Speed 3338.50 samples/sec   Loss 4.4045   LearningRate 0.0421   Epoch: 7   Global Step: 87140   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:15:02,583-Speed 3268.93 samples/sec   Loss 4.5195   LearningRate 0.0421   Epoch: 7   Global Step: 87150   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:15:05,704-Speed 3281.99 samples/sec   Loss 4.4256   LearningRate 0.0421   Epoch: 7   Global Step: 87160   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:15:08,744-Speed 3369.97 samples/sec   Loss 4.2934   LearningRate 0.0421   Epoch: 7   Global Step: 87170   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:15:11,839-Speed 3309.53 samples/sec   Loss 4.4273   LearningRate 0.0421   Epoch: 7   Global Step: 87180   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:15:14,942-Speed 3302.25 samples/sec   Loss 4.4034   LearningRate 0.0421   Epoch: 7   Global Step: 87190   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:15:18,145-Speed 3198.51 samples/sec   Loss 4.4996   LearningRate 0.0421   Epoch: 7   Global Step: 87200   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:15:21,478-Speed 3072.43 samples/sec   Loss 4.4879   LearningRate 0.0421   Epoch: 7   Global Step: 87210   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:15:24,552-Speed 3333.15 samples/sec   Loss 4.4963   LearningRate 0.0421   Epoch: 7   Global Step: 87220   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:15:27,656-Speed 3300.28 samples/sec   Loss 4.4249   LearningRate 0.0421   Epoch: 7   Global Step: 87230   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:15:30,800-Speed 3257.07 samples/sec   Loss 4.4919   LearningRate 0.0421   Epoch: 7   Global Step: 87240   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:15:33,958-Speed 3243.36 samples/sec   Loss 4.4273   LearningRate 0.0421   Epoch: 7   Global Step: 87250   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:15:37,044-Speed 3319.86 samples/sec   Loss 4.5047   LearningRate 0.0421   Epoch: 7   Global Step: 87260   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:15:40,233-Speed 3211.95 samples/sec   Loss 4.4918   LearningRate 0.0421   Epoch: 7   Global Step: 87270   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:15:43,370-Speed 3265.43 samples/sec   Loss 4.5427   LearningRate 0.0421   Epoch: 7   Global Step: 87280   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:15:46,438-Speed 3338.82 samples/sec   Loss 4.4961   LearningRate 0.0421   Epoch: 7   Global Step: 87290   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:15:49,518-Speed 3326.21 samples/sec   Loss 4.5152   LearningRate 0.0421   Epoch: 7   Global Step: 87300   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:15:52,664-Speed 3255.62 samples/sec   Loss 4.5319   LearningRate 0.0421   Epoch: 7   Global Step: 87310   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:15:55,743-Speed 3327.13 samples/sec   Loss 4.4386   LearningRate 0.0421   Epoch: 7   Global Step: 87320   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:15:58,835-Speed 3312.41 samples/sec   Loss 4.4685   LearningRate 0.0420   Epoch: 7   Global Step: 87330   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:16:01,989-Speed 3248.14 samples/sec   Loss 4.4790   LearningRate 0.0420   Epoch: 7   Global Step: 87340   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:16:05,111-Speed 3281.01 samples/sec   Loss 4.5076   LearningRate 0.0420   Epoch: 7   Global Step: 87350   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:16:08,228-Speed 3285.95 samples/sec   Loss 4.4609   LearningRate 0.0420   Epoch: 7   Global Step: 87360   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:16:11,327-Speed 3305.00 samples/sec   Loss 4.4674   LearningRate 0.0420   Epoch: 7   Global Step: 87370   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:16:14,408-Speed 3325.28 samples/sec   Loss 4.5712   LearningRate 0.0420   Epoch: 7   Global Step: 87380   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:16:17,544-Speed 3266.04 samples/sec   Loss 4.5590   LearningRate 0.0420   Epoch: 7   Global Step: 87390   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:16:20,617-Speed 3333.03 samples/sec   Loss 4.5293   LearningRate 0.0420   Epoch: 7   Global Step: 87400   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:16:23,696-Speed 3327.30 samples/sec   Loss 4.4071   LearningRate 0.0420   Epoch: 7   Global Step: 87410   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:16:26,808-Speed 3291.48 samples/sec   Loss 4.5577   LearningRate 0.0420   Epoch: 7   Global Step: 87420   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:16:29,894-Speed 3319.46 samples/sec   Loss 4.6735   LearningRate 0.0420   Epoch: 7   Global Step: 87430   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:16:32,942-Speed 3360.89 samples/sec   Loss 4.5785   LearningRate 0.0420   Epoch: 7   Global Step: 87440   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:16:36,043-Speed 3302.56 samples/sec   Loss 4.4914   LearningRate 0.0420   Epoch: 7   Global Step: 87450   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:16:39,212-Speed 3232.72 samples/sec   Loss 4.6158   LearningRate 0.0420   Epoch: 7   Global Step: 87460   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:16:42,296-Speed 3321.26 samples/sec   Loss 4.5026   LearningRate 0.0420   Epoch: 7   Global Step: 87470   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:16:45,374-Speed 3328.21 samples/sec   Loss 4.6251   LearningRate 0.0420   Epoch: 7   Global Step: 87480   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:16:48,477-Speed 3300.76 samples/sec   Loss 4.5650   LearningRate 0.0420   Epoch: 7   Global Step: 87490   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:16:51,567-Speed 3315.29 samples/sec   Loss 4.6335   LearningRate 0.0420   Epoch: 7   Global Step: 87500   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:16:54,710-Speed 3259.06 samples/sec   Loss 4.5451   LearningRate 0.0420   Epoch: 7   Global Step: 87510   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:16:57,785-Speed 3331.69 samples/sec   Loss 4.5316   LearningRate 0.0420   Epoch: 7   Global Step: 87520   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:17:00,864-Speed 3326.36 samples/sec   Loss 4.5562   LearningRate 0.0419   Epoch: 7   Global Step: 87530   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:17:03,963-Speed 3304.47 samples/sec   Loss 4.5828   LearningRate 0.0419   Epoch: 7   Global Step: 87540   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:17:07,027-Speed 3344.00 samples/sec   Loss 4.5792   LearningRate 0.0419   Epoch: 7   Global Step: 87550   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:17:10,076-Speed 3359.76 samples/sec   Loss 4.6373   LearningRate 0.0419   Epoch: 7   Global Step: 87560   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:17:13,191-Speed 3288.62 samples/sec   Loss 4.5439   LearningRate 0.0419   Epoch: 7   Global Step: 87570   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:17:16,309-Speed 3285.07 samples/sec   Loss 4.6282   LearningRate 0.0419   Epoch: 7   Global Step: 87580   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:17:19,367-Speed 3349.87 samples/sec   Loss 4.6003   LearningRate 0.0419   Epoch: 7   Global Step: 87590   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:17:22,415-Speed 3360.43 samples/sec   Loss 4.5654   LearningRate 0.0419   Epoch: 7   Global Step: 87600   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:17:25,529-Speed 3289.85 samples/sec   Loss 4.6248   LearningRate 0.0419   Epoch: 7   Global Step: 87610   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:17:28,601-Speed 3334.93 samples/sec   Loss 4.4872   LearningRate 0.0419   Epoch: 7   Global Step: 87620   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:17:31,680-Speed 3326.13 samples/sec   Loss 4.5063   LearningRate 0.0419   Epoch: 7   Global Step: 87630   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:17:34,743-Speed 3343.90 samples/sec   Loss 4.6142   LearningRate 0.0419   Epoch: 7   Global Step: 87640   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:17:37,888-Speed 3257.43 samples/sec   Loss 4.6237   LearningRate 0.0419   Epoch: 7   Global Step: 87650   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:17:40,981-Speed 3311.22 samples/sec   Loss 4.6479   LearningRate 0.0419   Epoch: 7   Global Step: 87660   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:17:44,101-Speed 3284.09 samples/sec   Loss 4.6350   LearningRate 0.0419   Epoch: 7   Global Step: 87670   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:17:47,230-Speed 3272.93 samples/sec   Loss 4.5902   LearningRate 0.0419   Epoch: 7   Global Step: 87680   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:17:50,307-Speed 3329.07 samples/sec   Loss 4.5299   LearningRate 0.0419   Epoch: 7   Global Step: 87690   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:17:53,353-Speed 3363.98 samples/sec   Loss 4.5121   LearningRate 0.0419   Epoch: 7   Global Step: 87700   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:17:56,478-Speed 3277.75 samples/sec   Loss 4.5883   LearningRate 0.0419   Epoch: 7   Global Step: 87710   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:17:59,531-Speed 3355.00 samples/sec   Loss 4.5798   LearningRate 0.0418   Epoch: 7   Global Step: 87720   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:18:02,650-Speed 3284.28 samples/sec   Loss 4.6214   LearningRate 0.0418   Epoch: 7   Global Step: 87730   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:18:05,805-Speed 3246.22 samples/sec   Loss 4.5880   LearningRate 0.0418   Epoch: 7   Global Step: 87740   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:18:08,958-Speed 3249.26 samples/sec   Loss 4.6115   LearningRate 0.0418   Epoch: 7   Global Step: 87750   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:18:12,052-Speed 3310.62 samples/sec   Loss 4.6105   LearningRate 0.0418   Epoch: 7   Global Step: 87760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:18:15,107-Speed 3353.11 samples/sec   Loss 4.6040   LearningRate 0.0418   Epoch: 7   Global Step: 87770   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:18:18,191-Speed 3321.39 samples/sec   Loss 4.6142   LearningRate 0.0418   Epoch: 7   Global Step: 87780   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:18:21,270-Speed 3325.99 samples/sec   Loss 4.6134   LearningRate 0.0418   Epoch: 7   Global Step: 87790   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:18:24,345-Speed 3332.07 samples/sec   Loss 4.6066   LearningRate 0.0418   Epoch: 7   Global Step: 87800   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:18:27,442-Speed 3306.52 samples/sec   Loss 4.5687   LearningRate 0.0418   Epoch: 7   Global Step: 87810   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:18:30,573-Speed 3272.00 samples/sec   Loss 4.6406   LearningRate 0.0418   Epoch: 7   Global Step: 87820   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:18:33,622-Speed 3359.21 samples/sec   Loss 4.4928   LearningRate 0.0418   Epoch: 7   Global Step: 87830   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:18:36,734-Speed 3292.23 samples/sec   Loss 4.7234   LearningRate 0.0418   Epoch: 7   Global Step: 87840   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:18:39,860-Speed 3276.12 samples/sec   Loss 4.7061   LearningRate 0.0418   Epoch: 7   Global Step: 87850   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:18:42,990-Speed 3272.59 samples/sec   Loss 4.6444   LearningRate 0.0418   Epoch: 7   Global Step: 87860   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:18:46,048-Speed 3349.66 samples/sec   Loss 4.5687   LearningRate 0.0418   Epoch: 7   Global Step: 87870   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:18:49,149-Speed 3302.83 samples/sec   Loss 4.6275   LearningRate 0.0418   Epoch: 7   Global Step: 87880   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:18:52,286-Speed 3266.28 samples/sec   Loss 4.5924   LearningRate 0.0418   Epoch: 7   Global Step: 87890   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:18:55,395-Speed 3294.06 samples/sec   Loss 4.6094   LearningRate 0.0418   Epoch: 7   Global Step: 87900   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:18:58,434-Speed 3370.91 samples/sec   Loss 4.7148   LearningRate 0.0417   Epoch: 7   Global Step: 87910   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:19:01,502-Speed 3338.22 samples/sec   Loss 4.6194   LearningRate 0.0417   Epoch: 7   Global Step: 87920   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:19:04,627-Speed 3278.33 samples/sec   Loss 4.6181   LearningRate 0.0417   Epoch: 7   Global Step: 87930   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:19:07,746-Speed 3284.25 samples/sec   Loss 4.5208   LearningRate 0.0417   Epoch: 7   Global Step: 87940   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:19:10,805-Speed 3348.45 samples/sec   Loss 4.6883   LearningRate 0.0417   Epoch: 7   Global Step: 87950   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:19:13,892-Speed 3318.72 samples/sec   Loss 4.6131   LearningRate 0.0417   Epoch: 7   Global Step: 87960   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:19:16,948-Speed 3351.16 samples/sec   Loss 4.7564   LearningRate 0.0417   Epoch: 7   Global Step: 87970   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:19:20,045-Speed 3307.23 samples/sec   Loss 4.6535   LearningRate 0.0417   Epoch: 7   Global Step: 87980   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:19:23,160-Speed 3288.75 samples/sec   Loss 4.6872   LearningRate 0.0417   Epoch: 7   Global Step: 87990   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:19:26,245-Speed 3320.15 samples/sec   Loss 4.6040   LearningRate 0.0417   Epoch: 7   Global Step: 88000   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:19:29,416-Speed 3229.81 samples/sec   Loss 4.6995   LearningRate 0.0417   Epoch: 7   Global Step: 88010   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:19:32,489-Speed 3333.88 samples/sec   Loss 4.6830   LearningRate 0.0417   Epoch: 7   Global Step: 88020   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:19:35,597-Speed 3296.40 samples/sec   Loss 4.7714   LearningRate 0.0417   Epoch: 7   Global Step: 88030   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:19:38,767-Speed 3231.05 samples/sec   Loss 4.6987   LearningRate 0.0417   Epoch: 7   Global Step: 88040   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:19:41,886-Speed 3284.15 samples/sec   Loss 4.7267   LearningRate 0.0417   Epoch: 7   Global Step: 88050   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:19:44,953-Speed 3339.90 samples/sec   Loss 4.7659   LearningRate 0.0417   Epoch: 7   Global Step: 88060   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:19:48,135-Speed 3219.05 samples/sec   Loss 4.6931   LearningRate 0.0417   Epoch: 7   Global Step: 88070   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:19:51,337-Speed 3199.01 samples/sec   Loss 4.5968   LearningRate 0.0417   Epoch: 7   Global Step: 88080   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:19:54,396-Speed 3348.07 samples/sec   Loss 4.6587   LearningRate 0.0417   Epoch: 7   Global Step: 88090   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:19:57,453-Speed 3351.34 samples/sec   Loss 4.6855   LearningRate 0.0416   Epoch: 7   Global Step: 88100   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:20:00,557-Speed 3299.97 samples/sec   Loss 4.6947   LearningRate 0.0416   Epoch: 7   Global Step: 88110   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:20:03,691-Speed 3267.69 samples/sec   Loss 4.7096   LearningRate 0.0416   Epoch: 7   Global Step: 88120   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:20:06,802-Speed 3292.96 samples/sec   Loss 4.6450   LearningRate 0.0416   Epoch: 7   Global Step: 88130   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:20:09,904-Speed 3302.13 samples/sec   Loss 4.6427   LearningRate 0.0416   Epoch: 7   Global Step: 88140   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:20:13,082-Speed 3223.21 samples/sec   Loss 4.6735   LearningRate 0.0416   Epoch: 7   Global Step: 88150   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:20:16,200-Speed 3285.66 samples/sec   Loss 4.7372   LearningRate 0.0416   Epoch: 7   Global Step: 88160   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:20:19,288-Speed 3317.08 samples/sec   Loss 4.6909   LearningRate 0.0416   Epoch: 7   Global Step: 88170   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:20:22,376-Speed 3317.18 samples/sec   Loss 4.6517   LearningRate 0.0416   Epoch: 7   Global Step: 88180   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:20:25,561-Speed 3216.28 samples/sec   Loss 4.7107   LearningRate 0.0416   Epoch: 7   Global Step: 88190   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:20:28,656-Speed 3309.01 samples/sec   Loss 4.7867   LearningRate 0.0416   Epoch: 7   Global Step: 88200   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:20:31,755-Speed 3305.56 samples/sec   Loss 4.6785   LearningRate 0.0416   Epoch: 7   Global Step: 88210   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:20:34,855-Speed 3304.52 samples/sec   Loss 4.6035   LearningRate 0.0416   Epoch: 7   Global Step: 88220   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:20:38,026-Speed 3229.80 samples/sec   Loss 4.6138   LearningRate 0.0416   Epoch: 7   Global Step: 88230   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:20:41,198-Speed 3229.51 samples/sec   Loss 4.7255   LearningRate 0.0416   Epoch: 7   Global Step: 88240   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:20:44,326-Speed 3274.96 samples/sec   Loss 4.7663   LearningRate 0.0416   Epoch: 7   Global Step: 88250   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:20:47,380-Speed 3353.27 samples/sec   Loss 4.6008   LearningRate 0.0416   Epoch: 7   Global Step: 88260   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:20:50,508-Speed 3275.08 samples/sec   Loss 4.6859   LearningRate 0.0416   Epoch: 7   Global Step: 88270   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:20:53,714-Speed 3194.82 samples/sec   Loss 4.8083   LearningRate 0.0416   Epoch: 7   Global Step: 88280   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:20:56,812-Speed 3305.84 samples/sec   Loss 4.7123   LearningRate 0.0416   Epoch: 7   Global Step: 88290   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:20:59,878-Speed 3341.36 samples/sec   Loss 4.5975   LearningRate 0.0415   Epoch: 7   Global Step: 88300   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:21:03,000-Speed 3281.55 samples/sec   Loss 4.7380   LearningRate 0.0415   Epoch: 7   Global Step: 88310   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:21:06,190-Speed 3210.26 samples/sec   Loss 4.7803   LearningRate 0.0415   Epoch: 7   Global Step: 88320   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:21:09,312-Speed 3282.72 samples/sec   Loss 4.7167   LearningRate 0.0415   Epoch: 7   Global Step: 88330   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:21:12,411-Speed 3304.96 samples/sec   Loss 4.8052   LearningRate 0.0415   Epoch: 7   Global Step: 88340   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:21:15,565-Speed 3247.12 samples/sec   Loss 4.7201   LearningRate 0.0415   Epoch: 7   Global Step: 88350   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:21:18,695-Speed 3273.30 samples/sec   Loss 4.7880   LearningRate 0.0415   Epoch: 7   Global Step: 88360   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:21:21,761-Speed 3340.42 samples/sec   Loss 4.6549   LearningRate 0.0415   Epoch: 7   Global Step: 88370   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:21:24,848-Speed 3318.68 samples/sec   Loss 4.8436   LearningRate 0.0415   Epoch: 7   Global Step: 88380   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:21:27,931-Speed 3321.64 samples/sec   Loss 4.7498   LearningRate 0.0415   Epoch: 7   Global Step: 88390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:21:30,990-Speed 3349.10 samples/sec   Loss 4.7610   LearningRate 0.0415   Epoch: 7   Global Step: 88400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:21:34,101-Speed 3292.55 samples/sec   Loss 4.7729   LearningRate 0.0415   Epoch: 7   Global Step: 88410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:21:37,252-Speed 3251.39 samples/sec   Loss 4.7658   LearningRate 0.0415   Epoch: 7   Global Step: 88420   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:21:40,330-Speed 3326.97 samples/sec   Loss 4.7889   LearningRate 0.0415   Epoch: 7   Global Step: 88430   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:21:43,429-Speed 3305.88 samples/sec   Loss 4.7572   LearningRate 0.0415   Epoch: 7   Global Step: 88440   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:21:46,530-Speed 3303.33 samples/sec   Loss 4.7323   LearningRate 0.0415   Epoch: 7   Global Step: 88450   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:21:49,617-Speed 3318.34 samples/sec   Loss 4.6870   LearningRate 0.0415   Epoch: 7   Global Step: 88460   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:21:52,705-Speed 3316.57 samples/sec   Loss 4.8691   LearningRate 0.0415   Epoch: 7   Global Step: 88470   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:21:55,804-Speed 3305.89 samples/sec   Loss 4.6878   LearningRate 0.0415   Epoch: 7   Global Step: 88480   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:21:58,872-Speed 3338.11 samples/sec   Loss 4.7807   LearningRate 0.0414   Epoch: 7   Global Step: 88490   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:22:02,013-Speed 3260.82 samples/sec   Loss 4.7790   LearningRate 0.0414   Epoch: 7   Global Step: 88500   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:22:05,107-Speed 3311.42 samples/sec   Loss 4.8345   LearningRate 0.0414   Epoch: 7   Global Step: 88510   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:22:08,181-Speed 3332.59 samples/sec   Loss 4.7258   LearningRate 0.0414   Epoch: 7   Global Step: 88520   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:22:11,231-Speed 3357.41 samples/sec   Loss 4.8463   LearningRate 0.0414   Epoch: 7   Global Step: 88530   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:22:14,336-Speed 3299.38 samples/sec   Loss 4.7273   LearningRate 0.0414   Epoch: 7   Global Step: 88540   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:22:17,422-Speed 3319.38 samples/sec   Loss 4.8587   LearningRate 0.0414   Epoch: 7   Global Step: 88550   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:22:20,491-Speed 3337.47 samples/sec   Loss 4.8557   LearningRate 0.0414   Epoch: 7   Global Step: 88560   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:22:23,566-Speed 3331.45 samples/sec   Loss 4.8272   LearningRate 0.0414   Epoch: 7   Global Step: 88570   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:22:26,657-Speed 3314.39 samples/sec   Loss 4.7189   LearningRate 0.0414   Epoch: 7   Global Step: 88580   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:22:29,783-Speed 3276.34 samples/sec   Loss 4.7700   LearningRate 0.0414   Epoch: 7   Global Step: 88590   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:22:32,891-Speed 3295.35 samples/sec   Loss 4.7544   LearningRate 0.0414   Epoch: 7   Global Step: 88600   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:22:35,953-Speed 3345.65 samples/sec   Loss 4.7780   LearningRate 0.0414   Epoch: 7   Global Step: 88610   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:22:39,037-Speed 3321.52 samples/sec   Loss 4.7745   LearningRate 0.0414   Epoch: 7   Global Step: 88620   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:22:42,093-Speed 3352.40 samples/sec   Loss 4.8089   LearningRate 0.0414   Epoch: 7   Global Step: 88630   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:22:45,168-Speed 3330.32 samples/sec   Loss 4.6927   LearningRate 0.0414   Epoch: 7   Global Step: 88640   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:22:48,349-Speed 3220.64 samples/sec   Loss 4.8174   LearningRate 0.0414   Epoch: 7   Global Step: 88650   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:22:51,495-Speed 3255.48 samples/sec   Loss 4.7905   LearningRate 0.0414   Epoch: 7   Global Step: 88660   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:22:54,714-Speed 3181.95 samples/sec   Loss 4.8234   LearningRate 0.0414   Epoch: 7   Global Step: 88670   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:22:57,785-Speed 3336.17 samples/sec   Loss 4.8604   LearningRate 0.0413   Epoch: 7   Global Step: 88680   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:23:00,832-Speed 3361.67 samples/sec   Loss 4.6741   LearningRate 0.0413   Epoch: 7   Global Step: 88690   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:23:03,991-Speed 3242.52 samples/sec   Loss 4.8437   LearningRate 0.0413   Epoch: 7   Global Step: 88700   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:23:07,061-Speed 3336.72 samples/sec   Loss 4.7878   LearningRate 0.0413   Epoch: 7   Global Step: 88710   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:23:10,116-Speed 3353.10 samples/sec   Loss 4.7397   LearningRate 0.0413   Epoch: 7   Global Step: 88720   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:23:13,222-Speed 3297.82 samples/sec   Loss 4.8250   LearningRate 0.0413   Epoch: 7   Global Step: 88730   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:23:16,417-Speed 3205.92 samples/sec   Loss 4.8293   LearningRate 0.0413   Epoch: 7   Global Step: 88740   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:23:19,527-Speed 3294.18 samples/sec   Loss 4.8510   LearningRate 0.0413   Epoch: 7   Global Step: 88750   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:23:22,613-Speed 3318.94 samples/sec   Loss 4.8746   LearningRate 0.0413   Epoch: 7   Global Step: 88760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:23:25,697-Speed 3321.38 samples/sec   Loss 4.8599   LearningRate 0.0413   Epoch: 7   Global Step: 88770   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:23:28,768-Speed 3336.30 samples/sec   Loss 4.8096   LearningRate 0.0413   Epoch: 7   Global Step: 88780   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:23:31,870-Speed 3302.26 samples/sec   Loss 4.8289   LearningRate 0.0413   Epoch: 7   Global Step: 88790   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:23:34,979-Speed 3294.36 samples/sec   Loss 4.8813   LearningRate 0.0413   Epoch: 7   Global Step: 88800   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:23:38,038-Speed 3349.07 samples/sec   Loss 4.8496   LearningRate 0.0413   Epoch: 7   Global Step: 88810   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:23:41,166-Speed 3274.18 samples/sec   Loss 4.7791   LearningRate 0.0413   Epoch: 7   Global Step: 88820   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:23:44,230-Speed 3343.63 samples/sec   Loss 4.8600   LearningRate 0.0413   Epoch: 7   Global Step: 88830   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:23:47,361-Speed 3271.14 samples/sec   Loss 4.8136   LearningRate 0.0413   Epoch: 7   Global Step: 88840   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:23:50,470-Speed 3294.66 samples/sec   Loss 4.9047   LearningRate 0.0413   Epoch: 7   Global Step: 88850   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:23:53,602-Speed 3271.35 samples/sec   Loss 4.8529   LearningRate 0.0413   Epoch: 7   Global Step: 88860   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:23:56,668-Speed 3340.78 samples/sec   Loss 4.9079   LearningRate 0.0412   Epoch: 7   Global Step: 88870   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:23:59,755-Speed 3318.41 samples/sec   Loss 4.8808   LearningRate 0.0412   Epoch: 7   Global Step: 88880   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:24:02,839-Speed 3321.36 samples/sec   Loss 4.7466   LearningRate 0.0412   Epoch: 7   Global Step: 88890   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:24:05,913-Speed 3332.19 samples/sec   Loss 4.9036   LearningRate 0.0412   Epoch: 7   Global Step: 88900   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:24:09,002-Speed 3315.45 samples/sec   Loss 4.7820   LearningRate 0.0412   Epoch: 7   Global Step: 88910   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:24:12,088-Speed 3320.10 samples/sec   Loss 4.9345   LearningRate 0.0412   Epoch: 7   Global Step: 88920   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:24:15,145-Speed 3351.30 samples/sec   Loss 4.7827   LearningRate 0.0412   Epoch: 7   Global Step: 88930   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:24:18,214-Speed 3337.62 samples/sec   Loss 4.9541   LearningRate 0.0412   Epoch: 7   Global Step: 88940   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:24:21,244-Speed 3380.58 samples/sec   Loss 4.8510   LearningRate 0.0412   Epoch: 7   Global Step: 88950   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:24:24,322-Speed 3326.97 samples/sec   Loss 4.8775   LearningRate 0.0412   Epoch: 7   Global Step: 88960   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:24:27,463-Speed 3261.30 samples/sec   Loss 4.8709   LearningRate 0.0412   Epoch: 7   Global Step: 88970   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:24:30,606-Speed 3259.17 samples/sec   Loss 4.8905   LearningRate 0.0412   Epoch: 7   Global Step: 88980   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:24:33,709-Speed 3301.23 samples/sec   Loss 4.9111   LearningRate 0.0412   Epoch: 7   Global Step: 88990   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:24:36,827-Speed 3285.56 samples/sec   Loss 4.7962   LearningRate 0.0412   Epoch: 7   Global Step: 89000   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:24:39,877-Speed 3358.46 samples/sec   Loss 4.8298   LearningRate 0.0412   Epoch: 7   Global Step: 89010   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:24:43,013-Speed 3265.94 samples/sec   Loss 4.8617   LearningRate 0.0412   Epoch: 7   Global Step: 89020   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:24:46,073-Speed 3347.55 samples/sec   Loss 4.8997   LearningRate 0.0412   Epoch: 7   Global Step: 89030   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:24:49,130-Speed 3351.01 samples/sec   Loss 4.8317   LearningRate 0.0412   Epoch: 7   Global Step: 89040   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:24:52,258-Speed 3274.60 samples/sec   Loss 4.9204   LearningRate 0.0412   Epoch: 7   Global Step: 89050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:24:55,417-Speed 3242.67 samples/sec   Loss 4.8837   LearningRate 0.0412   Epoch: 7   Global Step: 89060   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:24:58,469-Speed 3355.77 samples/sec   Loss 4.9203   LearningRate 0.0411   Epoch: 7   Global Step: 89070   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:25:01,567-Speed 3307.54 samples/sec   Loss 4.7378   LearningRate 0.0411   Epoch: 7   Global Step: 89080   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:25:04,650-Speed 3321.96 samples/sec   Loss 4.9579   LearningRate 0.0411   Epoch: 7   Global Step: 89090   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:25:07,793-Speed 3259.58 samples/sec   Loss 4.9987   LearningRate 0.0411   Epoch: 7   Global Step: 89100   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:25:10,940-Speed 3254.70 samples/sec   Loss 4.8869   LearningRate 0.0411   Epoch: 7   Global Step: 89110   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:25:14,107-Speed 3234.58 samples/sec   Loss 4.8748   LearningRate 0.0411   Epoch: 7   Global Step: 89120   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:25:17,259-Speed 3249.54 samples/sec   Loss 4.9754   LearningRate 0.0411   Epoch: 7   Global Step: 89130   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:25:20,318-Speed 3348.50 samples/sec   Loss 4.7540   LearningRate 0.0411   Epoch: 7   Global Step: 89140   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:25:23,411-Speed 3312.32 samples/sec   Loss 4.8203   LearningRate 0.0411   Epoch: 7   Global Step: 89150   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:25:26,547-Speed 3266.21 samples/sec   Loss 4.9159   LearningRate 0.0411   Epoch: 7   Global Step: 89160   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:25:29,637-Speed 3314.72 samples/sec   Loss 4.9470   LearningRate 0.0411   Epoch: 7   Global Step: 89170   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:25:32,735-Speed 3306.46 samples/sec   Loss 4.8908   LearningRate 0.0411   Epoch: 7   Global Step: 89180   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:25:35,817-Speed 3324.12 samples/sec   Loss 4.9762   LearningRate 0.0411   Epoch: 7   Global Step: 89190   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:25:38,922-Speed 3298.18 samples/sec   Loss 4.9138   LearningRate 0.0411   Epoch: 7   Global Step: 89200   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:25:42,075-Speed 3248.81 samples/sec   Loss 4.7788   LearningRate 0.0411   Epoch: 7   Global Step: 89210   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:25:45,132-Speed 3351.02 samples/sec   Loss 4.8832   LearningRate 0.0411   Epoch: 7   Global Step: 89220   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:25:48,250-Speed 3285.11 samples/sec   Loss 4.9514   LearningRate 0.0411   Epoch: 7   Global Step: 89230   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:25:51,362-Speed 3292.59 samples/sec   Loss 4.8901   LearningRate 0.0411   Epoch: 7   Global Step: 89240   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:25:54,504-Speed 3259.38 samples/sec   Loss 5.0027   LearningRate 0.0411   Epoch: 7   Global Step: 89250   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:25:57,606-Speed 3302.56 samples/sec   Loss 4.8800   LearningRate 0.0410   Epoch: 7   Global Step: 89260   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:26:00,709-Speed 3301.92 samples/sec   Loss 4.9647   LearningRate 0.0410   Epoch: 7   Global Step: 89270   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:26:03,785-Speed 3329.94 samples/sec   Loss 4.8941   LearningRate 0.0410   Epoch: 7   Global Step: 89280   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:26:06,852-Speed 3339.34 samples/sec   Loss 4.9203   LearningRate 0.0410   Epoch: 7   Global Step: 89290   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:26:09,917-Speed 3342.78 samples/sec   Loss 4.9134   LearningRate 0.0410   Epoch: 7   Global Step: 89300   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:26:13,825-Speed 2621.03 samples/sec   Loss 4.9638   LearningRate 0.0410   Epoch: 7   Global Step: 89310   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:26:16,945-Speed 3282.22 samples/sec   Loss 4.8942   LearningRate 0.0410   Epoch: 7   Global Step: 89320   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:26:20,036-Speed 3314.28 samples/sec   Loss 4.9392   LearningRate 0.0410   Epoch: 7   Global Step: 89330   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:26:23,120-Speed 3322.03 samples/sec   Loss 4.9799   LearningRate 0.0410   Epoch: 7   Global Step: 89340   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:26:26,195-Speed 3331.07 samples/sec   Loss 4.9132   LearningRate 0.0410   Epoch: 7   Global Step: 89350   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:26:29,315-Speed 3281.84 samples/sec   Loss 5.0119   LearningRate 0.0410   Epoch: 7   Global Step: 89360   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:26:32,419-Speed 3300.27 samples/sec   Loss 4.9295   LearningRate 0.0410   Epoch: 7   Global Step: 89370   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:26:35,486-Speed 3340.49 samples/sec   Loss 4.9897   LearningRate 0.0410   Epoch: 7   Global Step: 89380   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:26:38,597-Speed 3292.70 samples/sec   Loss 4.8858   LearningRate 0.0410   Epoch: 7   Global Step: 89390   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:26:41,743-Speed 3255.39 samples/sec   Loss 4.9615   LearningRate 0.0410   Epoch: 7   Global Step: 89400   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:26:44,953-Speed 3191.72 samples/sec   Loss 5.0121   LearningRate 0.0410   Epoch: 7   Global Step: 89410   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:26:48,016-Speed 3343.58 samples/sec   Loss 4.9708   LearningRate 0.0410   Epoch: 7   Global Step: 89420   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:26:51,132-Speed 3287.35 samples/sec   Loss 4.8618   LearningRate 0.0410   Epoch: 7   Global Step: 89430   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:26:54,259-Speed 3276.11 samples/sec   Loss 5.0734   LearningRate 0.0410   Epoch: 7   Global Step: 89440   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:26:57,339-Speed 3325.23 samples/sec   Loss 4.9420   LearningRate 0.0410   Epoch: 7   Global Step: 89450   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:27:00,480-Speed 3261.73 samples/sec   Loss 4.8850   LearningRate 0.0409   Epoch: 7   Global Step: 89460   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:27:03,557-Speed 3329.05 samples/sec   Loss 5.0001   LearningRate 0.0409   Epoch: 7   Global Step: 89470   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:27:06,670-Speed 3290.46 samples/sec   Loss 5.0309   LearningRate 0.0409   Epoch: 7   Global Step: 89480   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:27:09,749-Speed 3326.42 samples/sec   Loss 4.9787   LearningRate 0.0409   Epoch: 7   Global Step: 89490   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:27:12,822-Speed 3333.58 samples/sec   Loss 4.8840   LearningRate 0.0409   Epoch: 7   Global Step: 89500   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:27:15,983-Speed 3240.54 samples/sec   Loss 4.9758   LearningRate 0.0409   Epoch: 7   Global Step: 89510   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:27:19,098-Speed 3288.67 samples/sec   Loss 4.8850   LearningRate 0.0409   Epoch: 7   Global Step: 89520   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:27:22,180-Speed 3323.09 samples/sec   Loss 5.0321   LearningRate 0.0409   Epoch: 7   Global Step: 89530   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:27:25,323-Speed 3259.10 samples/sec   Loss 4.8567   LearningRate 0.0409   Epoch: 7   Global Step: 89540   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:27:28,467-Speed 3258.60 samples/sec   Loss 5.0037   LearningRate 0.0409   Epoch: 7   Global Step: 89550   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:27:31,654-Speed 3213.72 samples/sec   Loss 4.9062   LearningRate 0.0409   Epoch: 7   Global Step: 89560   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:27:34,807-Speed 3248.48 samples/sec   Loss 4.9120   LearningRate 0.0409   Epoch: 7   Global Step: 89570   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:27:37,884-Speed 3329.54 samples/sec   Loss 4.9354   LearningRate 0.0409   Epoch: 7   Global Step: 89580   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:27:40,961-Speed 3328.26 samples/sec   Loss 4.9718   LearningRate 0.0409   Epoch: 7   Global Step: 89590   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:27:44,097-Speed 3266.72 samples/sec   Loss 4.9761   LearningRate 0.0409   Epoch: 7   Global Step: 89600   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:27:47,162-Speed 3342.45 samples/sec   Loss 4.9828   LearningRate 0.0409   Epoch: 7   Global Step: 89610   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:27:50,263-Speed 3302.98 samples/sec   Loss 4.9537   LearningRate 0.0409   Epoch: 7   Global Step: 89620   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:27:53,338-Speed 3330.92 samples/sec   Loss 4.9491   LearningRate 0.0409   Epoch: 7   Global Step: 89630   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:27:56,402-Speed 3343.18 samples/sec   Loss 4.9545   LearningRate 0.0409   Epoch: 7   Global Step: 89640   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:27:59,462-Speed 3347.91 samples/sec   Loss 4.8947   LearningRate 0.0408   Epoch: 7   Global Step: 89650   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:28:02,556-Speed 3310.41 samples/sec   Loss 5.0411   LearningRate 0.0408   Epoch: 7   Global Step: 89660   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:28:05,662-Speed 3297.23 samples/sec   Loss 4.9217   LearningRate 0.0408   Epoch: 7   Global Step: 89670   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:28:08,750-Speed 3317.00 samples/sec   Loss 5.0157   LearningRate 0.0408   Epoch: 7   Global Step: 89680   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:28:11,861-Speed 3292.85 samples/sec   Loss 5.0140   LearningRate 0.0408   Epoch: 7   Global Step: 89690   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:28:14,964-Speed 3301.74 samples/sec   Loss 5.0029   LearningRate 0.0408   Epoch: 7   Global Step: 89700   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:28:18,071-Speed 3296.17 samples/sec   Loss 4.9288   LearningRate 0.0408   Epoch: 7   Global Step: 89710   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:28:21,128-Speed 3350.35 samples/sec   Loss 5.0421   LearningRate 0.0408   Epoch: 7   Global Step: 89720   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:28:24,261-Speed 3270.70 samples/sec   Loss 4.9309   LearningRate 0.0408   Epoch: 7   Global Step: 89730   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:28:27,424-Speed 3238.15 samples/sec   Loss 5.0359   LearningRate 0.0408   Epoch: 7   Global Step: 89740   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:28:30,572-Speed 3253.91 samples/sec   Loss 4.9228   LearningRate 0.0408   Epoch: 7   Global Step: 89750   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:28:33,668-Speed 3308.79 samples/sec   Loss 5.0627   LearningRate 0.0408   Epoch: 7   Global Step: 89760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:28:36,800-Speed 3269.65 samples/sec   Loss 4.9511   LearningRate 0.0408   Epoch: 7   Global Step: 89770   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:28:39,872-Speed 3334.24 samples/sec   Loss 5.0766   LearningRate 0.0408   Epoch: 7   Global Step: 89780   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:28:42,978-Speed 3298.67 samples/sec   Loss 4.9948   LearningRate 0.0408   Epoch: 7   Global Step: 89790   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:28:46,033-Speed 3353.38 samples/sec   Loss 5.0400   LearningRate 0.0408   Epoch: 7   Global Step: 89800   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:28:49,151-Speed 3285.08 samples/sec   Loss 5.0051   LearningRate 0.0408   Epoch: 7   Global Step: 89810   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:28:52,266-Speed 3288.55 samples/sec   Loss 4.9102   LearningRate 0.0408   Epoch: 7   Global Step: 89820   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:28:55,364-Speed 3305.59 samples/sec   Loss 4.9136   LearningRate 0.0408   Epoch: 7   Global Step: 89830   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:28:58,449-Speed 3320.28 samples/sec   Loss 4.9188   LearningRate 0.0407   Epoch: 7   Global Step: 89840   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:29:01,515-Speed 3340.76 samples/sec   Loss 5.1173   LearningRate 0.0407   Epoch: 7   Global Step: 89850   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:29:04,591-Speed 3330.84 samples/sec   Loss 4.9143   LearningRate 0.0407   Epoch: 7   Global Step: 89860   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:29:07,672-Speed 3324.46 samples/sec   Loss 5.0055   LearningRate 0.0407   Epoch: 7   Global Step: 89870   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:29:10,751-Speed 3326.42 samples/sec   Loss 5.0439   LearningRate 0.0407   Epoch: 7   Global Step: 89880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:29:13,844-Speed 3311.35 samples/sec   Loss 4.9975   LearningRate 0.0407   Epoch: 7   Global Step: 89890   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:29:16,937-Speed 3312.28 samples/sec   Loss 4.9714   LearningRate 0.0407   Epoch: 7   Global Step: 89900   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:29:20,060-Speed 3280.27 samples/sec   Loss 4.9034   LearningRate 0.0407   Epoch: 7   Global Step: 89910   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:29:23,147-Speed 3318.01 samples/sec   Loss 4.9568   LearningRate 0.0407   Epoch: 7   Global Step: 89920   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:29:26,255-Speed 3296.04 samples/sec   Loss 5.0633   LearningRate 0.0407   Epoch: 7   Global Step: 89930   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:29:29,317-Speed 3344.53 samples/sec   Loss 5.0802   LearningRate 0.0407   Epoch: 7   Global Step: 89940   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:29:32,407-Speed 3314.75 samples/sec   Loss 5.0298   LearningRate 0.0407   Epoch: 7   Global Step: 89950   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:29:35,532-Speed 3278.34 samples/sec   Loss 4.9688   LearningRate 0.0407   Epoch: 7   Global Step: 89960   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:29:38,721-Speed 3212.10 samples/sec   Loss 5.0485   LearningRate 0.0407   Epoch: 7   Global Step: 89970   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:29:41,846-Speed 3277.78 samples/sec   Loss 4.9520   LearningRate 0.0407   Epoch: 7   Global Step: 89980   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:29:44,908-Speed 3344.78 samples/sec   Loss 5.0364   LearningRate 0.0407   Epoch: 7   Global Step: 89990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:29:48,047-Speed 3263.58 samples/sec   Loss 5.0401   LearningRate 0.0407   Epoch: 7   Global Step: 90000   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:29:51,156-Speed 3294.33 samples/sec   Loss 4.9954   LearningRate 0.0407   Epoch: 7   Global Step: 90010   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:29:54,262-Speed 3297.99 samples/sec   Loss 5.0705   LearningRate 0.0407   Epoch: 7   Global Step: 90020   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:29:57,373-Speed 3293.15 samples/sec   Loss 5.0454   LearningRate 0.0407   Epoch: 7   Global Step: 90030   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:30:00,462-Speed 3316.16 samples/sec   Loss 5.1792   LearningRate 0.0406   Epoch: 7   Global Step: 90040   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:30:03,520-Speed 3349.10 samples/sec   Loss 5.0701   LearningRate 0.0406   Epoch: 7   Global Step: 90050   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:30:06,575-Speed 3353.45 samples/sec   Loss 4.9523   LearningRate 0.0406   Epoch: 7   Global Step: 90060   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:30:09,630-Speed 3353.27 samples/sec   Loss 4.9099   LearningRate 0.0406   Epoch: 7   Global Step: 90070   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:30:12,739-Speed 3294.41 samples/sec   Loss 5.0844   LearningRate 0.0406   Epoch: 7   Global Step: 90080   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:30:15,869-Speed 3272.73 samples/sec   Loss 5.0988   LearningRate 0.0406   Epoch: 7   Global Step: 90090   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:30:18,971-Speed 3301.78 samples/sec   Loss 5.0532   LearningRate 0.0406   Epoch: 7   Global Step: 90100   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:30:22,031-Speed 3347.73 samples/sec   Loss 5.0609   LearningRate 0.0406   Epoch: 7   Global Step: 90110   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:30:25,159-Speed 3274.65 samples/sec   Loss 5.0834   LearningRate 0.0406   Epoch: 7   Global Step: 90120   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:30:28,271-Speed 3291.04 samples/sec   Loss 4.9670   LearningRate 0.0406   Epoch: 7   Global Step: 90130   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:30:31,335-Speed 3343.39 samples/sec   Loss 4.9390   LearningRate 0.0406   Epoch: 7   Global Step: 90140   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:30:34,449-Speed 3289.71 samples/sec   Loss 4.9689   LearningRate 0.0406   Epoch: 7   Global Step: 90150   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:30:37,607-Speed 3244.25 samples/sec   Loss 5.1006   LearningRate 0.0406   Epoch: 7   Global Step: 90160   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:30:40,785-Speed 3222.90 samples/sec   Loss 5.0196   LearningRate 0.0406   Epoch: 7   Global Step: 90170   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:30:43,912-Speed 3275.81 samples/sec   Loss 5.0608   LearningRate 0.0406   Epoch: 7   Global Step: 90180   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:30:47,025-Speed 3290.01 samples/sec   Loss 5.0388   LearningRate 0.0406   Epoch: 7   Global Step: 90190   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:30:50,167-Speed 3259.81 samples/sec   Loss 5.0931   LearningRate 0.0406   Epoch: 7   Global Step: 90200   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:30:53,233-Speed 3341.75 samples/sec   Loss 5.0299   LearningRate 0.0406   Epoch: 7   Global Step: 90210   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:30:56,289-Speed 3351.74 samples/sec   Loss 5.0825   LearningRate 0.0406   Epoch: 7   Global Step: 90220   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:30:59,370-Speed 3324.25 samples/sec   Loss 4.9465   LearningRate 0.0405   Epoch: 7   Global Step: 90230   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:31:02,462-Speed 3313.16 samples/sec   Loss 5.0501   LearningRate 0.0405   Epoch: 7   Global Step: 90240   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:31:05,554-Speed 3312.43 samples/sec   Loss 5.0469   LearningRate 0.0405   Epoch: 7   Global Step: 90250   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:31:08,619-Speed 3341.82 samples/sec   Loss 5.0647   LearningRate 0.0405   Epoch: 7   Global Step: 90260   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:31:11,674-Speed 3353.41 samples/sec   Loss 5.0298   LearningRate 0.0405   Epoch: 7   Global Step: 90270   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:31:14,750-Speed 3330.47 samples/sec   Loss 4.9628   LearningRate 0.0405   Epoch: 7   Global Step: 90280   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:31:17,829-Speed 3326.41 samples/sec   Loss 5.0674   LearningRate 0.0405   Epoch: 7   Global Step: 90290   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:31:20,876-Speed 3361.97 samples/sec   Loss 5.0920   LearningRate 0.0405   Epoch: 7   Global Step: 90300   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:31:23,977-Speed 3302.85 samples/sec   Loss 5.0884   LearningRate 0.0405   Epoch: 7   Global Step: 90310   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:31:27,107-Speed 3272.90 samples/sec   Loss 5.0300   LearningRate 0.0405   Epoch: 7   Global Step: 90320   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:31:30,184-Speed 3329.39 samples/sec   Loss 5.0669   LearningRate 0.0405   Epoch: 7   Global Step: 90330   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:31:33,300-Speed 3286.66 samples/sec   Loss 5.0815   LearningRate 0.0405   Epoch: 7   Global Step: 90340   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:31:36,384-Speed 3322.07 samples/sec   Loss 5.0666   LearningRate 0.0405   Epoch: 7   Global Step: 90350   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:31:39,479-Speed 3308.58 samples/sec   Loss 5.0709   LearningRate 0.0405   Epoch: 7   Global Step: 90360   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:31:42,604-Speed 3279.07 samples/sec   Loss 4.9891   LearningRate 0.0405   Epoch: 7   Global Step: 90370   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:31:45,670-Speed 3340.09 samples/sec   Loss 5.0739   LearningRate 0.0405   Epoch: 7   Global Step: 90380   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:31:48,781-Speed 3293.28 samples/sec   Loss 5.0082   LearningRate 0.0405   Epoch: 7   Global Step: 90390   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:31:51,898-Speed 3286.27 samples/sec   Loss 4.9679   LearningRate 0.0405   Epoch: 7   Global Step: 90400   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:31:55,044-Speed 3255.66 samples/sec   Loss 5.0024   LearningRate 0.0405   Epoch: 7   Global Step: 90410   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:31:58,104-Speed 3347.65 samples/sec   Loss 5.0627   LearningRate 0.0405   Epoch: 7   Global Step: 90420   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:32:01,155-Speed 3356.78 samples/sec   Loss 4.9806   LearningRate 0.0404   Epoch: 7   Global Step: 90430   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:32:04,225-Speed 3337.78 samples/sec   Loss 5.1086   LearningRate 0.0404   Epoch: 7   Global Step: 90440   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:32:07,359-Speed 3267.37 samples/sec   Loss 5.0310   LearningRate 0.0404   Epoch: 7   Global Step: 90450   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:32:10,419-Speed 3347.26 samples/sec   Loss 5.1250   LearningRate 0.0404   Epoch: 7   Global Step: 90460   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:32:13,484-Speed 3343.05 samples/sec   Loss 5.0196   LearningRate 0.0404   Epoch: 7   Global Step: 90470   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:32:16,600-Speed 3287.07 samples/sec   Loss 4.9100   LearningRate 0.0404   Epoch: 7   Global Step: 90480   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:32:19,727-Speed 3275.46 samples/sec   Loss 5.0331   LearningRate 0.0404   Epoch: 7   Global Step: 90490   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:32:22,817-Speed 3314.96 samples/sec   Loss 5.1088   LearningRate 0.0404   Epoch: 7   Global Step: 90500   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:32:26,013-Speed 3205.02 samples/sec   Loss 5.0071   LearningRate 0.0404   Epoch: 7   Global Step: 90510   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:32:29,185-Speed 3230.09 samples/sec   Loss 5.0691   LearningRate 0.0404   Epoch: 7   Global Step: 90520   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:32:32,297-Speed 3291.23 samples/sec   Loss 5.0497   LearningRate 0.0404   Epoch: 7   Global Step: 90530   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:32:35,420-Speed 3279.53 samples/sec   Loss 5.0668   LearningRate 0.0404   Epoch: 7   Global Step: 90540   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:32:38,534-Speed 3290.05 samples/sec   Loss 5.0987   LearningRate 0.0404   Epoch: 7   Global Step: 90550   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:32:41,630-Speed 3307.68 samples/sec   Loss 5.1215   LearningRate 0.0404   Epoch: 7   Global Step: 90560   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-04-27 09:32:44,740-Speed 3294.04 samples/sec   Loss 5.0621   LearningRate 0.0404   Epoch: 7   Global Step: 90570   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:32:47,852-Speed 3291.27 samples/sec   Loss 5.2395   LearningRate 0.0404   Epoch: 7   Global Step: 90580   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:32:51,016-Speed 3237.16 samples/sec   Loss 5.0541   LearningRate 0.0404   Epoch: 7   Global Step: 90590   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:32:54,198-Speed 3219.51 samples/sec   Loss 5.0789   LearningRate 0.0404   Epoch: 7   Global Step: 90600   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:32:57,263-Speed 3341.71 samples/sec   Loss 5.0965   LearningRate 0.0404   Epoch: 7   Global Step: 90610   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:33:00,397-Speed 3269.16 samples/sec   Loss 5.1431   LearningRate 0.0403   Epoch: 7   Global Step: 90620   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:33:03,553-Speed 3246.01 samples/sec   Loss 4.9978   LearningRate 0.0403   Epoch: 7   Global Step: 90630   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:33:06,703-Speed 3251.72 samples/sec   Loss 5.1480   LearningRate 0.0403   Epoch: 7   Global Step: 90640   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:33:09,756-Speed 3354.85 samples/sec   Loss 5.0312   LearningRate 0.0403   Epoch: 7   Global Step: 90650   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:33:12,824-Speed 3338.19 samples/sec   Loss 5.1091   LearningRate 0.0403   Epoch: 7   Global Step: 90660   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:33:15,895-Speed 3335.74 samples/sec   Loss 5.0870   LearningRate 0.0403   Epoch: 7   Global Step: 90670   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:33:19,053-Speed 3244.04 samples/sec   Loss 5.0952   LearningRate 0.0403   Epoch: 7   Global Step: 90680   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:33:22,104-Speed 3356.67 samples/sec   Loss 5.1019   LearningRate 0.0403   Epoch: 7   Global Step: 90690   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:33:25,248-Speed 3258.21 samples/sec   Loss 5.1427   LearningRate 0.0403   Epoch: 7   Global Step: 90700   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:33:28,337-Speed 3316.32 samples/sec   Loss 5.1100   LearningRate 0.0403   Epoch: 7   Global Step: 90710   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:33:31,440-Speed 3300.56 samples/sec   Loss 5.0773   LearningRate 0.0403   Epoch: 7   Global Step: 90720   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:33:34,497-Speed 3350.29 samples/sec   Loss 5.1685   LearningRate 0.0403   Epoch: 7   Global Step: 90730   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:33:37,625-Speed 3275.92 samples/sec   Loss 5.1468   LearningRate 0.0403   Epoch: 7   Global Step: 90740   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:33:40,763-Speed 3263.48 samples/sec   Loss 5.1656   LearningRate 0.0403   Epoch: 7   Global Step: 90750   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:33:43,901-Speed 3264.85 samples/sec   Loss 5.0812   LearningRate 0.0403   Epoch: 7   Global Step: 90760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:33:47,011-Speed 3293.37 samples/sec   Loss 4.9717   LearningRate 0.0403   Epoch: 7   Global Step: 90770   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:33:50,087-Speed 3329.58 samples/sec   Loss 5.1530   LearningRate 0.0403   Epoch: 7   Global Step: 90780   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:33:53,212-Speed 3277.82 samples/sec   Loss 5.0950   LearningRate 0.0403   Epoch: 7   Global Step: 90790   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:33:56,338-Speed 3276.30 samples/sec   Loss 5.0863   LearningRate 0.0403   Epoch: 7   Global Step: 90800   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:33:59,412-Speed 3332.86 samples/sec   Loss 5.1040   LearningRate 0.0403   Epoch: 7   Global Step: 90810   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:34:02,577-Speed 3235.75 samples/sec   Loss 5.0897   LearningRate 0.0402   Epoch: 7   Global Step: 90820   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:34:05,661-Speed 3321.54 samples/sec   Loss 5.0556   LearningRate 0.0402   Epoch: 7   Global Step: 90830   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:34:08,756-Speed 3310.05 samples/sec   Loss 5.1794   LearningRate 0.0402   Epoch: 7   Global Step: 90840   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:34:11,865-Speed 3294.37 samples/sec   Loss 5.0677   LearningRate 0.0402   Epoch: 7   Global Step: 90850   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:34:14,988-Speed 3279.98 samples/sec   Loss 5.1259   LearningRate 0.0402   Epoch: 7   Global Step: 90860   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:34:18,062-Speed 3332.65 samples/sec   Loss 5.0644   LearningRate 0.0402   Epoch: 7   Global Step: 90870   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:34:21,145-Speed 3322.15 samples/sec   Loss 5.0665   LearningRate 0.0402   Epoch: 7   Global Step: 90880   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:34:24,261-Speed 3287.31 samples/sec   Loss 5.1304   LearningRate 0.0402   Epoch: 7   Global Step: 90890   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:34:27,423-Speed 3239.40 samples/sec   Loss 5.0744   LearningRate 0.0402   Epoch: 7   Global Step: 90900   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:34:30,493-Speed 3336.13 samples/sec   Loss 5.1497   LearningRate 0.0402   Epoch: 7   Global Step: 90910   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:34:33,566-Speed 3333.76 samples/sec   Loss 5.1504   LearningRate 0.0402   Epoch: 7   Global Step: 90920   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:34:36,660-Speed 3311.06 samples/sec   Loss 5.0414   LearningRate 0.0402   Epoch: 7   Global Step: 90930   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-04-27 09:34:39,775-Speed 3288.62 samples/sec   Loss 5.2194   LearningRate 0.0402   Epoch: 7   Global Step: 90940   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-04-27 09:34:42,836-Speed 3346.22 samples/sec   Loss 5.1038   LearningRate 0.0402   Epoch: 7   Global Step: 90950   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:34:45,936-Speed 3303.69 samples/sec   Loss 5.1934   LearningRate 0.0402   Epoch: 7   Global Step: 90960   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:34:49,019-Speed 3322.61 samples/sec   Loss 5.1042   LearningRate 0.0402   Epoch: 7   Global Step: 90970   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:34:52,197-Speed 3223.49 samples/sec   Loss 5.0785   LearningRate 0.0402   Epoch: 7   Global Step: 90980   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:34:55,301-Speed 3299.73 samples/sec   Loss 5.2221   LearningRate 0.0402   Epoch: 7   Global Step: 90990   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:34:58,383-Speed 3323.33 samples/sec   Loss 5.1319   LearningRate 0.0402   Epoch: 7   Global Step: 91000   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:35:01,487-Speed 3300.65 samples/sec   Loss 5.0167   LearningRate 0.0402   Epoch: 7   Global Step: 91010   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:35:04,613-Speed 3276.32 samples/sec   Loss 5.1239   LearningRate 0.0401   Epoch: 7   Global Step: 91020   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:35:07,721-Speed 3296.18 samples/sec   Loss 5.1612   LearningRate 0.0401   Epoch: 7   Global Step: 91030   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:35:10,806-Speed 3319.83 samples/sec   Loss 5.1606   LearningRate 0.0401   Epoch: 7   Global Step: 91040   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:35:13,978-Speed 3229.68 samples/sec   Loss 5.1656   LearningRate 0.0401   Epoch: 7   Global Step: 91050   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:35:17,147-Speed 3232.20 samples/sec   Loss 5.0767   LearningRate 0.0401   Epoch: 7   Global Step: 91060   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:35:20,265-Speed 3285.81 samples/sec   Loss 5.2210   LearningRate 0.0401   Epoch: 7   Global Step: 91070   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:35:23,426-Speed 3240.49 samples/sec   Loss 5.1346   LearningRate 0.0401   Epoch: 7   Global Step: 91080   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:35:26,534-Speed 3295.22 samples/sec   Loss 5.0801   LearningRate 0.0401   Epoch: 7   Global Step: 91090   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:35:29,609-Speed 3331.60 samples/sec   Loss 5.0568   LearningRate 0.0401   Epoch: 7   Global Step: 91100   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:35:32,741-Speed 3270.70 samples/sec   Loss 5.1189   LearningRate 0.0401   Epoch: 7   Global Step: 91110   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:35:35,910-Speed 3232.42 samples/sec   Loss 5.1265   LearningRate 0.0401   Epoch: 7   Global Step: 91120   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:35:39,060-Speed 3251.87 samples/sec   Loss 5.0421   LearningRate 0.0401   Epoch: 7   Global Step: 91130   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:35:42,184-Speed 3278.98 samples/sec   Loss 5.1294   LearningRate 0.0401   Epoch: 7   Global Step: 91140   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:35:45,277-Speed 3311.55 samples/sec   Loss 5.1871   LearningRate 0.0401   Epoch: 7   Global Step: 91150   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:35:48,443-Speed 3235.15 samples/sec   Loss 5.1489   LearningRate 0.0401   Epoch: 7   Global Step: 91160   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:35:51,570-Speed 3275.20 samples/sec   Loss 5.1316   LearningRate 0.0401   Epoch: 7   Global Step: 91170   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:35:54,730-Speed 3241.83 samples/sec   Loss 5.1217   LearningRate 0.0401   Epoch: 7   Global Step: 91180   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:35:57,848-Speed 3285.75 samples/sec   Loss 5.1403   LearningRate 0.0401   Epoch: 7   Global Step: 91190   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:36:00,929-Speed 3324.48 samples/sec   Loss 5.0646   LearningRate 0.0401   Epoch: 7   Global Step: 91200   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:36:04,045-Speed 3286.94 samples/sec   Loss 5.1312   LearningRate 0.0400   Epoch: 7   Global Step: 91210   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:36:07,169-Speed 3278.64 samples/sec   Loss 5.1656   LearningRate 0.0400   Epoch: 7   Global Step: 91220   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:36:10,238-Speed 3338.06 samples/sec   Loss 5.1498   LearningRate 0.0400   Epoch: 7   Global Step: 91230   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:36:13,360-Speed 3281.11 samples/sec   Loss 5.1172   LearningRate 0.0400   Epoch: 7   Global Step: 91240   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:36:16,461-Speed 3303.05 samples/sec   Loss 5.1109   LearningRate 0.0400   Epoch: 7   Global Step: 91250   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:36:19,574-Speed 3291.06 samples/sec   Loss 5.2195   LearningRate 0.0400   Epoch: 7   Global Step: 91260   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:36:22,664-Speed 3315.02 samples/sec   Loss 5.0758   LearningRate 0.0400   Epoch: 7   Global Step: 91270   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:36:25,868-Speed 3196.71 samples/sec   Loss 5.1777   LearningRate 0.0400   Epoch: 7   Global Step: 91280   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:36:28,994-Speed 3276.73 samples/sec   Loss 5.1528   LearningRate 0.0400   Epoch: 7   Global Step: 91290   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:36:32,092-Speed 3306.25 samples/sec   Loss 5.1229   LearningRate 0.0400   Epoch: 7   Global Step: 91300   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:36:35,273-Speed 3220.05 samples/sec   Loss 5.1697   LearningRate 0.0400   Epoch: 7   Global Step: 91310   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:36:38,400-Speed 3276.33 samples/sec   Loss 5.1947   LearningRate 0.0400   Epoch: 7   Global Step: 91320   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:36:41,527-Speed 3275.50 samples/sec   Loss 5.1709   LearningRate 0.0400   Epoch: 7   Global Step: 91330   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:36:44,594-Speed 3339.39 samples/sec   Loss 5.1130   LearningRate 0.0400   Epoch: 7   Global Step: 91340   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:36:47,670-Speed 3330.62 samples/sec   Loss 5.2324   LearningRate 0.0400   Epoch: 7   Global Step: 91350   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:36:50,818-Speed 3253.59 samples/sec   Loss 5.1663   LearningRate 0.0400   Epoch: 7   Global Step: 91360   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:36:53,956-Speed 3264.22 samples/sec   Loss 5.2204   LearningRate 0.0400   Epoch: 7   Global Step: 91370   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:36:57,013-Speed 3350.80 samples/sec   Loss 5.0491   LearningRate 0.0400   Epoch: 7   Global Step: 91380   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:37:00,162-Speed 3253.05 samples/sec   Loss 5.1848   LearningRate 0.0400   Epoch: 7   Global Step: 91390   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:37:03,320-Speed 3243.58 samples/sec   Loss 5.1357   LearningRate 0.0400   Epoch: 7   Global Step: 91400   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:37:06,476-Speed 3245.05 samples/sec   Loss 5.1994   LearningRate 0.0399   Epoch: 7   Global Step: 91410   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:37:09,555-Speed 3327.58 samples/sec   Loss 5.0363   LearningRate 0.0399   Epoch: 7   Global Step: 91420   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:37:12,667-Speed 3291.16 samples/sec   Loss 5.1574   LearningRate 0.0399   Epoch: 7   Global Step: 91430   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:37:15,782-Speed 3288.01 samples/sec   Loss 5.1301   LearningRate 0.0399   Epoch: 7   Global Step: 91440   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:37:18,904-Speed 3281.81 samples/sec   Loss 5.1524   LearningRate 0.0399   Epoch: 7   Global Step: 91450   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:37:22,016-Speed 3291.14 samples/sec   Loss 5.1995   LearningRate 0.0399   Epoch: 7   Global Step: 91460   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:37:25,116-Speed 3304.39 samples/sec   Loss 5.2130   LearningRate 0.0399   Epoch: 7   Global Step: 91470   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:37:28,176-Speed 3347.04 samples/sec   Loss 5.1269   LearningRate 0.0399   Epoch: 7   Global Step: 91480   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:37:31,290-Speed 3289.48 samples/sec   Loss 5.1688   LearningRate 0.0399   Epoch: 7   Global Step: 91490   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:37:34,340-Speed 3358.49 samples/sec   Loss 5.2279   LearningRate 0.0399   Epoch: 7   Global Step: 91500   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:37:37,499-Speed 3242.53 samples/sec   Loss 5.1333   LearningRate 0.0399   Epoch: 7   Global Step: 91510   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:37:40,687-Speed 3213.49 samples/sec   Loss 5.1212   LearningRate 0.0399   Epoch: 7   Global Step: 91520   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:37:43,834-Speed 3255.12 samples/sec   Loss 5.1567   LearningRate 0.0399   Epoch: 7   Global Step: 91530   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:37:46,893-Speed 3348.13 samples/sec   Loss 5.1822   LearningRate 0.0399   Epoch: 7   Global Step: 91540   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:37:49,995-Speed 3301.80 samples/sec   Loss 5.1212   LearningRate 0.0399   Epoch: 7   Global Step: 91550   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:37:53,103-Speed 3296.65 samples/sec   Loss 5.2512   LearningRate 0.0399   Epoch: 7   Global Step: 91560   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:37:56,175-Speed 3334.22 samples/sec   Loss 5.0713   LearningRate 0.0399   Epoch: 7   Global Step: 91570   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:37:59,241-Speed 3340.62 samples/sec   Loss 5.2154   LearningRate 0.0399   Epoch: 7   Global Step: 91580   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:38:02,339-Speed 3307.02 samples/sec   Loss 5.2198   LearningRate 0.0399   Epoch: 7   Global Step: 91590   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:38:05,418-Speed 3326.85 samples/sec   Loss 5.2643   LearningRate 0.0399   Epoch: 7   Global Step: 91600   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:38:08,489-Speed 3334.37 samples/sec   Loss 5.1547   LearningRate 0.0398   Epoch: 7   Global Step: 91610   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:38:11,602-Speed 3290.70 samples/sec   Loss 5.2058   LearningRate 0.0398   Epoch: 7   Global Step: 91620   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:38:14,719-Speed 3286.45 samples/sec   Loss 5.0965   LearningRate 0.0398   Epoch: 7   Global Step: 91630   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:38:17,940-Speed 3180.46 samples/sec   Loss 5.1662   LearningRate 0.0398   Epoch: 7   Global Step: 91640   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:38:21,017-Speed 3329.00 samples/sec   Loss 5.2706   LearningRate 0.0398   Epoch: 7   Global Step: 91650   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:38:24,103-Speed 3319.40 samples/sec   Loss 5.2031   LearningRate 0.0398   Epoch: 7   Global Step: 91660   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:38:27,205-Speed 3301.77 samples/sec   Loss 5.1307   LearningRate 0.0398   Epoch: 7   Global Step: 91670   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:38:30,327-Speed 3280.90 samples/sec   Loss 5.0511   LearningRate 0.0398   Epoch: 7   Global Step: 91680   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:38:33,383-Speed 3351.94 samples/sec   Loss 5.2124   LearningRate 0.0398   Epoch: 7   Global Step: 91690   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:38:36,500-Speed 3286.51 samples/sec   Loss 5.1556   LearningRate 0.0398   Epoch: 7   Global Step: 91700   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:38:39,616-Speed 3287.10 samples/sec   Loss 5.1049   LearningRate 0.0398   Epoch: 7   Global Step: 91710   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:38:42,766-Speed 3251.26 samples/sec   Loss 5.1792   LearningRate 0.0398   Epoch: 7   Global Step: 91720   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:38:45,874-Speed 3295.78 samples/sec   Loss 5.1084   LearningRate 0.0398   Epoch: 7   Global Step: 91730   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:38:48,947-Speed 3334.12 samples/sec   Loss 5.2362   LearningRate 0.0398   Epoch: 7   Global Step: 91740   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:38:52,047-Speed 3304.22 samples/sec   Loss 5.3057   LearningRate 0.0398   Epoch: 7   Global Step: 91750   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:38:55,099-Speed 3355.98 samples/sec   Loss 5.1200   LearningRate 0.0398   Epoch: 7   Global Step: 91760   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:38:58,151-Speed 3356.79 samples/sec   Loss 5.2913   LearningRate 0.0398   Epoch: 7   Global Step: 91770   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:39:01,216-Speed 3342.10 samples/sec   Loss 5.1883   LearningRate 0.0398   Epoch: 7   Global Step: 91780   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:39:04,303-Speed 3318.08 samples/sec   Loss 5.2738   LearningRate 0.0398   Epoch: 7   Global Step: 91790   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:39:07,422-Speed 3284.19 samples/sec   Loss 5.2285   LearningRate 0.0397   Epoch: 7   Global Step: 91800   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:39:10,518-Speed 3308.14 samples/sec   Loss 5.1152   LearningRate 0.0397   Epoch: 7   Global Step: 91810   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:39:13,649-Speed 3272.75 samples/sec   Loss 5.2643   LearningRate 0.0397   Epoch: 7   Global Step: 91820   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:39:16,727-Speed 3327.11 samples/sec   Loss 5.1512   LearningRate 0.0397   Epoch: 7   Global Step: 91830   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:39:19,832-Speed 3299.86 samples/sec   Loss 5.1970   LearningRate 0.0397   Epoch: 7   Global Step: 91840   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:39:22,914-Speed 3323.02 samples/sec   Loss 5.1513   LearningRate 0.0397   Epoch: 7   Global Step: 91850   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:39:26,010-Speed 3308.69 samples/sec   Loss 5.1243   LearningRate 0.0397   Epoch: 7   Global Step: 91860   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:39:29,134-Speed 3278.61 samples/sec   Loss 5.1731   LearningRate 0.0397   Epoch: 7   Global Step: 91870   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:39:32,271-Speed 3265.30 samples/sec   Loss 5.1647   LearningRate 0.0397   Epoch: 7   Global Step: 91880   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:39:35,447-Speed 3226.01 samples/sec   Loss 5.2174   LearningRate 0.0397   Epoch: 7   Global Step: 91890   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:39:38,614-Speed 3233.43 samples/sec   Loss 5.1928   LearningRate 0.0397   Epoch: 7   Global Step: 91900   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:39:41,733-Speed 3284.52 samples/sec   Loss 5.1771   LearningRate 0.0397   Epoch: 7   Global Step: 91910   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:39:44,855-Speed 3280.47 samples/sec   Loss 5.1330   LearningRate 0.0397   Epoch: 7   Global Step: 91920   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:39:47,971-Speed 3288.06 samples/sec   Loss 5.1049   LearningRate 0.0397   Epoch: 7   Global Step: 91930   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:39:51,123-Speed 3248.97 samples/sec   Loss 5.2060   LearningRate 0.0397   Epoch: 7   Global Step: 91940   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:39:54,260-Speed 3265.19 samples/sec   Loss 5.1951   LearningRate 0.0397   Epoch: 7   Global Step: 91950   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:39:57,349-Speed 3316.52 samples/sec   Loss 5.1056   LearningRate 0.0397   Epoch: 7   Global Step: 91960   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:40:00,432-Speed 3321.90 samples/sec   Loss 5.1901   LearningRate 0.0397   Epoch: 7   Global Step: 91970   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:40:03,540-Speed 3295.90 samples/sec   Loss 5.2598   LearningRate 0.0397   Epoch: 7   Global Step: 91980   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:40:06,643-Speed 3301.21 samples/sec   Loss 5.2331   LearningRate 0.0397   Epoch: 7   Global Step: 91990   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:40:09,757-Speed 3288.77 samples/sec   Loss 5.1878   LearningRate 0.0396   Epoch: 7   Global Step: 92000   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:40:12,876-Speed 3284.48 samples/sec   Loss 5.1388   LearningRate 0.0396   Epoch: 7   Global Step: 92010   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:40:15,983-Speed 3296.42 samples/sec   Loss 5.2059   LearningRate 0.0396   Epoch: 7   Global Step: 92020   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:40:19,083-Speed 3304.73 samples/sec   Loss 5.2867   LearningRate 0.0396   Epoch: 7   Global Step: 92030   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:40:22,146-Speed 3344.09 samples/sec   Loss 5.1794   LearningRate 0.0396   Epoch: 7   Global Step: 92040   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:40:25,237-Speed 3313.89 samples/sec   Loss 5.2917   LearningRate 0.0396   Epoch: 7   Global Step: 92050   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:40:28,376-Speed 3262.53 samples/sec   Loss 5.3223   LearningRate 0.0396   Epoch: 7   Global Step: 92060   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:40:31,545-Speed 3233.06 samples/sec   Loss 5.2165   LearningRate 0.0396   Epoch: 7   Global Step: 92070   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:40:34,650-Speed 3298.79 samples/sec   Loss 5.2611   LearningRate 0.0396   Epoch: 7   Global Step: 92080   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:40:37,763-Speed 3290.86 samples/sec   Loss 5.1992   LearningRate 0.0396   Epoch: 7   Global Step: 92090   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:40:40,867-Speed 3299.62 samples/sec   Loss 5.2102   LearningRate 0.0396   Epoch: 7   Global Step: 92100   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:40:43,946-Speed 3326.81 samples/sec   Loss 5.0381   LearningRate 0.0396   Epoch: 7   Global Step: 92110   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:40:47,021-Speed 3331.78 samples/sec   Loss 5.1864   LearningRate 0.0396   Epoch: 7   Global Step: 92120   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:40:50,186-Speed 3235.98 samples/sec   Loss 5.2172   LearningRate 0.0396   Epoch: 7   Global Step: 92130   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:40:53,339-Speed 3248.35 samples/sec   Loss 5.1539   LearningRate 0.0396   Epoch: 7   Global Step: 92140   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:40:56,449-Speed 3294.31 samples/sec   Loss 5.2418   LearningRate 0.0396   Epoch: 7   Global Step: 92150   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:40:59,566-Speed 3286.20 samples/sec   Loss 5.2036   LearningRate 0.0396   Epoch: 7   Global Step: 92160   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:41:02,671-Speed 3298.68 samples/sec   Loss 5.2287   LearningRate 0.0396   Epoch: 7   Global Step: 92170   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:41:05,720-Speed 3359.68 samples/sec   Loss 5.1514   LearningRate 0.0396   Epoch: 7   Global Step: 92180   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:41:08,820-Speed 3304.79 samples/sec   Loss 5.1725   LearningRate 0.0396   Epoch: 7   Global Step: 92190   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:41:11,929-Speed 3294.94 samples/sec   Loss 5.1850   LearningRate 0.0395   Epoch: 7   Global Step: 92200   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:41:15,049-Speed 3283.46 samples/sec   Loss 5.1668   LearningRate 0.0395   Epoch: 7   Global Step: 92210   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:41:18,156-Speed 3296.14 samples/sec   Loss 5.2342   LearningRate 0.0395   Epoch: 7   Global Step: 92220   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:41:21,224-Speed 3339.08 samples/sec   Loss 5.1222   LearningRate 0.0395   Epoch: 7   Global Step: 92230   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:41:24,325-Speed 3303.14 samples/sec   Loss 5.1958   LearningRate 0.0395   Epoch: 7   Global Step: 92240   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:41:27,460-Speed 3267.57 samples/sec   Loss 5.2282   LearningRate 0.0395   Epoch: 7   Global Step: 92250   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:41:30,548-Speed 3316.77 samples/sec   Loss 5.2378   LearningRate 0.0395   Epoch: 7   Global Step: 92260   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:41:33,661-Speed 3291.26 samples/sec   Loss 5.1734   LearningRate 0.0395   Epoch: 7   Global Step: 92270   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:41:36,848-Speed 3213.33 samples/sec   Loss 5.1328   LearningRate 0.0395   Epoch: 7   Global Step: 92280   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:41:39,966-Speed 3285.30 samples/sec   Loss 5.1820   LearningRate 0.0395   Epoch: 7   Global Step: 92290   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:41:43,060-Speed 3310.79 samples/sec   Loss 5.1532   LearningRate 0.0395   Epoch: 7   Global Step: 92300   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:41:46,149-Speed 3316.27 samples/sec   Loss 5.1475   LearningRate 0.0395   Epoch: 7   Global Step: 92310   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:41:49,211-Speed 3345.51 samples/sec   Loss 5.1836   LearningRate 0.0395   Epoch: 7   Global Step: 92320   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:41:52,327-Speed 3287.09 samples/sec   Loss 5.2076   LearningRate 0.0395   Epoch: 7   Global Step: 92330   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:41:55,532-Speed 3195.97 samples/sec   Loss 5.2995   LearningRate 0.0395   Epoch: 7   Global Step: 92340   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:41:58,655-Speed 3280.36 samples/sec   Loss 5.3175   LearningRate 0.0395   Epoch: 7   Global Step: 92350   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:42:01,762-Speed 3296.31 samples/sec   Loss 5.1216   LearningRate 0.0395   Epoch: 7   Global Step: 92360   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:42:04,975-Speed 3188.66 samples/sec   Loss 5.2941   LearningRate 0.0395   Epoch: 7   Global Step: 92370   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:42:08,112-Speed 3264.55 samples/sec   Loss 5.3962   LearningRate 0.0395   Epoch: 7   Global Step: 92380   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:42:11,207-Speed 3309.96 samples/sec   Loss 5.2405   LearningRate 0.0394   Epoch: 7   Global Step: 92390   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:42:14,307-Speed 3305.04 samples/sec   Loss 5.2195   LearningRate 0.0394   Epoch: 7   Global Step: 92400   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:42:17,395-Speed 3316.62 samples/sec   Loss 5.3540   LearningRate 0.0394   Epoch: 7   Global Step: 92410   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:42:20,466-Speed 3335.86 samples/sec   Loss 5.1985   LearningRate 0.0394   Epoch: 7   Global Step: 92420   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:42:23,569-Speed 3301.61 samples/sec   Loss 5.2509   LearningRate 0.0394   Epoch: 7   Global Step: 92430   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:42:26,631-Speed 3344.59 samples/sec   Loss 5.2192   LearningRate 0.0394   Epoch: 7   Global Step: 92440   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:42:29,742-Speed 3292.55 samples/sec   Loss 5.1793   LearningRate 0.0394   Epoch: 7   Global Step: 92450   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:42:32,797-Speed 3353.52 samples/sec   Loss 5.3144   LearningRate 0.0394   Epoch: 7   Global Step: 92460   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:42:35,909-Speed 3291.65 samples/sec   Loss 5.2310   LearningRate 0.0394   Epoch: 7   Global Step: 92470   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 09:42:39,080-Speed 3230.24 samples/sec   Loss 5.2383   LearningRate 0.0394   Epoch: 7   Global Step: 92480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:42:42,195-Speed 3287.85 samples/sec   Loss 5.1607   LearningRate 0.0394   Epoch: 7   Global Step: 92490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:42:45,240-Speed 3364.18 samples/sec   Loss 5.2420   LearningRate 0.0394   Epoch: 7   Global Step: 92500   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:42:48,357-Speed 3286.41 samples/sec   Loss 5.2084   LearningRate 0.0394   Epoch: 7   Global Step: 92510   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:42:51,502-Speed 3256.79 samples/sec   Loss 5.2270   LearningRate 0.0394   Epoch: 7   Global Step: 92520   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:42:54,651-Speed 3253.28 samples/sec   Loss 5.2016   LearningRate 0.0394   Epoch: 7   Global Step: 92530   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:42:57,776-Speed 3278.35 samples/sec   Loss 5.2539   LearningRate 0.0394   Epoch: 7   Global Step: 92540   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:43:00,892-Speed 3287.07 samples/sec   Loss 5.2764   LearningRate 0.0394   Epoch: 7   Global Step: 92550   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:43:04,074-Speed 3218.72 samples/sec   Loss 5.2869   LearningRate 0.0394   Epoch: 7   Global Step: 92560   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:43:07,154-Speed 3326.58 samples/sec   Loss 5.2690   LearningRate 0.0394   Epoch: 7   Global Step: 92570   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:43:10,229-Speed 3331.19 samples/sec   Loss 5.2674   LearningRate 0.0394   Epoch: 7   Global Step: 92580   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:43:13,386-Speed 3244.42 samples/sec   Loss 5.2443   LearningRate 0.0393   Epoch: 7   Global Step: 92590   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:43:16,567-Speed 3220.15 samples/sec   Loss 5.2753   LearningRate 0.0393   Epoch: 7   Global Step: 92600   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:43:19,698-Speed 3270.91 samples/sec   Loss 5.2005   LearningRate 0.0393   Epoch: 7   Global Step: 92610   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:43:22,738-Speed 3369.84 samples/sec   Loss 5.2721   LearningRate 0.0393   Epoch: 7   Global Step: 92620   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:43:25,799-Speed 3346.66 samples/sec   Loss 5.1099   LearningRate 0.0393   Epoch: 7   Global Step: 92630   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:43:28,833-Speed 3375.66 samples/sec   Loss 5.2237   LearningRate 0.0393   Epoch: 7   Global Step: 92640   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:43:31,915-Speed 3323.47 samples/sec   Loss 5.1391   LearningRate 0.0393   Epoch: 7   Global Step: 92650   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:43:34,996-Speed 3325.49 samples/sec   Loss 5.3233   LearningRate 0.0393   Epoch: 7   Global Step: 92660   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:43:38,050-Speed 3354.08 samples/sec   Loss 5.1361   LearningRate 0.0393   Epoch: 7   Global Step: 92670   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:43:41,130-Speed 3325.47 samples/sec   Loss 5.2544   LearningRate 0.0393   Epoch: 7   Global Step: 92680   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:43:44,201-Speed 3335.68 samples/sec   Loss 5.1475   LearningRate 0.0393   Epoch: 7   Global Step: 92690   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:43:47,275-Speed 3331.95 samples/sec   Loss 5.2022   LearningRate 0.0393   Epoch: 7   Global Step: 92700   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:43:50,399-Speed 3278.82 samples/sec   Loss 5.2326   LearningRate 0.0393   Epoch: 7   Global Step: 92710   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:43:53,511-Speed 3290.94 samples/sec   Loss 5.2166   LearningRate 0.0393   Epoch: 7   Global Step: 92720   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:43:56,575-Speed 3343.99 samples/sec   Loss 5.2011   LearningRate 0.0393   Epoch: 7   Global Step: 92730   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:43:59,650-Speed 3330.32 samples/sec   Loss 5.2958   LearningRate 0.0393   Epoch: 7   Global Step: 92740   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:44:02,736-Speed 3319.17 samples/sec   Loss 5.2349   LearningRate 0.0393   Epoch: 7   Global Step: 92750   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:44:05,865-Speed 3273.55 samples/sec   Loss 5.3324   LearningRate 0.0393   Epoch: 7   Global Step: 92760   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:44:08,949-Speed 3322.21 samples/sec   Loss 5.2950   LearningRate 0.0393   Epoch: 7   Global Step: 92770   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:44:12,081-Speed 3269.69 samples/sec   Loss 5.2770   LearningRate 0.0393   Epoch: 7   Global Step: 92780   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:44:15,307-Speed 3175.44 samples/sec   Loss 5.2225   LearningRate 0.0392   Epoch: 7   Global Step: 92790   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:44:18,448-Speed 3261.88 samples/sec   Loss 5.2675   LearningRate 0.0392   Epoch: 7   Global Step: 92800   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:44:21,534-Speed 3319.34 samples/sec   Loss 5.2439   LearningRate 0.0392   Epoch: 7   Global Step: 92810   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:44:24,696-Speed 3238.83 samples/sec   Loss 5.2067   LearningRate 0.0392   Epoch: 7   Global Step: 92820   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:44:27,766-Speed 3336.28 samples/sec   Loss 5.1742   LearningRate 0.0392   Epoch: 7   Global Step: 92830   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:44:30,800-Speed 3376.84 samples/sec   Loss 5.1589   LearningRate 0.0392   Epoch: 7   Global Step: 92840   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:44:33,861-Speed 3345.65 samples/sec   Loss 5.3613   LearningRate 0.0392   Epoch: 7   Global Step: 92850   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:44:36,959-Speed 3307.29 samples/sec   Loss 5.2846   LearningRate 0.0392   Epoch: 7   Global Step: 92860   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:44:40,011-Speed 3355.91 samples/sec   Loss 5.2101   LearningRate 0.0392   Epoch: 7   Global Step: 92870   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:44:43,153-Speed 3259.96 samples/sec   Loss 5.2480   LearningRate 0.0392   Epoch: 7   Global Step: 92880   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:44:46,258-Speed 3299.20 samples/sec   Loss 5.3022   LearningRate 0.0392   Epoch: 7   Global Step: 92890   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:44:49,368-Speed 3294.02 samples/sec   Loss 5.2262   LearningRate 0.0392   Epoch: 7   Global Step: 92900   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:44:52,473-Speed 3299.62 samples/sec   Loss 5.2432   LearningRate 0.0392   Epoch: 7   Global Step: 92910   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:44:55,607-Speed 3268.46 samples/sec   Loss 5.2556   LearningRate 0.0392   Epoch: 7   Global Step: 92920   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:44:58,730-Speed 3278.98 samples/sec   Loss 5.2634   LearningRate 0.0392   Epoch: 7   Global Step: 92930   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:45:01,795-Speed 3342.26 samples/sec   Loss 5.3150   LearningRate 0.0392   Epoch: 7   Global Step: 92940   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:45:04,876-Speed 3325.91 samples/sec   Loss 5.2210   LearningRate 0.0392   Epoch: 7   Global Step: 92950   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:45:07,935-Speed 3347.88 samples/sec   Loss 5.1904   LearningRate 0.0392   Epoch: 7   Global Step: 92960   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:45:11,026-Speed 3313.41 samples/sec   Loss 5.1845   LearningRate 0.0392   Epoch: 7   Global Step: 92970   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:45:14,272-Speed 3155.67 samples/sec   Loss 5.3341   LearningRate 0.0392   Epoch: 7   Global Step: 92980   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:45:17,336-Speed 3343.38 samples/sec   Loss 5.2281   LearningRate 0.0391   Epoch: 7   Global Step: 92990   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:45:20,389-Speed 3355.70 samples/sec   Loss 5.3708   LearningRate 0.0391   Epoch: 7   Global Step: 93000   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:45:23,492-Speed 3300.33 samples/sec   Loss 5.2789   LearningRate 0.0391   Epoch: 7   Global Step: 93010   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:45:26,571-Speed 3327.64 samples/sec   Loss 5.2067   LearningRate 0.0391   Epoch: 7   Global Step: 93020   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:45:29,662-Speed 3313.82 samples/sec   Loss 5.2189   LearningRate 0.0391   Epoch: 7   Global Step: 93030   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:45:32,696-Speed 3375.49 samples/sec   Loss 5.3565   LearningRate 0.0391   Epoch: 7   Global Step: 93040   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:45:35,896-Speed 3201.56 samples/sec   Loss 5.2597   LearningRate 0.0391   Epoch: 7   Global Step: 93050   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:45:39,018-Speed 3280.75 samples/sec   Loss 5.2196   LearningRate 0.0391   Epoch: 7   Global Step: 93060   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:45:42,238-Speed 3180.91 samples/sec   Loss 5.3035   LearningRate 0.0391   Epoch: 7   Global Step: 93070   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:45:45,316-Speed 3328.27 samples/sec   Loss 5.3005   LearningRate 0.0391   Epoch: 7   Global Step: 93080   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:45:48,385-Speed 3337.42 samples/sec   Loss 5.1287   LearningRate 0.0391   Epoch: 7   Global Step: 93090   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:45:51,515-Speed 3272.11 samples/sec   Loss 5.2976   LearningRate 0.0391   Epoch: 7   Global Step: 93100   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:45:54,574-Speed 3348.63 samples/sec   Loss 5.2163   LearningRate 0.0391   Epoch: 7   Global Step: 93110   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:45:57,656-Speed 3324.52 samples/sec   Loss 5.1968   LearningRate 0.0391   Epoch: 7   Global Step: 93120   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:46:00,790-Speed 3267.65 samples/sec   Loss 5.2382   LearningRate 0.0391   Epoch: 7   Global Step: 93130   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:46:03,854-Speed 3343.02 samples/sec   Loss 5.2823   LearningRate 0.0391   Epoch: 7   Global Step: 93140   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:46:06,906-Speed 3356.98 samples/sec   Loss 5.3784   LearningRate 0.0391   Epoch: 7   Global Step: 93150   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:46:09,985-Speed 3326.13 samples/sec   Loss 5.3170   LearningRate 0.0391   Epoch: 7   Global Step: 93160   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:46:13,166-Speed 3220.85 samples/sec   Loss 5.1787   LearningRate 0.0391   Epoch: 7   Global Step: 93170   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:46:16,316-Speed 3251.36 samples/sec   Loss 5.2189   LearningRate 0.0391   Epoch: 7   Global Step: 93180   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:46:19,422-Speed 3297.56 samples/sec   Loss 5.2473   LearningRate 0.0390   Epoch: 7   Global Step: 93190   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:46:22,601-Speed 3221.78 samples/sec   Loss 5.1967   LearningRate 0.0390   Epoch: 7   Global Step: 93200   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:46:25,712-Speed 3293.00 samples/sec   Loss 5.2625   LearningRate 0.0390   Epoch: 7   Global Step: 93210   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:46:28,842-Speed 3272.70 samples/sec   Loss 5.3069   LearningRate 0.0390   Epoch: 7   Global Step: 93220   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:46:31,914-Speed 3334.52 samples/sec   Loss 5.2303   LearningRate 0.0390   Epoch: 7   Global Step: 93230   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:46:35,077-Speed 3238.09 samples/sec   Loss 5.2555   LearningRate 0.0390   Epoch: 7   Global Step: 93240   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:46:38,212-Speed 3268.07 samples/sec   Loss 5.3124   LearningRate 0.0390   Epoch: 7   Global Step: 93250   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:46:41,318-Speed 3296.98 samples/sec   Loss 5.1876   LearningRate 0.0390   Epoch: 7   Global Step: 93260   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:46:44,368-Speed 3358.34 samples/sec   Loss 5.2688   LearningRate 0.0390   Epoch: 7   Global Step: 93270   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:46:47,454-Speed 3319.50 samples/sec   Loss 5.3161   LearningRate 0.0390   Epoch: 7   Global Step: 93280   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:46:50,570-Speed 3287.71 samples/sec   Loss 5.1386   LearningRate 0.0390   Epoch: 7   Global Step: 93290   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:46:53,661-Speed 3313.86 samples/sec   Loss 5.1566   LearningRate 0.0390   Epoch: 7   Global Step: 93300   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:46:56,731-Speed 3335.94 samples/sec   Loss 5.2731   LearningRate 0.0390   Epoch: 7   Global Step: 93310   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:46:59,822-Speed 3314.73 samples/sec   Loss 5.2972   LearningRate 0.0390   Epoch: 7   Global Step: 93320   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:47:02,906-Speed 3320.91 samples/sec   Loss 5.1974   LearningRate 0.0390   Epoch: 7   Global Step: 93330   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:47:06,004-Speed 3307.32 samples/sec   Loss 5.2340   LearningRate 0.0390   Epoch: 7   Global Step: 93340   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:47:09,065-Speed 3346.08 samples/sec   Loss 5.3972   LearningRate 0.0390   Epoch: 7   Global Step: 93350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:47:12,161-Speed 3307.72 samples/sec   Loss 5.2669   LearningRate 0.0390   Epoch: 7   Global Step: 93360   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:47:15,297-Speed 3267.17 samples/sec   Loss 5.2603   LearningRate 0.0390   Epoch: 7   Global Step: 93370   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:47:18,369-Speed 3334.69 samples/sec   Loss 5.2776   LearningRate 0.0390   Epoch: 7   Global Step: 93380   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:47:21,450-Speed 3323.85 samples/sec   Loss 5.2919   LearningRate 0.0389   Epoch: 7   Global Step: 93390   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:47:24,538-Speed 3317.74 samples/sec   Loss 5.2695   LearningRate 0.0389   Epoch: 7   Global Step: 93400   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:47:27,632-Speed 3309.86 samples/sec   Loss 5.2657   LearningRate 0.0389   Epoch: 7   Global Step: 93410   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:47:30,727-Speed 3309.88 samples/sec   Loss 5.1884   LearningRate 0.0389   Epoch: 7   Global Step: 93420   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:47:33,800-Speed 3333.83 samples/sec   Loss 5.2507   LearningRate 0.0389   Epoch: 7   Global Step: 93430   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:47:36,945-Speed 3256.76 samples/sec   Loss 5.3349   LearningRate 0.0389   Epoch: 7   Global Step: 93440   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:47:40,027-Speed 3323.93 samples/sec   Loss 5.2775   LearningRate 0.0389   Epoch: 7   Global Step: 93450   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:47:43,136-Speed 3294.56 samples/sec   Loss 5.2842   LearningRate 0.0389   Epoch: 7   Global Step: 93460   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:47:46,206-Speed 3335.52 samples/sec   Loss 5.2411   LearningRate 0.0389   Epoch: 7   Global Step: 93470   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:47:49,361-Speed 3246.85 samples/sec   Loss 5.3006   LearningRate 0.0389   Epoch: 7   Global Step: 93480   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:47:52,440-Speed 3326.87 samples/sec   Loss 5.2689   LearningRate 0.0389   Epoch: 7   Global Step: 93490   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:47:55,621-Speed 3220.48 samples/sec   Loss 5.2660   LearningRate 0.0389   Epoch: 7   Global Step: 93500   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:47:58,670-Speed 3359.54 samples/sec   Loss 5.3255   LearningRate 0.0389   Epoch: 7   Global Step: 93510   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:48:01,809-Speed 3263.61 samples/sec   Loss 5.2422   LearningRate 0.0389   Epoch: 7   Global Step: 93520   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:48:04,870-Speed 3346.21 samples/sec   Loss 5.2606   LearningRate 0.0389   Epoch: 7   Global Step: 93530   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:48:07,966-Speed 3308.44 samples/sec   Loss 5.1625   LearningRate 0.0389   Epoch: 7   Global Step: 93540   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:48:11,035-Speed 3337.38 samples/sec   Loss 5.2816   LearningRate 0.0389   Epoch: 7   Global Step: 93550   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:48:14,111-Speed 3331.00 samples/sec   Loss 5.3119   LearningRate 0.0389   Epoch: 7   Global Step: 93560   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:48:17,227-Speed 3287.05 samples/sec   Loss 5.2096   LearningRate 0.0389   Epoch: 7   Global Step: 93570   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:48:20,323-Speed 3307.94 samples/sec   Loss 5.2219   LearningRate 0.0389   Epoch: 7   Global Step: 93580   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:48:23,464-Speed 3261.39 samples/sec   Loss 5.2297   LearningRate 0.0388   Epoch: 7   Global Step: 93590   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:48:26,669-Speed 3195.91 samples/sec   Loss 5.2058   LearningRate 0.0388   Epoch: 7   Global Step: 93600   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:48:29,904-Speed 3166.79 samples/sec   Loss 5.2414   LearningRate 0.0388   Epoch: 7   Global Step: 93610   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:48:32,998-Speed 3310.91 samples/sec   Loss 5.3222   LearningRate 0.0388   Epoch: 7   Global Step: 93620   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:48:36,113-Speed 3288.05 samples/sec   Loss 5.2458   LearningRate 0.0388   Epoch: 7   Global Step: 93630   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:48:39,187-Speed 3331.75 samples/sec   Loss 5.2307   LearningRate 0.0388   Epoch: 7   Global Step: 93640   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:48:42,321-Speed 3268.46 samples/sec   Loss 5.2504   LearningRate 0.0388   Epoch: 7   Global Step: 93650   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:48:45,380-Speed 3348.67 samples/sec   Loss 5.2408   LearningRate 0.0388   Epoch: 7   Global Step: 93660   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:48:48,465-Speed 3319.93 samples/sec   Loss 5.3301   LearningRate 0.0388   Epoch: 7   Global Step: 93670   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:48:51,617-Speed 3250.46 samples/sec   Loss 5.1941   LearningRate 0.0388   Epoch: 7   Global Step: 93680   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:48:54,670-Speed 3355.38 samples/sec   Loss 5.3320   LearningRate 0.0388   Epoch: 7   Global Step: 93690   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:48:57,720-Speed 3357.59 samples/sec   Loss 5.3112   LearningRate 0.0388   Epoch: 7   Global Step: 93700   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:49:00,841-Speed 3282.08 samples/sec   Loss 5.2224   LearningRate 0.0388   Epoch: 7   Global Step: 93710   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:49:03,928-Speed 3318.53 samples/sec   Loss 5.2087   LearningRate 0.0388   Epoch: 7   Global Step: 93720   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:49:07,022-Speed 3311.19 samples/sec   Loss 5.3237   LearningRate 0.0388   Epoch: 7   Global Step: 93730   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:49:10,109-Speed 3319.26 samples/sec   Loss 5.2742   LearningRate 0.0388   Epoch: 7   Global Step: 93740   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:49:13,257-Speed 3253.86 samples/sec   Loss 5.2470   LearningRate 0.0388   Epoch: 7   Global Step: 93750   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:49:16,364-Speed 3296.21 samples/sec   Loss 5.2719   LearningRate 0.0388   Epoch: 7   Global Step: 93760   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:49:20,037-Speed 2788.68 samples/sec   Loss 5.3905   LearningRate 0.0388   Epoch: 7   Global Step: 93770   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:49:23,109-Speed 3334.17 samples/sec   Loss 5.3238   LearningRate 0.0387   Epoch: 7   Global Step: 93780   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:49:26,194-Speed 3320.70 samples/sec   Loss 5.2909   LearningRate 0.0387   Epoch: 7   Global Step: 93790   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:49:29,328-Speed 3268.27 samples/sec   Loss 5.2638   LearningRate 0.0387   Epoch: 7   Global Step: 93800   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:49:32,471-Speed 3258.76 samples/sec   Loss 5.2672   LearningRate 0.0387   Epoch: 7   Global Step: 93810   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:49:35,624-Speed 3249.73 samples/sec   Loss 5.2688   LearningRate 0.0387   Epoch: 7   Global Step: 93820   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:49:38,715-Speed 3314.00 samples/sec   Loss 5.4024   LearningRate 0.0387   Epoch: 7   Global Step: 93830   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:49:41,838-Speed 3278.97 samples/sec   Loss 5.3043   LearningRate 0.0387   Epoch: 7   Global Step: 93840   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:49:44,891-Speed 3355.11 samples/sec   Loss 5.2996   LearningRate 0.0387   Epoch: 7   Global Step: 93850   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:49:47,938-Speed 3362.19 samples/sec   Loss 5.2195   LearningRate 0.0387   Epoch: 7   Global Step: 93860   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:49:51,024-Speed 3319.92 samples/sec   Loss 5.2473   LearningRate 0.0387   Epoch: 7   Global Step: 93870   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:49:54,119-Speed 3309.22 samples/sec   Loss 5.3038   LearningRate 0.0387   Epoch: 7   Global Step: 93880   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:49:57,178-Speed 3348.67 samples/sec   Loss 5.3243   LearningRate 0.0387   Epoch: 7   Global Step: 93890   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:50:00,264-Speed 3319.23 samples/sec   Loss 5.2588   LearningRate 0.0387   Epoch: 7   Global Step: 93900   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:50:03,374-Speed 3293.61 samples/sec   Loss 5.3385   LearningRate 0.0387   Epoch: 7   Global Step: 93910   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:50:06,530-Speed 3246.08 samples/sec   Loss 5.1690   LearningRate 0.0387   Epoch: 7   Global Step: 93920   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:50:09,599-Speed 3336.62 samples/sec   Loss 5.3014   LearningRate 0.0387   Epoch: 7   Global Step: 93930   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:50:12,704-Speed 3299.89 samples/sec   Loss 5.3195   LearningRate 0.0387   Epoch: 7   Global Step: 93940   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:50:15,827-Speed 3280.12 samples/sec   Loss 5.3088   LearningRate 0.0387   Epoch: 7   Global Step: 93950   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:50:18,911-Speed 3321.46 samples/sec   Loss 5.2535   LearningRate 0.0387   Epoch: 7   Global Step: 93960   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:50:21,948-Speed 3372.26 samples/sec   Loss 5.2812   LearningRate 0.0387   Epoch: 7   Global Step: 93970   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:50:25,071-Speed 3280.52 samples/sec   Loss 5.2488   LearningRate 0.0386   Epoch: 7   Global Step: 93980   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:50:28,190-Speed 3282.98 samples/sec   Loss 5.2013   LearningRate 0.0386   Epoch: 7   Global Step: 93990   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:50:31,311-Speed 3282.20 samples/sec   Loss 5.2444   LearningRate 0.0386   Epoch: 7   Global Step: 94000   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:50:34,385-Speed 3332.87 samples/sec   Loss 5.2944   LearningRate 0.0386   Epoch: 7   Global Step: 94010   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:50:37,493-Speed 3295.77 samples/sec   Loss 5.2277   LearningRate 0.0386   Epoch: 7   Global Step: 94020   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:50:40,583-Speed 3314.98 samples/sec   Loss 5.2931   LearningRate 0.0386   Epoch: 7   Global Step: 94030   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:50:43,713-Speed 3271.99 samples/sec   Loss 5.3293   LearningRate 0.0386   Epoch: 7   Global Step: 94040   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:50:46,782-Speed 3338.15 samples/sec   Loss 5.2117   LearningRate 0.0386   Epoch: 7   Global Step: 94050   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:50:50,014-Speed 3168.97 samples/sec   Loss 5.3529   LearningRate 0.0386   Epoch: 7   Global Step: 94060   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:50:53,156-Speed 3259.91 samples/sec   Loss 5.2922   LearningRate 0.0386   Epoch: 7   Global Step: 94070   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:50:56,265-Speed 3294.29 samples/sec   Loss 5.2823   LearningRate 0.0386   Epoch: 7   Global Step: 94080   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:51:00,052-Speed 2704.99 samples/sec   Loss 5.3072   LearningRate 0.0386   Epoch: 7   Global Step: 94090   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:51:04,394-Speed 2359.23 samples/sec   Loss 5.3447   LearningRate 0.0386   Epoch: 7   Global Step: 94100   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:51:07,467-Speed 3333.22 samples/sec   Loss 5.2170   LearningRate 0.0386   Epoch: 7   Global Step: 94110   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:51:10,517-Speed 3358.12 samples/sec   Loss 5.2997   LearningRate 0.0386   Epoch: 7   Global Step: 94120   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:51:13,610-Speed 3312.21 samples/sec   Loss 5.3168   LearningRate 0.0386   Epoch: 7   Global Step: 94130   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:51:16,795-Speed 3215.43 samples/sec   Loss 5.3361   LearningRate 0.0386   Epoch: 7   Global Step: 94140   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:51:19,842-Speed 3361.54 samples/sec   Loss 5.3529   LearningRate 0.0386   Epoch: 7   Global Step: 94150   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:51:23,066-Speed 3178.04 samples/sec   Loss 5.2719   LearningRate 0.0386   Epoch: 7   Global Step: 94160   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:51:26,158-Speed 3312.95 samples/sec   Loss 5.2899   LearningRate 0.0386   Epoch: 7   Global Step: 94170   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:51:29,233-Speed 3330.81 samples/sec   Loss 5.2488   LearningRate 0.0385   Epoch: 7   Global Step: 94180   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:51:32,295-Speed 3344.91 samples/sec   Loss 5.3521   LearningRate 0.0385   Epoch: 7   Global Step: 94190   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:51:35,392-Speed 3307.56 samples/sec   Loss 5.3014   LearningRate 0.0385   Epoch: 7   Global Step: 94200   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:51:38,462-Speed 3336.89 samples/sec   Loss 5.3717   LearningRate 0.0385   Epoch: 7   Global Step: 94210   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:51:41,597-Speed 3267.45 samples/sec   Loss 5.2852   LearningRate 0.0385   Epoch: 7   Global Step: 94220   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:51:44,704-Speed 3297.22 samples/sec   Loss 5.3599   LearningRate 0.0385   Epoch: 7   Global Step: 94230   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:51:47,852-Speed 3253.56 samples/sec   Loss 5.1929   LearningRate 0.0385   Epoch: 7   Global Step: 94240   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:51:50,955-Speed 3300.55 samples/sec   Loss 5.3142   LearningRate 0.0385   Epoch: 7   Global Step: 94250   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:51:54,072-Speed 3286.02 samples/sec   Loss 5.2626   LearningRate 0.0385   Epoch: 7   Global Step: 94260   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:51:57,156-Speed 3321.62 samples/sec   Loss 5.3553   LearningRate 0.0385   Epoch: 7   Global Step: 94270   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:52:00,225-Speed 3337.28 samples/sec   Loss 5.3309   LearningRate 0.0385   Epoch: 7   Global Step: 94280   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:52:03,323-Speed 3306.66 samples/sec   Loss 5.2612   LearningRate 0.0385   Epoch: 7   Global Step: 94290   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:52:06,465-Speed 3260.51 samples/sec   Loss 5.2778   LearningRate 0.0385   Epoch: 7   Global Step: 94300   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:52:09,558-Speed 3311.96 samples/sec   Loss 5.2835   LearningRate 0.0385   Epoch: 7   Global Step: 94310   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:52:12,683-Speed 3277.71 samples/sec   Loss 5.1968   LearningRate 0.0385   Epoch: 7   Global Step: 94320   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:52:15,785-Speed 3302.30 samples/sec   Loss 5.3173   LearningRate 0.0385   Epoch: 7   Global Step: 94330   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:52:18,847-Speed 3346.34 samples/sec   Loss 5.2345   LearningRate 0.0385   Epoch: 7   Global Step: 94340   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:52:21,917-Speed 3335.64 samples/sec   Loss 5.2312   LearningRate 0.0385   Epoch: 7   Global Step: 94350   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:52:25,102-Speed 3216.01 samples/sec   Loss 5.3619   LearningRate 0.0385   Epoch: 7   Global Step: 94360   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:52:28,255-Speed 3249.18 samples/sec   Loss 5.3102   LearningRate 0.0385   Epoch: 7   Global Step: 94370   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:52:31,444-Speed 3211.84 samples/sec   Loss 5.3131   LearningRate 0.0384   Epoch: 7   Global Step: 94380   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:52:34,534-Speed 3315.57 samples/sec   Loss 5.3031   LearningRate 0.0384   Epoch: 7   Global Step: 94390   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:52:37,768-Speed 3167.24 samples/sec   Loss 5.3783   LearningRate 0.0384   Epoch: 7   Global Step: 94400   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:52:40,859-Speed 3313.26 samples/sec   Loss 5.2793   LearningRate 0.0384   Epoch: 7   Global Step: 94410   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:52:43,929-Speed 3337.31 samples/sec   Loss 5.3241   LearningRate 0.0384   Epoch: 7   Global Step: 94420   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:52:47,014-Speed 3320.02 samples/sec   Loss 5.2571   LearningRate 0.0384   Epoch: 7   Global Step: 94430   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:52:50,121-Speed 3296.65 samples/sec   Loss 5.2803   LearningRate 0.0384   Epoch: 7   Global Step: 94440   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:52:53,218-Speed 3307.75 samples/sec   Loss 5.3512   LearningRate 0.0384   Epoch: 7   Global Step: 94450   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:52:56,313-Speed 3310.15 samples/sec   Loss 5.2968   LearningRate 0.0384   Epoch: 7   Global Step: 94460   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:52:59,378-Speed 3341.66 samples/sec   Loss 5.3387   LearningRate 0.0384   Epoch: 7   Global Step: 94470   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:53:02,486-Speed 3296.37 samples/sec   Loss 5.2665   LearningRate 0.0384   Epoch: 7   Global Step: 94480   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:53:05,553-Speed 3339.10 samples/sec   Loss 5.3515   LearningRate 0.0384   Epoch: 7   Global Step: 94490   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:53:08,609-Speed 3352.36 samples/sec   Loss 5.3544   LearningRate 0.0384   Epoch: 7   Global Step: 94500   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:53:11,674-Speed 3341.61 samples/sec   Loss 5.2598   LearningRate 0.0384   Epoch: 7   Global Step: 94510   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:53:14,740-Speed 3341.59 samples/sec   Loss 5.2832   LearningRate 0.0384   Epoch: 7   Global Step: 94520   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:53:17,828-Speed 3316.80 samples/sec   Loss 5.3307   LearningRate 0.0384   Epoch: 7   Global Step: 94530   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:53:20,883-Speed 3352.80 samples/sec   Loss 5.3705   LearningRate 0.0384   Epoch: 7   Global Step: 94540   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:53:23,984-Speed 3303.71 samples/sec   Loss 5.2566   LearningRate 0.0384   Epoch: 7   Global Step: 94550   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:53:27,063-Speed 3325.78 samples/sec   Loss 5.2773   LearningRate 0.0384   Epoch: 7   Global Step: 94560   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:53:30,131-Speed 3339.60 samples/sec   Loss 5.2385   LearningRate 0.0384   Epoch: 7   Global Step: 94570   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:53:33,179-Speed 3360.69 samples/sec   Loss 5.3553   LearningRate 0.0384   Epoch: 7   Global Step: 94580   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:53:36,249-Speed 3336.58 samples/sec   Loss 5.2584   LearningRate 0.0383   Epoch: 7   Global Step: 94590   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:53:39,354-Speed 3297.84 samples/sec   Loss 5.2399   LearningRate 0.0383   Epoch: 7   Global Step: 94600   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:53:42,458-Speed 3300.77 samples/sec   Loss 5.2780   LearningRate 0.0383   Epoch: 7   Global Step: 94610   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:53:45,543-Speed 3319.91 samples/sec   Loss 5.2683   LearningRate 0.0383   Epoch: 7   Global Step: 94620   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:53:48,700-Speed 3244.74 samples/sec   Loss 5.2949   LearningRate 0.0383   Epoch: 7   Global Step: 94630   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:53:51,842-Speed 3260.12 samples/sec   Loss 5.2902   LearningRate 0.0383   Epoch: 7   Global Step: 94640   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:53:54,955-Speed 3290.71 samples/sec   Loss 5.3884   LearningRate 0.0383   Epoch: 7   Global Step: 94650   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:53:57,997-Speed 3366.94 samples/sec   Loss 5.3986   LearningRate 0.0383   Epoch: 7   Global Step: 94660   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:54:01,051-Speed 3353.74 samples/sec   Loss 5.3393   LearningRate 0.0383   Epoch: 7   Global Step: 94670   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:54:04,212-Speed 3240.58 samples/sec   Loss 5.2445   LearningRate 0.0383   Epoch: 7   Global Step: 94680   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:54:07,333-Speed 3282.28 samples/sec   Loss 5.3450   LearningRate 0.0383   Epoch: 7   Global Step: 94690   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:54:10,409-Speed 3330.09 samples/sec   Loss 5.3058   LearningRate 0.0383   Epoch: 7   Global Step: 94700   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:54:13,528-Speed 3283.86 samples/sec   Loss 5.3888   LearningRate 0.0383   Epoch: 7   Global Step: 94710   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:54:16,627-Speed 3305.76 samples/sec   Loss 5.3723   LearningRate 0.0383   Epoch: 7   Global Step: 94720   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:54:19,690-Speed 3344.43 samples/sec   Loss 5.2192   LearningRate 0.0383   Epoch: 7   Global Step: 94730   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:54:22,756-Speed 3340.12 samples/sec   Loss 5.2675   LearningRate 0.0383   Epoch: 7   Global Step: 94740   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:54:25,881-Speed 3278.66 samples/sec   Loss 5.2806   LearningRate 0.0383   Epoch: 7   Global Step: 94750   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:54:28,982-Speed 3303.27 samples/sec   Loss 5.3768   LearningRate 0.0383   Epoch: 7   Global Step: 94760   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:54:32,140-Speed 3243.63 samples/sec   Loss 5.3169   LearningRate 0.0383   Epoch: 7   Global Step: 94770   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:54:35,203-Speed 3343.89 samples/sec   Loss 5.3923   LearningRate 0.0383   Epoch: 7   Global Step: 94780   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:54:38,262-Speed 3348.33 samples/sec   Loss 5.4290   LearningRate 0.0382   Epoch: 7   Global Step: 94790   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:54:41,338-Speed 3330.92 samples/sec   Loss 5.3360   LearningRate 0.0382   Epoch: 7   Global Step: 94800   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:54:44,461-Speed 3279.35 samples/sec   Loss 5.2637   LearningRate 0.0382   Epoch: 7   Global Step: 94810   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:54:47,546-Speed 3320.98 samples/sec   Loss 5.2676   LearningRate 0.0382   Epoch: 7   Global Step: 94820   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:54:50,686-Speed 3261.82 samples/sec   Loss 5.2464   LearningRate 0.0382   Epoch: 7   Global Step: 94830   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:54:53,828-Speed 3259.58 samples/sec   Loss 5.2139   LearningRate 0.0382   Epoch: 7   Global Step: 94840   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:54:56,942-Speed 3289.59 samples/sec   Loss 5.2390   LearningRate 0.0382   Epoch: 7   Global Step: 94850   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:55:00,065-Speed 3280.21 samples/sec   Loss 5.2962   LearningRate 0.0382   Epoch: 7   Global Step: 94860   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:55:03,236-Speed 3230.38 samples/sec   Loss 5.3366   LearningRate 0.0382   Epoch: 7   Global Step: 94870   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:55:06,312-Speed 3329.85 samples/sec   Loss 5.3136   LearningRate 0.0382   Epoch: 7   Global Step: 94880   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:55:09,412-Speed 3304.45 samples/sec   Loss 5.2700   LearningRate 0.0382   Epoch: 7   Global Step: 94890   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:55:12,537-Speed 3277.21 samples/sec   Loss 5.4052   LearningRate 0.0382   Epoch: 7   Global Step: 94900   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:55:15,653-Speed 3287.91 samples/sec   Loss 5.3301   LearningRate 0.0382   Epoch: 7   Global Step: 94910   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:55:18,774-Speed 3281.70 samples/sec   Loss 5.3563   LearningRate 0.0382   Epoch: 7   Global Step: 94920   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:55:21,832-Speed 3349.62 samples/sec   Loss 5.3137   LearningRate 0.0382   Epoch: 7   Global Step: 94930   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:55:24,999-Speed 3234.44 samples/sec   Loss 5.2372   LearningRate 0.0382   Epoch: 7   Global Step: 94940   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:55:28,159-Speed 3241.29 samples/sec   Loss 5.3211   LearningRate 0.0382   Epoch: 7   Global Step: 94950   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:55:31,266-Speed 3297.45 samples/sec   Loss 5.3778   LearningRate 0.0382   Epoch: 7   Global Step: 94960   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:55:34,425-Speed 3242.47 samples/sec   Loss 5.3114   LearningRate 0.0382   Epoch: 7   Global Step: 94970   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:55:37,596-Speed 3230.26 samples/sec   Loss 5.3201   LearningRate 0.0382   Epoch: 7   Global Step: 94980   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:55:40,729-Speed 3269.45 samples/sec   Loss 5.4498   LearningRate 0.0381   Epoch: 7   Global Step: 94990   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:55:43,838-Speed 3294.15 samples/sec   Loss 5.2309   LearningRate 0.0381   Epoch: 7   Global Step: 95000   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:55:46,934-Speed 3308.92 samples/sec   Loss 5.3325   LearningRate 0.0381   Epoch: 7   Global Step: 95010   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:55:50,081-Speed 3254.38 samples/sec   Loss 5.3044   LearningRate 0.0381   Epoch: 7   Global Step: 95020   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:55:53,250-Speed 3232.89 samples/sec   Loss 5.3002   LearningRate 0.0381   Epoch: 7   Global Step: 95030   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:55:56,378-Speed 3274.16 samples/sec   Loss 5.3455   LearningRate 0.0381   Epoch: 7   Global Step: 95040   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:55:59,463-Speed 3320.47 samples/sec   Loss 5.2956   LearningRate 0.0381   Epoch: 7   Global Step: 95050   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:56:02,539-Speed 3330.32 samples/sec   Loss 5.2041   LearningRate 0.0381   Epoch: 7   Global Step: 95060   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:56:05,653-Speed 3289.53 samples/sec   Loss 5.3399   LearningRate 0.0381   Epoch: 7   Global Step: 95070   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:56:08,749-Speed 3308.46 samples/sec   Loss 5.3152   LearningRate 0.0381   Epoch: 7   Global Step: 95080   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:56:11,812-Speed 3344.46 samples/sec   Loss 5.2589   LearningRate 0.0381   Epoch: 7   Global Step: 95090   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:56:14,907-Speed 3309.41 samples/sec   Loss 5.2511   LearningRate 0.0381   Epoch: 7   Global Step: 95100   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:56:17,986-Speed 3327.47 samples/sec   Loss 5.2677   LearningRate 0.0381   Epoch: 7   Global Step: 95110   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:56:21,024-Speed 3371.17 samples/sec   Loss 5.3847   LearningRate 0.0381   Epoch: 7   Global Step: 95120   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:56:24,081-Speed 3351.35 samples/sec   Loss 5.2781   LearningRate 0.0381   Epoch: 7   Global Step: 95130   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:56:27,157-Speed 3330.08 samples/sec   Loss 5.3654   LearningRate 0.0381   Epoch: 7   Global Step: 95140   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:56:30,243-Speed 3318.70 samples/sec   Loss 5.2750   LearningRate 0.0381   Epoch: 7   Global Step: 95150   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:56:33,351-Speed 3295.26 samples/sec   Loss 5.3445   LearningRate 0.0381   Epoch: 7   Global Step: 95160   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:56:36,466-Speed 3289.36 samples/sec   Loss 5.3284   LearningRate 0.0381   Epoch: 7   Global Step: 95170   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:56:39,576-Speed 3292.88 samples/sec   Loss 5.4468   LearningRate 0.0381   Epoch: 7   Global Step: 95180   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:56:42,671-Speed 3309.77 samples/sec   Loss 5.3551   LearningRate 0.0380   Epoch: 7   Global Step: 95190   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:56:45,775-Speed 3300.49 samples/sec   Loss 5.2534   LearningRate 0.0380   Epoch: 7   Global Step: 95200   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:56:48,853-Speed 3327.50 samples/sec   Loss 5.2908   LearningRate 0.0380   Epoch: 7   Global Step: 95210   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:56:51,997-Speed 3258.82 samples/sec   Loss 5.3919   LearningRate 0.0380   Epoch: 7   Global Step: 95220   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:56:55,184-Speed 3213.74 samples/sec   Loss 5.3201   LearningRate 0.0380   Epoch: 7   Global Step: 95230   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:56:58,265-Speed 3324.76 samples/sec   Loss 5.3687   LearningRate 0.0380   Epoch: 7   Global Step: 95240   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:57:01,385-Speed 3282.46 samples/sec   Loss 5.3497   LearningRate 0.0380   Epoch: 7   Global Step: 95250   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:57:04,495-Speed 3294.20 samples/sec   Loss 5.3794   LearningRate 0.0380   Epoch: 7   Global Step: 95260   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:57:07,593-Speed 3306.72 samples/sec   Loss 5.2751   LearningRate 0.0380   Epoch: 7   Global Step: 95270   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:57:10,770-Speed 3223.92 samples/sec   Loss 5.3510   LearningRate 0.0380   Epoch: 7   Global Step: 95280   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:57:13,875-Speed 3298.75 samples/sec   Loss 5.3399   LearningRate 0.0380   Epoch: 7   Global Step: 95290   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:57:17,010-Speed 3267.30 samples/sec   Loss 5.2557   LearningRate 0.0380   Epoch: 7   Global Step: 95300   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:57:20,127-Speed 3286.92 samples/sec   Loss 5.2465   LearningRate 0.0380   Epoch: 7   Global Step: 95310   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:57:23,217-Speed 3314.24 samples/sec   Loss 5.3148   LearningRate 0.0380   Epoch: 7   Global Step: 95320   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:57:26,324-Speed 3297.10 samples/sec   Loss 5.4135   LearningRate 0.0380   Epoch: 7   Global Step: 95330   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:57:29,406-Speed 3323.97 samples/sec   Loss 5.2413   LearningRate 0.0380   Epoch: 7   Global Step: 95340   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:57:32,493-Speed 3318.01 samples/sec   Loss 5.2688   LearningRate 0.0380   Epoch: 7   Global Step: 95350   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:57:35,547-Speed 3353.56 samples/sec   Loss 5.3680   LearningRate 0.0380   Epoch: 7   Global Step: 95360   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:57:38,619-Speed 3334.51 samples/sec   Loss 5.3135   LearningRate 0.0380   Epoch: 7   Global Step: 95370   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:57:41,708-Speed 3316.41 samples/sec   Loss 5.2978   LearningRate 0.0380   Epoch: 7   Global Step: 95380   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:57:44,760-Speed 3355.64 samples/sec   Loss 5.3874   LearningRate 0.0379   Epoch: 7   Global Step: 95390   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:57:47,834-Speed 3333.08 samples/sec   Loss 5.3651   LearningRate 0.0379   Epoch: 7   Global Step: 95400   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:57:50,904-Speed 3336.49 samples/sec   Loss 5.3923   LearningRate 0.0379   Epoch: 7   Global Step: 95410   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 09:57:54,010-Speed 3297.85 samples/sec   Loss 5.3960   LearningRate 0.0379   Epoch: 7   Global Step: 95420   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:57:57,118-Speed 3295.34 samples/sec   Loss 5.2405   LearningRate 0.0379   Epoch: 7   Global Step: 95430   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:58:00,199-Speed 3324.85 samples/sec   Loss 5.3222   LearningRate 0.0379   Epoch: 7   Global Step: 95440   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:58:03,275-Speed 3330.59 samples/sec   Loss 5.3131   LearningRate 0.0379   Epoch: 7   Global Step: 95450   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:58:06,334-Speed 3348.49 samples/sec   Loss 5.3187   LearningRate 0.0379   Epoch: 7   Global Step: 95460   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:58:09,396-Speed 3345.50 samples/sec   Loss 5.4582   LearningRate 0.0379   Epoch: 7   Global Step: 95470   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:58:12,511-Speed 3287.95 samples/sec   Loss 5.3694   LearningRate 0.0379   Epoch: 7   Global Step: 95480   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:58:15,695-Speed 3217.30 samples/sec   Loss 5.4266   LearningRate 0.0379   Epoch: 7   Global Step: 95490   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:58:18,788-Speed 3312.22 samples/sec   Loss 5.3552   LearningRate 0.0379   Epoch: 7   Global Step: 95500   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:58:21,914-Speed 3276.98 samples/sec   Loss 5.3844   LearningRate 0.0379   Epoch: 7   Global Step: 95510   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:58:25,042-Speed 3274.47 samples/sec   Loss 5.3240   LearningRate 0.0379   Epoch: 7   Global Step: 95520   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:58:28,111-Speed 3337.88 samples/sec   Loss 5.3545   LearningRate 0.0379   Epoch: 7   Global Step: 95530   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:58:31,181-Speed 3336.34 samples/sec   Loss 5.2958   LearningRate 0.0379   Epoch: 7   Global Step: 95540   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:58:34,264-Speed 3323.31 samples/sec   Loss 5.3023   LearningRate 0.0379   Epoch: 7   Global Step: 95550   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:58:37,420-Speed 3245.80 samples/sec   Loss 5.3587   LearningRate 0.0379   Epoch: 7   Global Step: 95560   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:58:40,505-Speed 3319.57 samples/sec   Loss 5.3332   LearningRate 0.0379   Epoch: 7   Global Step: 95570   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:58:43,566-Speed 3345.99 samples/sec   Loss 5.3928   LearningRate 0.0379   Epoch: 7   Global Step: 95580   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:58:46,692-Speed 3277.61 samples/sec   Loss 5.3175   LearningRate 0.0378   Epoch: 7   Global Step: 95590   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:58:49,862-Speed 3230.94 samples/sec   Loss 5.3793   LearningRate 0.0378   Epoch: 7   Global Step: 95600   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:58:52,947-Speed 3320.21 samples/sec   Loss 5.2242   LearningRate 0.0378   Epoch: 7   Global Step: 95610   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:58:56,057-Speed 3293.72 samples/sec   Loss 5.3091   LearningRate 0.0378   Epoch: 7   Global Step: 95620   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:58:59,107-Speed 3358.08 samples/sec   Loss 5.3486   LearningRate 0.0378   Epoch: 7   Global Step: 95630   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:59:02,191-Speed 3322.24 samples/sec   Loss 5.3702   LearningRate 0.0378   Epoch: 7   Global Step: 95640   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:59:05,312-Speed 3281.54 samples/sec   Loss 5.2874   LearningRate 0.0378   Epoch: 7   Global Step: 95650   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:59:08,369-Speed 3351.04 samples/sec   Loss 5.3016   LearningRate 0.0378   Epoch: 7   Global Step: 95660   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:59:11,466-Speed 3307.64 samples/sec   Loss 5.3374   LearningRate 0.0378   Epoch: 7   Global Step: 95670   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:59:14,559-Speed 3312.37 samples/sec   Loss 5.3354   LearningRate 0.0378   Epoch: 7   Global Step: 95680   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:59:17,699-Speed 3262.02 samples/sec   Loss 5.3020   LearningRate 0.0378   Epoch: 7   Global Step: 95690   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:59:20,836-Speed 3264.80 samples/sec   Loss 5.2512   LearningRate 0.0378   Epoch: 7   Global Step: 95700   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:59:23,901-Speed 3342.12 samples/sec   Loss 5.2960   LearningRate 0.0378   Epoch: 7   Global Step: 95710   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:59:26,985-Speed 3321.93 samples/sec   Loss 5.2201   LearningRate 0.0378   Epoch: 7   Global Step: 95720   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:59:30,122-Speed 3264.96 samples/sec   Loss 5.3241   LearningRate 0.0378   Epoch: 7   Global Step: 95730   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:59:33,191-Speed 3337.44 samples/sec   Loss 5.3699   LearningRate 0.0378   Epoch: 7   Global Step: 95740   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:59:36,350-Speed 3243.10 samples/sec   Loss 5.3558   LearningRate 0.0378   Epoch: 7   Global Step: 95750   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:59:39,421-Speed 3334.91 samples/sec   Loss 5.2926   LearningRate 0.0378   Epoch: 7   Global Step: 95760   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 09:59:42,532-Speed 3292.89 samples/sec   Loss 5.2937   LearningRate 0.0378   Epoch: 7   Global Step: 95770   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:59:45,586-Speed 3354.11 samples/sec   Loss 5.2769   LearningRate 0.0378   Epoch: 7   Global Step: 95780   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:59:48,674-Speed 3316.94 samples/sec   Loss 5.3233   LearningRate 0.0377   Epoch: 7   Global Step: 95790   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:59:51,756-Speed 3323.86 samples/sec   Loss 5.2898   LearningRate 0.0377   Epoch: 7   Global Step: 95800   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:59:54,970-Speed 3186.93 samples/sec   Loss 5.3175   LearningRate 0.0377   Epoch: 7   Global Step: 95810   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 09:59:58,054-Speed 3320.73 samples/sec   Loss 5.3674   LearningRate 0.0377   Epoch: 7   Global Step: 95820   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:00:01,138-Speed 3321.44 samples/sec   Loss 5.3349   LearningRate 0.0377   Epoch: 7   Global Step: 95830   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:00:04,238-Speed 3305.23 samples/sec   Loss 5.2982   LearningRate 0.0377   Epoch: 7   Global Step: 95840   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:00:07,309-Speed 3334.96 samples/sec   Loss 5.1819   LearningRate 0.0377   Epoch: 7   Global Step: 95850   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:00:10,369-Speed 3346.90 samples/sec   Loss 5.3536   LearningRate 0.0377   Epoch: 7   Global Step: 95860   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:00:13,497-Speed 3275.61 samples/sec   Loss 5.2641   LearningRate 0.0377   Epoch: 7   Global Step: 95870   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:00:16,637-Speed 3262.52 samples/sec   Loss 5.2610   LearningRate 0.0377   Epoch: 7   Global Step: 95880   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:00:19,687-Speed 3357.26 samples/sec   Loss 5.3079   LearningRate 0.0377   Epoch: 7   Global Step: 95890   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:00:22,748-Speed 3346.77 samples/sec   Loss 5.3294   LearningRate 0.0377   Epoch: 7   Global Step: 95900   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:00:25,828-Speed 3326.55 samples/sec   Loss 5.3002   LearningRate 0.0377   Epoch: 7   Global Step: 95910   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:00:28,967-Speed 3262.14 samples/sec   Loss 5.3095   LearningRate 0.0377   Epoch: 7   Global Step: 95920   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:00:32,092-Speed 3278.09 samples/sec   Loss 5.3713   LearningRate 0.0377   Epoch: 7   Global Step: 95930   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:00:35,207-Speed 3288.24 samples/sec   Loss 5.3053   LearningRate 0.0377   Epoch: 7   Global Step: 95940   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:00:38,365-Speed 3243.90 samples/sec   Loss 5.3968   LearningRate 0.0377   Epoch: 7   Global Step: 95950   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:00:41,609-Speed 3157.81 samples/sec   Loss 5.4066   LearningRate 0.0377   Epoch: 7   Global Step: 95960   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:00:44,752-Speed 3258.82 samples/sec   Loss 5.3311   LearningRate 0.0377   Epoch: 7   Global Step: 95970   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:00:47,885-Speed 3269.59 samples/sec   Loss 5.3286   LearningRate 0.0377   Epoch: 7   Global Step: 95980   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:00:50,948-Speed 3344.96 samples/sec   Loss 5.2159   LearningRate 0.0377   Epoch: 7   Global Step: 95990   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:00:54,083-Speed 3267.09 samples/sec   Loss 5.3980   LearningRate 0.0376   Epoch: 7   Global Step: 96000   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:00:57,145-Speed 3345.72 samples/sec   Loss 5.3143   LearningRate 0.0376   Epoch: 7   Global Step: 96010   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:01:00,279-Speed 3268.04 samples/sec   Loss 5.3194   LearningRate 0.0376   Epoch: 7   Global Step: 96020   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:01:03,467-Speed 3213.87 samples/sec   Loss 5.3420   LearningRate 0.0376   Epoch: 7   Global Step: 96030   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:01:06,641-Speed 3226.87 samples/sec   Loss 5.3596   LearningRate 0.0376   Epoch: 7   Global Step: 96040   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:01:09,745-Speed 3299.51 samples/sec   Loss 5.3545   LearningRate 0.0376   Epoch: 7   Global Step: 96050   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:01:12,857-Speed 3291.71 samples/sec   Loss 5.3191   LearningRate 0.0376   Epoch: 7   Global Step: 96060   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:01:15,993-Speed 3266.60 samples/sec   Loss 5.3159   LearningRate 0.0376   Epoch: 7   Global Step: 96070   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:01:19,166-Speed 3227.78 samples/sec   Loss 5.3192   LearningRate 0.0376   Epoch: 7   Global Step: 96080   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:01:22,282-Speed 3286.90 samples/sec   Loss 5.3227   LearningRate 0.0376   Epoch: 7   Global Step: 96090   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:01:25,380-Speed 3306.52 samples/sec   Loss 5.3322   LearningRate 0.0376   Epoch: 7   Global Step: 96100   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:01:28,513-Speed 3270.10 samples/sec   Loss 5.3009   LearningRate 0.0376   Epoch: 7   Global Step: 96110   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:01:31,607-Speed 3311.00 samples/sec   Loss 5.3882   LearningRate 0.0376   Epoch: 7   Global Step: 96120   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:01:34,746-Speed 3262.43 samples/sec   Loss 5.3015   LearningRate 0.0376   Epoch: 7   Global Step: 96130   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:01:37,837-Speed 3314.03 samples/sec   Loss 5.1983   LearningRate 0.0376   Epoch: 7   Global Step: 96140   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:01:40,903-Speed 3341.39 samples/sec   Loss 5.2898   LearningRate 0.0376   Epoch: 7   Global Step: 96150   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:01:44,028-Speed 3278.03 samples/sec   Loss 5.2888   LearningRate 0.0376   Epoch: 7   Global Step: 96160   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:01:47,082-Speed 3353.57 samples/sec   Loss 5.2800   LearningRate 0.0376   Epoch: 7   Global Step: 96170   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:01:50,176-Speed 3310.50 samples/sec   Loss 5.3105   LearningRate 0.0376   Epoch: 7   Global Step: 96180   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:01:53,293-Speed 3286.44 samples/sec   Loss 5.2948   LearningRate 0.0376   Epoch: 7   Global Step: 96190   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:01:56,393-Speed 3304.03 samples/sec   Loss 5.2857   LearningRate 0.0375   Epoch: 7   Global Step: 96200   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:01:59,491-Speed 3306.80 samples/sec   Loss 5.3263   LearningRate 0.0375   Epoch: 7   Global Step: 96210   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:02:02,612-Speed 3281.76 samples/sec   Loss 5.2934   LearningRate 0.0375   Epoch: 7   Global Step: 96220   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:02:05,752-Speed 3263.09 samples/sec   Loss 5.3511   LearningRate 0.0375   Epoch: 7   Global Step: 96230   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:02:08,818-Speed 3340.90 samples/sec   Loss 5.3284   LearningRate 0.0375   Epoch: 7   Global Step: 96240   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:02:11,951-Speed 3269.03 samples/sec   Loss 5.3310   LearningRate 0.0375   Epoch: 7   Global Step: 96250   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:02:15,063-Speed 3292.05 samples/sec   Loss 5.2003   LearningRate 0.0375   Epoch: 7   Global Step: 96260   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:02:18,197-Speed 3267.66 samples/sec   Loss 5.2754   LearningRate 0.0375   Epoch: 7   Global Step: 96270   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:02:21,288-Speed 3313.79 samples/sec   Loss 5.3986   LearningRate 0.0375   Epoch: 7   Global Step: 96280   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:02:24,409-Speed 3282.05 samples/sec   Loss 5.3582   LearningRate 0.0375   Epoch: 7   Global Step: 96290   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:02:27,557-Speed 3253.99 samples/sec   Loss 5.3182   LearningRate 0.0375   Epoch: 7   Global Step: 96300   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:02:30,800-Speed 3158.69 samples/sec   Loss 5.3076   LearningRate 0.0375   Epoch: 7   Global Step: 96310   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:02:33,893-Speed 3311.68 samples/sec   Loss 5.3318   LearningRate 0.0375   Epoch: 7   Global Step: 96320   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:02:36,976-Speed 3322.83 samples/sec   Loss 5.2402   LearningRate 0.0375   Epoch: 7   Global Step: 96330   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:02:40,069-Speed 3311.06 samples/sec   Loss 5.3459   LearningRate 0.0375   Epoch: 7   Global Step: 96340   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:02:43,155-Speed 3319.84 samples/sec   Loss 5.2964   LearningRate 0.0375   Epoch: 7   Global Step: 96350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:02:46,232-Speed 3328.88 samples/sec   Loss 5.2569   LearningRate 0.0375   Epoch: 7   Global Step: 96360   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:02:49,341-Speed 3294.15 samples/sec   Loss 5.3401   LearningRate 0.0375   Epoch: 7   Global Step: 96370   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:02:52,470-Speed 3273.82 samples/sec   Loss 5.2795   LearningRate 0.0375   Epoch: 7   Global Step: 96380   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:02:55,648-Speed 3224.17 samples/sec   Loss 5.2901   LearningRate 0.0375   Epoch: 7   Global Step: 96390   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:02:58,737-Speed 3315.75 samples/sec   Loss 5.3566   LearningRate 0.0374   Epoch: 7   Global Step: 96400   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:03:01,831-Speed 3309.92 samples/sec   Loss 5.3599   LearningRate 0.0374   Epoch: 7   Global Step: 96410   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:03:04,924-Speed 3312.03 samples/sec   Loss 5.2703   LearningRate 0.0374   Epoch: 7   Global Step: 96420   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:03:08,026-Speed 3302.22 samples/sec   Loss 5.2833   LearningRate 0.0374   Epoch: 7   Global Step: 96430   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:03:11,100-Speed 3331.48 samples/sec   Loss 5.2702   LearningRate 0.0374   Epoch: 7   Global Step: 96440   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:03:14,229-Speed 3274.94 samples/sec   Loss 5.3408   LearningRate 0.0374   Epoch: 7   Global Step: 96450   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:03:17,312-Speed 3322.24 samples/sec   Loss 5.3057   LearningRate 0.0374   Epoch: 7   Global Step: 96460   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:03:20,404-Speed 3312.30 samples/sec   Loss 5.2329   LearningRate 0.0374   Epoch: 7   Global Step: 96470   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:03:23,491-Speed 3317.90 samples/sec   Loss 5.3752   LearningRate 0.0374   Epoch: 7   Global Step: 96480   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:03:26,610-Speed 3285.04 samples/sec   Loss 5.2391   LearningRate 0.0374   Epoch: 7   Global Step: 96490   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:03:29,785-Speed 3225.84 samples/sec   Loss 5.4037   LearningRate 0.0374   Epoch: 7   Global Step: 96500   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:03:32,852-Speed 3339.11 samples/sec   Loss 5.3176   LearningRate 0.0374   Epoch: 7   Global Step: 96510   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:03:35,949-Speed 3307.87 samples/sec   Loss 5.2986   LearningRate 0.0374   Epoch: 7   Global Step: 96520   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:03:39,035-Speed 3319.81 samples/sec   Loss 5.3841   LearningRate 0.0374   Epoch: 7   Global Step: 96530   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:03:42,205-Speed 3230.90 samples/sec   Loss 5.4297   LearningRate 0.0374   Epoch: 7   Global Step: 96540   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:03:45,299-Speed 3311.23 samples/sec   Loss 5.2918   LearningRate 0.0374   Epoch: 7   Global Step: 96550   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:03:48,419-Speed 3282.31 samples/sec   Loss 5.3189   LearningRate 0.0374   Epoch: 7   Global Step: 96560   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:03:51,512-Speed 3311.92 samples/sec   Loss 5.3943   LearningRate 0.0374   Epoch: 7   Global Step: 96570   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:03:54,597-Speed 3320.90 samples/sec   Loss 5.3848   LearningRate 0.0374   Epoch: 7   Global Step: 96580   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:03:57,700-Speed 3301.20 samples/sec   Loss 5.3159   LearningRate 0.0374   Epoch: 7   Global Step: 96590   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:04:00,770-Speed 3335.74 samples/sec   Loss 5.4769   LearningRate 0.0373   Epoch: 7   Global Step: 96600   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:04:03,889-Speed 3284.47 samples/sec   Loss 5.3286   LearningRate 0.0373   Epoch: 7   Global Step: 96610   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:04:07,072-Speed 3217.85 samples/sec   Loss 5.2501   LearningRate 0.0373   Epoch: 7   Global Step: 96620   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:04:10,183-Speed 3292.79 samples/sec   Loss 5.3181   LearningRate 0.0373   Epoch: 7   Global Step: 96630   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:04:13,302-Speed 3283.75 samples/sec   Loss 5.2882   LearningRate 0.0373   Epoch: 7   Global Step: 96640   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:04:16,413-Speed 3292.36 samples/sec   Loss 5.3099   LearningRate 0.0373   Epoch: 7   Global Step: 96650   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:04:19,595-Speed 3219.59 samples/sec   Loss 5.4560   LearningRate 0.0373   Epoch: 7   Global Step: 96660   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:04:22,712-Speed 3286.49 samples/sec   Loss 5.2525   LearningRate 0.0373   Epoch: 7   Global Step: 96670   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:04:25,880-Speed 3233.10 samples/sec   Loss 5.3971   LearningRate 0.0373   Epoch: 7   Global Step: 96680   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:04:29,055-Speed 3226.40 samples/sec   Loss 5.2982   LearningRate 0.0373   Epoch: 7   Global Step: 96690   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:04:32,189-Speed 3268.17 samples/sec   Loss 5.2499   LearningRate 0.0373   Epoch: 7   Global Step: 96700   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:04:35,299-Speed 3294.43 samples/sec   Loss 5.1688   LearningRate 0.0373   Epoch: 7   Global Step: 96710   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:04:38,383-Speed 3321.26 samples/sec   Loss 5.2782   LearningRate 0.0373   Epoch: 7   Global Step: 96720   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:04:41,464-Speed 3324.61 samples/sec   Loss 5.3293   LearningRate 0.0373   Epoch: 7   Global Step: 96730   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:04:44,569-Speed 3299.33 samples/sec   Loss 5.2877   LearningRate 0.0373   Epoch: 7   Global Step: 96740   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:04:47,710-Speed 3260.48 samples/sec   Loss 5.2156   LearningRate 0.0373   Epoch: 7   Global Step: 96750   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:04:50,805-Speed 3309.84 samples/sec   Loss 5.2529   LearningRate 0.0373   Epoch: 7   Global Step: 96760   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:04:53,969-Speed 3237.84 samples/sec   Loss 5.3719   LearningRate 0.0373   Epoch: 7   Global Step: 96770   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:04:57,025-Speed 3351.31 samples/sec   Loss 5.3310   LearningRate 0.0373   Epoch: 7   Global Step: 96780   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:05:00,168-Speed 3258.92 samples/sec   Loss 5.3022   LearningRate 0.0373   Epoch: 7   Global Step: 96790   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:05:03,283-Speed 3289.15 samples/sec   Loss 5.3860   LearningRate 0.0373   Epoch: 7   Global Step: 96800   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:05:06,423-Speed 3261.85 samples/sec   Loss 5.4094   LearningRate 0.0372   Epoch: 7   Global Step: 96810   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:05:09,489-Speed 3340.68 samples/sec   Loss 5.4975   LearningRate 0.0372   Epoch: 7   Global Step: 96820   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:05:12,712-Speed 3178.14 samples/sec   Loss 5.3913   LearningRate 0.0372   Epoch: 7   Global Step: 96830   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:05:15,883-Speed 3230.41 samples/sec   Loss 5.2524   LearningRate 0.0372   Epoch: 7   Global Step: 96840   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:05:19,077-Speed 3206.84 samples/sec   Loss 5.3174   LearningRate 0.0372   Epoch: 7   Global Step: 96850   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:05:22,179-Speed 3302.18 samples/sec   Loss 5.3309   LearningRate 0.0372   Epoch: 7   Global Step: 96860   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:05:25,293-Speed 3289.77 samples/sec   Loss 5.3657   LearningRate 0.0372   Epoch: 7   Global Step: 96870   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:05:28,411-Speed 3285.46 samples/sec   Loss 5.3872   LearningRate 0.0372   Epoch: 7   Global Step: 96880   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:05:31,497-Speed 3319.64 samples/sec   Loss 5.4245   LearningRate 0.0372   Epoch: 7   Global Step: 96890   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:05:34,569-Speed 3333.87 samples/sec   Loss 5.1820   LearningRate 0.0372   Epoch: 7   Global Step: 96900   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:05:37,677-Speed 3296.43 samples/sec   Loss 5.4569   LearningRate 0.0372   Epoch: 7   Global Step: 96910   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:05:40,886-Speed 3191.55 samples/sec   Loss 5.3051   LearningRate 0.0372   Epoch: 7   Global Step: 96920   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:05:44,046-Speed 3241.60 samples/sec   Loss 5.2172   LearningRate 0.0372   Epoch: 7   Global Step: 96930   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:05:47,101-Speed 3353.33 samples/sec   Loss 5.4627   LearningRate 0.0372   Epoch: 7   Global Step: 96940   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:05:50,249-Speed 3253.59 samples/sec   Loss 5.3190   LearningRate 0.0372   Epoch: 7   Global Step: 96950   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:05:53,362-Speed 3290.20 samples/sec   Loss 5.3194   LearningRate 0.0372   Epoch: 7   Global Step: 96960   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:05:56,415-Speed 3355.78 samples/sec   Loss 5.2569   LearningRate 0.0372   Epoch: 7   Global Step: 96970   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:05:59,490-Speed 3331.29 samples/sec   Loss 5.3006   LearningRate 0.0372   Epoch: 7   Global Step: 96980   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:06:02,577-Speed 3317.74 samples/sec   Loss 5.3213   LearningRate 0.0372   Epoch: 7   Global Step: 96990   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:06:05,641-Speed 3342.55 samples/sec   Loss 5.4485   LearningRate 0.0372   Epoch: 7   Global Step: 97000   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:06:08,774-Speed 3269.48 samples/sec   Loss 5.2492   LearningRate 0.0371   Epoch: 7   Global Step: 97010   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:06:11,856-Speed 3324.96 samples/sec   Loss 5.3019   LearningRate 0.0371   Epoch: 7   Global Step: 97020   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:06:14,999-Speed 3258.32 samples/sec   Loss 5.3518   LearningRate 0.0371   Epoch: 7   Global Step: 97030   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:06:18,071-Speed 3334.54 samples/sec   Loss 5.2722   LearningRate 0.0371   Epoch: 7   Global Step: 97040   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:06:21,132-Speed 3346.70 samples/sec   Loss 5.2885   LearningRate 0.0371   Epoch: 7   Global Step: 97050   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:06:24,243-Speed 3292.95 samples/sec   Loss 5.2335   LearningRate 0.0371   Epoch: 7   Global Step: 97060   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:06:27,339-Speed 3308.23 samples/sec   Loss 5.4191   LearningRate 0.0371   Epoch: 7   Global Step: 97070   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:06:30,414-Speed 3331.40 samples/sec   Loss 5.3265   LearningRate 0.0371   Epoch: 7   Global Step: 97080   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:06:33,467-Speed 3355.26 samples/sec   Loss 5.3795   LearningRate 0.0371   Epoch: 7   Global Step: 97090   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:06:36,572-Speed 3298.58 samples/sec   Loss 5.3587   LearningRate 0.0371   Epoch: 7   Global Step: 97100   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:06:39,674-Speed 3301.95 samples/sec   Loss 5.2981   LearningRate 0.0371   Epoch: 7   Global Step: 97110   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:06:42,801-Speed 3276.42 samples/sec   Loss 5.3606   LearningRate 0.0371   Epoch: 7   Global Step: 97120   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:06:45,911-Speed 3293.38 samples/sec   Loss 5.2299   LearningRate 0.0371   Epoch: 7   Global Step: 97130   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:06:49,059-Speed 3254.10 samples/sec   Loss 5.3730   LearningRate 0.0371   Epoch: 7   Global Step: 97140   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:06:52,142-Speed 3322.42 samples/sec   Loss 5.2679   LearningRate 0.0371   Epoch: 7   Global Step: 97150   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:06:55,193-Speed 3357.58 samples/sec   Loss 5.1462   LearningRate 0.0371   Epoch: 7   Global Step: 97160   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:06:58,279-Speed 3319.29 samples/sec   Loss 5.3320   LearningRate 0.0371   Epoch: 7   Global Step: 97170   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:07:01,375-Speed 3308.32 samples/sec   Loss 5.2827   LearningRate 0.0371   Epoch: 7   Global Step: 97180   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:07:04,447-Speed 3334.14 samples/sec   Loss 5.2627   LearningRate 0.0371   Epoch: 7   Global Step: 97190   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:07:07,559-Speed 3291.76 samples/sec   Loss 5.3462   LearningRate 0.0371   Epoch: 7   Global Step: 97200   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:07:10,635-Speed 3329.90 samples/sec   Loss 5.3842   LearningRate 0.0370   Epoch: 7   Global Step: 97210   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:07:13,787-Speed 3250.08 samples/sec   Loss 5.3169   LearningRate 0.0370   Epoch: 7   Global Step: 97220   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:07:16,927-Speed 3261.66 samples/sec   Loss 5.2983   LearningRate 0.0370   Epoch: 7   Global Step: 97230   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:07:20,030-Speed 3301.66 samples/sec   Loss 5.3724   LearningRate 0.0370   Epoch: 7   Global Step: 97240   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:07:23,162-Speed 3270.12 samples/sec   Loss 5.3408   LearningRate 0.0370   Epoch: 7   Global Step: 97250   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:07:26,250-Speed 3317.24 samples/sec   Loss 5.4061   LearningRate 0.0370   Epoch: 7   Global Step: 97260   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:07:29,407-Speed 3244.81 samples/sec   Loss 5.3540   LearningRate 0.0370   Epoch: 7   Global Step: 97270   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:07:32,506-Speed 3305.45 samples/sec   Loss 5.3607   LearningRate 0.0370   Epoch: 7   Global Step: 97280   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:07:35,604-Speed 3306.11 samples/sec   Loss 5.4448   LearningRate 0.0370   Epoch: 7   Global Step: 97290   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:07:38,704-Speed 3304.03 samples/sec   Loss 5.2577   LearningRate 0.0370   Epoch: 7   Global Step: 97300   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:07:41,858-Speed 3247.70 samples/sec   Loss 5.3200   LearningRate 0.0370   Epoch: 7   Global Step: 97310   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:07:44,936-Speed 3328.24 samples/sec   Loss 5.2342   LearningRate 0.0370   Epoch: 7   Global Step: 97320   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:07:48,067-Speed 3271.55 samples/sec   Loss 5.2617   LearningRate 0.0370   Epoch: 7   Global Step: 97330   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:07:51,202-Speed 3267.26 samples/sec   Loss 5.2057   LearningRate 0.0370   Epoch: 7   Global Step: 97340   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:07:54,290-Speed 3317.84 samples/sec   Loss 5.2471   LearningRate 0.0370   Epoch: 7   Global Step: 97350   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:07:57,354-Speed 3342.36 samples/sec   Loss 5.3251   LearningRate 0.0370   Epoch: 7   Global Step: 97360   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:08:00,481-Speed 3275.45 samples/sec   Loss 5.3219   LearningRate 0.0370   Epoch: 7   Global Step: 97370   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:08:03,621-Speed 3262.28 samples/sec   Loss 5.2992   LearningRate 0.0370   Epoch: 7   Global Step: 97380   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:08:06,786-Speed 3236.84 samples/sec   Loss 5.4098   LearningRate 0.0370   Epoch: 7   Global Step: 97390   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:08:09,839-Speed 3354.96 samples/sec   Loss 5.3209   LearningRate 0.0370   Epoch: 7   Global Step: 97400   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:08:13,032-Speed 3207.76 samples/sec   Loss 5.3662   LearningRate 0.0370   Epoch: 7   Global Step: 97410   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:08:16,173-Speed 3260.77 samples/sec   Loss 5.3094   LearningRate 0.0369   Epoch: 7   Global Step: 97420   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:08:19,296-Speed 3280.11 samples/sec   Loss 5.3190   LearningRate 0.0369   Epoch: 7   Global Step: 97430   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:08:22,357-Speed 3347.01 samples/sec   Loss 5.3291   LearningRate 0.0369   Epoch: 7   Global Step: 97440   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:08:25,534-Speed 3224.38 samples/sec   Loss 5.3637   LearningRate 0.0369   Epoch: 7   Global Step: 97450   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:08:28,616-Speed 3322.62 samples/sec   Loss 5.2683   LearningRate 0.0369   Epoch: 7   Global Step: 97460   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:08:31,707-Speed 3314.20 samples/sec   Loss 5.3213   LearningRate 0.0369   Epoch: 7   Global Step: 97470   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:08:34,794-Speed 3318.43 samples/sec   Loss 5.3822   LearningRate 0.0369   Epoch: 7   Global Step: 97480   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:08:37,882-Speed 3316.77 samples/sec   Loss 5.2931   LearningRate 0.0369   Epoch: 7   Global Step: 97490   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:08:41,006-Speed 3279.05 samples/sec   Loss 5.3817   LearningRate 0.0369   Epoch: 7   Global Step: 97500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:08:44,063-Speed 3350.79 samples/sec   Loss 5.3955   LearningRate 0.0369   Epoch: 7   Global Step: 97510   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:08:47,131-Speed 3338.54 samples/sec   Loss 5.3838   LearningRate 0.0369   Epoch: 7   Global Step: 97520   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:08:50,279-Speed 3254.27 samples/sec   Loss 5.3697   LearningRate 0.0369   Epoch: 7   Global Step: 97530   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:08:53,407-Speed 3274.29 samples/sec   Loss 5.2682   LearningRate 0.0369   Epoch: 7   Global Step: 97540   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:08:56,556-Speed 3253.11 samples/sec   Loss 5.2334   LearningRate 0.0369   Epoch: 7   Global Step: 97550   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:08:59,722-Speed 3235.06 samples/sec   Loss 5.3541   LearningRate 0.0369   Epoch: 7   Global Step: 97560   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:09:02,872-Speed 3251.42 samples/sec   Loss 5.3903   LearningRate 0.0369   Epoch: 7   Global Step: 97570   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:09:06,040-Speed 3234.14 samples/sec   Loss 5.4108   LearningRate 0.0369   Epoch: 7   Global Step: 97580   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:09:09,132-Speed 3312.62 samples/sec   Loss 5.3145   LearningRate 0.0369   Epoch: 7   Global Step: 97590   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:09:12,281-Speed 3252.68 samples/sec   Loss 5.2598   LearningRate 0.0369   Epoch: 7   Global Step: 97600   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:09:15,369-Speed 3316.95 samples/sec   Loss 5.1759   LearningRate 0.0369   Epoch: 7   Global Step: 97610   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:09:18,476-Speed 3296.73 samples/sec   Loss 5.4189   LearningRate 0.0368   Epoch: 7   Global Step: 97620   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:09:21,539-Speed 3344.55 samples/sec   Loss 5.3223   LearningRate 0.0368   Epoch: 7   Global Step: 97630   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:09:24,601-Speed 3344.96 samples/sec   Loss 5.3433   LearningRate 0.0368   Epoch: 7   Global Step: 97640   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:09:27,762-Speed 3241.00 samples/sec   Loss 5.3081   LearningRate 0.0368   Epoch: 7   Global Step: 97650   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:09:30,922-Speed 3240.54 samples/sec   Loss 5.3500   LearningRate 0.0368   Epoch: 7   Global Step: 97660   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:09:33,979-Speed 3350.68 samples/sec   Loss 5.3099   LearningRate 0.0368   Epoch: 7   Global Step: 97670   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:09:37,161-Speed 3219.72 samples/sec   Loss 5.3036   LearningRate 0.0368   Epoch: 7   Global Step: 97680   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:09:40,345-Speed 3217.18 samples/sec   Loss 5.4106   LearningRate 0.0368   Epoch: 7   Global Step: 97690   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:09:43,459-Speed 3288.62 samples/sec   Loss 5.2881   LearningRate 0.0368   Epoch: 7   Global Step: 97700   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:09:46,559-Speed 3304.67 samples/sec   Loss 5.2566   LearningRate 0.0368   Epoch: 7   Global Step: 97710   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:09:49,637-Speed 3327.54 samples/sec   Loss 5.3582   LearningRate 0.0368   Epoch: 7   Global Step: 97720   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:09:52,715-Speed 3328.68 samples/sec   Loss 5.3030   LearningRate 0.0368   Epoch: 7   Global Step: 97730   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:09:55,781-Speed 3341.31 samples/sec   Loss 5.2373   LearningRate 0.0368   Epoch: 7   Global Step: 97740   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:09:58,881-Speed 3304.34 samples/sec   Loss 5.4239   LearningRate 0.0368   Epoch: 7   Global Step: 97750   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:10:02,010-Speed 3273.60 samples/sec   Loss 5.2807   LearningRate 0.0368   Epoch: 7   Global Step: 97760   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:10:05,114-Speed 3299.68 samples/sec   Loss 5.1740   LearningRate 0.0368   Epoch: 7   Global Step: 97770   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:10:08,214-Speed 3305.23 samples/sec   Loss 5.2758   LearningRate 0.0368   Epoch: 7   Global Step: 97780   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:10:11,335-Speed 3281.18 samples/sec   Loss 5.3539   LearningRate 0.0368   Epoch: 7   Global Step: 97790   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:10:14,414-Speed 3326.45 samples/sec   Loss 5.2208   LearningRate 0.0368   Epoch: 7   Global Step: 97800   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:10:17,540-Speed 3277.74 samples/sec   Loss 5.3331   LearningRate 0.0368   Epoch: 7   Global Step: 97810   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:10:20,634-Speed 3310.61 samples/sec   Loss 5.2861   LearningRate 0.0368   Epoch: 7   Global Step: 97820   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:10:23,713-Speed 3326.56 samples/sec   Loss 5.3295   LearningRate 0.0367   Epoch: 7   Global Step: 97830   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:10:26,875-Speed 3240.30 samples/sec   Loss 5.2509   LearningRate 0.0367   Epoch: 7   Global Step: 97840   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:10:29,967-Speed 3312.47 samples/sec   Loss 5.3304   LearningRate 0.0367   Epoch: 7   Global Step: 97850   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:10:33,045-Speed 3326.93 samples/sec   Loss 5.2512   LearningRate 0.0367   Epoch: 7   Global Step: 97860   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:10:36,151-Speed 3298.20 samples/sec   Loss 5.2899   LearningRate 0.0367   Epoch: 7   Global Step: 97870   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:10:39,268-Speed 3286.91 samples/sec   Loss 5.2612   LearningRate 0.0367   Epoch: 7   Global Step: 97880   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:10:42,393-Speed 3276.80 samples/sec   Loss 5.3599   LearningRate 0.0367   Epoch: 7   Global Step: 97890   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:10:45,500-Speed 3297.13 samples/sec   Loss 5.2396   LearningRate 0.0367   Epoch: 7   Global Step: 97900   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:10:48,618-Speed 3285.75 samples/sec   Loss 5.4024   LearningRate 0.0367   Epoch: 7   Global Step: 97910   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:10:51,800-Speed 3218.53 samples/sec   Loss 5.1433   LearningRate 0.0367   Epoch: 7   Global Step: 97920   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:10:54,918-Speed 3284.93 samples/sec   Loss 5.3386   LearningRate 0.0367   Epoch: 7   Global Step: 97930   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-04-27 10:10:57,974-Speed 3352.85 samples/sec   Loss 5.3487   LearningRate 0.0367   Epoch: 7   Global Step: 97940   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:11:01,078-Speed 3299.27 samples/sec   Loss 5.2763   LearningRate 0.0367   Epoch: 7   Global Step: 97950   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:11:04,152-Speed 3332.35 samples/sec   Loss 5.3030   LearningRate 0.0367   Epoch: 7   Global Step: 97960   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:11:07,261-Speed 3295.17 samples/sec   Loss 5.2652   LearningRate 0.0367   Epoch: 7   Global Step: 97970   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:11:10,329-Speed 3338.57 samples/sec   Loss 5.3503   LearningRate 0.0367   Epoch: 7   Global Step: 97980   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:11:13,434-Speed 3298.88 samples/sec   Loss 5.4023   LearningRate 0.0367   Epoch: 7   Global Step: 97990   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:11:16,519-Speed 3320.26 samples/sec   Loss 5.3475   LearningRate 0.0367   Epoch: 7   Global Step: 98000   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:11:19,718-Speed 3201.79 samples/sec   Loss 5.4027   LearningRate 0.0367   Epoch: 7   Global Step: 98010   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:11:22,805-Speed 3318.51 samples/sec   Loss 5.4013   LearningRate 0.0367   Epoch: 7   Global Step: 98020   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:11:25,917-Speed 3291.75 samples/sec   Loss 5.3231   LearningRate 0.0366   Epoch: 7   Global Step: 98030   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:11:29,083-Speed 3234.85 samples/sec   Loss 5.3877   LearningRate 0.0366   Epoch: 7   Global Step: 98040   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:11:32,176-Speed 3311.77 samples/sec   Loss 5.3070   LearningRate 0.0366   Epoch: 7   Global Step: 98050   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:11:35,286-Speed 3294.08 samples/sec   Loss 5.2952   LearningRate 0.0366   Epoch: 7   Global Step: 98060   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:11:38,436-Speed 3251.64 samples/sec   Loss 5.2545   LearningRate 0.0366   Epoch: 7   Global Step: 98070   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:11:41,576-Speed 3261.64 samples/sec   Loss 5.3349   LearningRate 0.0366   Epoch: 7   Global Step: 98080   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:11:44,723-Speed 3254.78 samples/sec   Loss 5.3602   LearningRate 0.0366   Epoch: 7   Global Step: 98090   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:11:47,844-Speed 3282.18 samples/sec   Loss 5.3553   LearningRate 0.0366   Epoch: 7   Global Step: 98100   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:11:50,944-Speed 3304.68 samples/sec   Loss 5.4187   LearningRate 0.0366   Epoch: 7   Global Step: 98110   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:11:54,101-Speed 3243.81 samples/sec   Loss 5.3494   LearningRate 0.0366   Epoch: 7   Global Step: 98120   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:11:57,200-Speed 3306.38 samples/sec   Loss 5.2388   LearningRate 0.0366   Epoch: 7   Global Step: 98130   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:12:00,296-Speed 3308.00 samples/sec   Loss 5.2795   LearningRate 0.0366   Epoch: 7   Global Step: 98140   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:12:03,373-Speed 3329.22 samples/sec   Loss 5.2887   LearningRate 0.0366   Epoch: 7   Global Step: 98150   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:12:06,493-Speed 3283.36 samples/sec   Loss 5.2669   LearningRate 0.0366   Epoch: 7   Global Step: 98160   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:12:09,568-Speed 3330.93 samples/sec   Loss 5.3079   LearningRate 0.0366   Epoch: 7   Global Step: 98170   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:12:12,642-Speed 3332.20 samples/sec   Loss 5.3579   LearningRate 0.0366   Epoch: 7   Global Step: 98180   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:12:15,754-Speed 3291.29 samples/sec   Loss 5.3453   LearningRate 0.0366   Epoch: 7   Global Step: 98190   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:12:18,871-Speed 3286.07 samples/sec   Loss 5.3292   LearningRate 0.0366   Epoch: 7   Global Step: 98200   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:12:21,952-Speed 3324.81 samples/sec   Loss 5.2834   LearningRate 0.0366   Epoch: 7   Global Step: 98210   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:12:25,050-Speed 3306.50 samples/sec   Loss 5.3336   LearningRate 0.0366   Epoch: 7   Global Step: 98220   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:12:28,194-Speed 3257.92 samples/sec   Loss 5.2924   LearningRate 0.0366   Epoch: 7   Global Step: 98230   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:12:31,298-Speed 3299.13 samples/sec   Loss 5.3563   LearningRate 0.0365   Epoch: 7   Global Step: 98240   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:12:34,386-Speed 3317.41 samples/sec   Loss 5.3787   LearningRate 0.0365   Epoch: 7   Global Step: 98250   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:12:37,478-Speed 3313.48 samples/sec   Loss 5.2331   LearningRate 0.0365   Epoch: 7   Global Step: 98260   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:12:40,577-Speed 3304.48 samples/sec   Loss 5.3427   LearningRate 0.0365   Epoch: 7   Global Step: 98270   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:12:43,752-Speed 3226.40 samples/sec   Loss 5.2757   LearningRate 0.0365   Epoch: 7   Global Step: 98280   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:12:46,833-Speed 3325.39 samples/sec   Loss 5.2819   LearningRate 0.0365   Epoch: 7   Global Step: 98290   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:12:49,914-Speed 3323.51 samples/sec   Loss 5.3339   LearningRate 0.0365   Epoch: 7   Global Step: 98300   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:12:53,079-Speed 3237.45 samples/sec   Loss 5.3719   LearningRate 0.0365   Epoch: 7   Global Step: 98310   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:12:56,184-Speed 3298.96 samples/sec   Loss 5.3367   LearningRate 0.0365   Epoch: 7   Global Step: 98320   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:12:59,254-Speed 3336.11 samples/sec   Loss 5.3732   LearningRate 0.0365   Epoch: 7   Global Step: 98330   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:13:02,372-Speed 3285.48 samples/sec   Loss 5.3313   LearningRate 0.0365   Epoch: 7   Global Step: 98340   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:13:05,454-Speed 3323.25 samples/sec   Loss 5.2821   LearningRate 0.0365   Epoch: 7   Global Step: 98350   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:13:08,563-Speed 3294.11 samples/sec   Loss 5.3477   LearningRate 0.0365   Epoch: 7   Global Step: 98360   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:13:11,665-Speed 3302.49 samples/sec   Loss 5.2297   LearningRate 0.0365   Epoch: 7   Global Step: 98370   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:13:14,778-Speed 3290.92 samples/sec   Loss 5.3586   LearningRate 0.0365   Epoch: 7   Global Step: 98380   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:13:17,957-Speed 3221.94 samples/sec   Loss 5.3150   LearningRate 0.0365   Epoch: 7   Global Step: 98390   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:13:21,021-Speed 3342.57 samples/sec   Loss 5.1963   LearningRate 0.0365   Epoch: 7   Global Step: 98400   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:13:24,119-Speed 3307.13 samples/sec   Loss 5.2796   LearningRate 0.0365   Epoch: 7   Global Step: 98410   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:13:27,275-Speed 3245.53 samples/sec   Loss 5.3834   LearningRate 0.0365   Epoch: 7   Global Step: 98420   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:13:30,409-Speed 3268.04 samples/sec   Loss 5.3290   LearningRate 0.0365   Epoch: 7   Global Step: 98430   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:13:33,502-Speed 3311.27 samples/sec   Loss 5.1646   LearningRate 0.0364   Epoch: 7   Global Step: 98440   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:13:36,665-Speed 3238.38 samples/sec   Loss 5.3354   LearningRate 0.0364   Epoch: 7   Global Step: 98450   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:13:39,766-Speed 3303.60 samples/sec   Loss 5.2921   LearningRate 0.0364   Epoch: 7   Global Step: 98460   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:13:42,899-Speed 3269.55 samples/sec   Loss 5.2405   LearningRate 0.0364   Epoch: 7   Global Step: 98470   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:13:45,982-Speed 3321.77 samples/sec   Loss 5.3388   LearningRate 0.0364   Epoch: 7   Global Step: 98480   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:13:49,098-Speed 3287.57 samples/sec   Loss 5.3090   LearningRate 0.0364   Epoch: 7   Global Step: 98490   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:13:52,287-Speed 3211.69 samples/sec   Loss 5.2934   LearningRate 0.0364   Epoch: 7   Global Step: 98500   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:13:55,369-Speed 3324.41 samples/sec   Loss 5.2534   LearningRate 0.0364   Epoch: 7   Global Step: 98510   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:13:58,465-Speed 3307.35 samples/sec   Loss 5.2529   LearningRate 0.0364   Epoch: 7   Global Step: 98520   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:14:01,537-Speed 3335.01 samples/sec   Loss 5.3135   LearningRate 0.0364   Epoch: 7   Global Step: 98530   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:14:04,643-Speed 3297.78 samples/sec   Loss 5.2913   LearningRate 0.0364   Epoch: 7   Global Step: 98540   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:14:07,751-Speed 3296.24 samples/sec   Loss 5.3350   LearningRate 0.0364   Epoch: 7   Global Step: 98550   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:14:10,834-Speed 3322.27 samples/sec   Loss 5.2213   LearningRate 0.0364   Epoch: 7   Global Step: 98560   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:14:13,919-Speed 3320.51 samples/sec   Loss 5.3261   LearningRate 0.0364   Epoch: 7   Global Step: 98570   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:14:17,017-Speed 3305.97 samples/sec   Loss 5.2470   LearningRate 0.0364   Epoch: 7   Global Step: 98580   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:14:20,116-Speed 3305.77 samples/sec   Loss 5.3164   LearningRate 0.0364   Epoch: 7   Global Step: 98590   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:14:23,210-Speed 3310.44 samples/sec   Loss 5.2377   LearningRate 0.0364   Epoch: 7   Global Step: 98600   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:14:26,305-Speed 3309.76 samples/sec   Loss 5.3436   LearningRate 0.0364   Epoch: 7   Global Step: 98610   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:14:29,481-Speed 3224.94 samples/sec   Loss 5.3672   LearningRate 0.0364   Epoch: 7   Global Step: 98620   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:14:32,593-Speed 3291.26 samples/sec   Loss 5.2991   LearningRate 0.0364   Epoch: 7   Global Step: 98630   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:14:35,663-Speed 3336.69 samples/sec   Loss 5.3696   LearningRate 0.0364   Epoch: 7   Global Step: 98640   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:14:38,761-Speed 3306.47 samples/sec   Loss 5.3610   LearningRate 0.0363   Epoch: 7   Global Step: 98650   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:14:41,910-Speed 3253.05 samples/sec   Loss 5.2983   LearningRate 0.0363   Epoch: 7   Global Step: 98660   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:14:45,004-Speed 3310.16 samples/sec   Loss 5.3563   LearningRate 0.0363   Epoch: 7   Global Step: 98670   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:14:48,088-Speed 3320.99 samples/sec   Loss 5.2679   LearningRate 0.0363   Epoch: 7   Global Step: 98680   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:14:51,216-Speed 3275.53 samples/sec   Loss 5.2920   LearningRate 0.0363   Epoch: 7   Global Step: 98690   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:14:54,390-Speed 3226.71 samples/sec   Loss 5.4095   LearningRate 0.0363   Epoch: 7   Global Step: 98700   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:14:57,496-Speed 3298.44 samples/sec   Loss 5.3302   LearningRate 0.0363   Epoch: 7   Global Step: 98710   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:15:00,634-Speed 3264.19 samples/sec   Loss 5.3392   LearningRate 0.0363   Epoch: 7   Global Step: 98720   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:15:03,763-Speed 3273.67 samples/sec   Loss 5.3393   LearningRate 0.0363   Epoch: 7   Global Step: 98730   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:15:06,909-Speed 3255.38 samples/sec   Loss 5.2968   LearningRate 0.0363   Epoch: 7   Global Step: 98740   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:15:09,983-Speed 3332.12 samples/sec   Loss 5.2987   LearningRate 0.0363   Epoch: 7   Global Step: 98750   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:15:13,123-Speed 3262.71 samples/sec   Loss 5.4006   LearningRate 0.0363   Epoch: 7   Global Step: 98760   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:15:16,251-Speed 3274.84 samples/sec   Loss 5.3429   LearningRate 0.0363   Epoch: 7   Global Step: 98770   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:15:19,348-Speed 3306.90 samples/sec   Loss 5.2408   LearningRate 0.0363   Epoch: 7   Global Step: 98780   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:15:22,501-Speed 3248.33 samples/sec   Loss 5.1378   LearningRate 0.0363   Epoch: 7   Global Step: 98790   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:15:25,635-Speed 3268.55 samples/sec   Loss 5.3084   LearningRate 0.0363   Epoch: 7   Global Step: 98800   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:15:28,763-Speed 3276.46 samples/sec   Loss 5.2349   LearningRate 0.0363   Epoch: 7   Global Step: 98810   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:15:31,908-Speed 3257.10 samples/sec   Loss 5.3220   LearningRate 0.0363   Epoch: 7   Global Step: 98820   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:15:35,027-Speed 3283.14 samples/sec   Loss 5.2680   LearningRate 0.0363   Epoch: 7   Global Step: 98830   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:15:38,169-Speed 3260.45 samples/sec   Loss 5.3436   LearningRate 0.0363   Epoch: 7   Global Step: 98840   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:15:41,277-Speed 3296.36 samples/sec   Loss 5.3409   LearningRate 0.0363   Epoch: 7   Global Step: 98850   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:15:44,393-Speed 3287.09 samples/sec   Loss 5.4131   LearningRate 0.0362   Epoch: 7   Global Step: 98860   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:15:47,526-Speed 3269.45 samples/sec   Loss 5.2763   LearningRate 0.0362   Epoch: 7   Global Step: 98870   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:15:50,653-Speed 3274.76 samples/sec   Loss 5.3010   LearningRate 0.0362   Epoch: 7   Global Step: 98880   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:15:53,768-Speed 3288.94 samples/sec   Loss 5.3136   LearningRate 0.0362   Epoch: 7   Global Step: 98890   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:15:56,871-Speed 3300.39 samples/sec   Loss 5.3794   LearningRate 0.0362   Epoch: 7   Global Step: 98900   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:15:59,972-Speed 3303.34 samples/sec   Loss 5.3712   LearningRate 0.0362   Epoch: 7   Global Step: 98910   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:16:03,112-Speed 3262.01 samples/sec   Loss 5.3642   LearningRate 0.0362   Epoch: 7   Global Step: 98920   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:16:06,242-Speed 3272.69 samples/sec   Loss 5.3784   LearningRate 0.0362   Epoch: 7   Global Step: 98930   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:16:09,308-Speed 3341.16 samples/sec   Loss 5.3255   LearningRate 0.0362   Epoch: 7   Global Step: 98940   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:16:12,407-Speed 3305.87 samples/sec   Loss 5.3017   LearningRate 0.0362   Epoch: 7   Global Step: 98950   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:16:15,609-Speed 3198.21 samples/sec   Loss 5.2989   LearningRate 0.0362   Epoch: 7   Global Step: 98960   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:16:18,761-Speed 3249.65 samples/sec   Loss 5.2659   LearningRate 0.0362   Epoch: 7   Global Step: 98970   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:16:21,890-Speed 3273.98 samples/sec   Loss 5.3141   LearningRate 0.0362   Epoch: 7   Global Step: 98980   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:16:24,993-Speed 3301.72 samples/sec   Loss 5.3104   LearningRate 0.0362   Epoch: 7   Global Step: 98990   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:16:28,095-Speed 3301.13 samples/sec   Loss 5.3123   LearningRate 0.0362   Epoch: 7   Global Step: 99000   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:16:31,275-Speed 3221.75 samples/sec   Loss 5.3152   LearningRate 0.0362   Epoch: 7   Global Step: 99010   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:16:34,378-Speed 3301.33 samples/sec   Loss 5.1789   LearningRate 0.0362   Epoch: 7   Global Step: 99020   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:16:37,567-Speed 3212.24 samples/sec   Loss 5.3635   LearningRate 0.0362   Epoch: 7   Global Step: 99030   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:16:40,693-Speed 3277.03 samples/sec   Loss 5.3338   LearningRate 0.0362   Epoch: 7   Global Step: 99040   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:16:43,844-Speed 3250.20 samples/sec   Loss 5.3363   LearningRate 0.0362   Epoch: 7   Global Step: 99050   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:16:46,918-Speed 3332.86 samples/sec   Loss 5.2045   LearningRate 0.0361   Epoch: 7   Global Step: 99060   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:16:50,053-Speed 3266.58 samples/sec   Loss 5.3575   LearningRate 0.0361   Epoch: 7   Global Step: 99070   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:16:53,159-Speed 3299.41 samples/sec   Loss 5.2544   LearningRate 0.0361   Epoch: 7   Global Step: 99080   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:16:56,199-Speed 3368.47 samples/sec   Loss 5.3941   LearningRate 0.0361   Epoch: 7   Global Step: 99090   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:16:59,321-Speed 3281.45 samples/sec   Loss 5.3243   LearningRate 0.0361   Epoch: 7   Global Step: 99100   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:17:02,526-Speed 3196.28 samples/sec   Loss 5.3282   LearningRate 0.0361   Epoch: 7   Global Step: 99110   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:17:05,650-Speed 3278.15 samples/sec   Loss 5.3278   LearningRate 0.0361   Epoch: 7   Global Step: 99120   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:17:08,705-Speed 3353.46 samples/sec   Loss 5.3814   LearningRate 0.0361   Epoch: 7   Global Step: 99130   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:17:11,812-Speed 3297.14 samples/sec   Loss 5.2240   LearningRate 0.0361   Epoch: 7   Global Step: 99140   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:17:14,883-Speed 3335.34 samples/sec   Loss 5.3870   LearningRate 0.0361   Epoch: 7   Global Step: 99150   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:17:18,007-Speed 3278.75 samples/sec   Loss 5.2867   LearningRate 0.0361   Epoch: 7   Global Step: 99160   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:17:21,071-Speed 3343.32 samples/sec   Loss 5.2553   LearningRate 0.0361   Epoch: 7   Global Step: 99170   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:17:24,124-Speed 3355.49 samples/sec   Loss 5.3868   LearningRate 0.0361   Epoch: 7   Global Step: 99180   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:17:27,197-Speed 3333.09 samples/sec   Loss 5.3271   LearningRate 0.0361   Epoch: 7   Global Step: 99190   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:17:30,307-Speed 3294.41 samples/sec   Loss 5.0953   LearningRate 0.0361   Epoch: 7   Global Step: 99200   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:17:33,396-Speed 3315.37 samples/sec   Loss 5.2916   LearningRate 0.0361   Epoch: 7   Global Step: 99210   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:17:36,541-Speed 3257.18 samples/sec   Loss 5.2514   LearningRate 0.0361   Epoch: 7   Global Step: 99220   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:17:39,647-Speed 3297.63 samples/sec   Loss 5.3067   LearningRate 0.0361   Epoch: 7   Global Step: 99230   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:17:42,853-Speed 3195.10 samples/sec   Loss 5.2032   LearningRate 0.0361   Epoch: 7   Global Step: 99240   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:17:45,921-Speed 3338.22 samples/sec   Loss 5.4143   LearningRate 0.0361   Epoch: 7   Global Step: 99250   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:17:49,050-Speed 3273.65 samples/sec   Loss 5.2581   LearningRate 0.0361   Epoch: 7   Global Step: 99260   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:17:52,197-Speed 3254.70 samples/sec   Loss 5.3013   LearningRate 0.0360   Epoch: 7   Global Step: 99270   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:17:55,387-Speed 3211.70 samples/sec   Loss 5.3604   LearningRate 0.0360   Epoch: 7   Global Step: 99280   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:17:58,475-Speed 3316.80 samples/sec   Loss 5.2102   LearningRate 0.0360   Epoch: 7   Global Step: 99290   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:18:01,567-Speed 3313.57 samples/sec   Loss 5.2439   LearningRate 0.0360   Epoch: 7   Global Step: 99300   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:18:04,720-Speed 3248.60 samples/sec   Loss 5.2453   LearningRate 0.0360   Epoch: 7   Global Step: 99310   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:18:07,797-Speed 3327.79 samples/sec   Loss 5.2922   LearningRate 0.0360   Epoch: 7   Global Step: 99320   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:18:10,856-Speed 3348.97 samples/sec   Loss 5.3457   LearningRate 0.0360   Epoch: 7   Global Step: 99330   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:18:13,989-Speed 3269.71 samples/sec   Loss 5.2908   LearningRate 0.0360   Epoch: 7   Global Step: 99340   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:18:17,074-Speed 3320.49 samples/sec   Loss 5.2586   LearningRate 0.0360   Epoch: 7   Global Step: 99350   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:18:20,349-Speed 3126.90 samples/sec   Loss 5.2860   LearningRate 0.0360   Epoch: 7   Global Step: 99360   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:18:52,330-Speed 320.21 samples/sec   Loss 5.0248   LearningRate 0.0360   Epoch: 8   Global Step: 99370   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:18:55,810-Speed 2944.42 samples/sec   Loss 3.8714   LearningRate 0.0360   Epoch: 8   Global Step: 99380   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:18:58,996-Speed 3215.02 samples/sec   Loss 3.9765   LearningRate 0.0360   Epoch: 8   Global Step: 99390   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:19:02,104-Speed 3295.80 samples/sec   Loss 3.8848   LearningRate 0.0360   Epoch: 8   Global Step: 99400   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:19:05,246-Speed 3259.87 samples/sec   Loss 3.9439   LearningRate 0.0360   Epoch: 8   Global Step: 99410   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:19:08,320-Speed 3332.08 samples/sec   Loss 3.9092   LearningRate 0.0360   Epoch: 8   Global Step: 99420   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:19:11,475-Speed 3247.14 samples/sec   Loss 3.9495   LearningRate 0.0360   Epoch: 8   Global Step: 99430   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:19:14,625-Speed 3251.56 samples/sec   Loss 3.8960   LearningRate 0.0360   Epoch: 8   Global Step: 99440   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:19:17,711-Speed 3319.54 samples/sec   Loss 3.9471   LearningRate 0.0360   Epoch: 8   Global Step: 99450   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:19:20,761-Speed 3358.61 samples/sec   Loss 4.0967   LearningRate 0.0360   Epoch: 8   Global Step: 99460   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:19:23,882-Speed 3281.59 samples/sec   Loss 4.0013   LearningRate 0.0360   Epoch: 8   Global Step: 99470   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:19:26,982-Speed 3304.46 samples/sec   Loss 4.0172   LearningRate 0.0359   Epoch: 8   Global Step: 99480   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:19:30,111-Speed 3273.37 samples/sec   Loss 3.9441   LearningRate 0.0359   Epoch: 8   Global Step: 99490   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:19:33,200-Speed 3315.76 samples/sec   Loss 4.0096   LearningRate 0.0359   Epoch: 8   Global Step: 99500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:19:36,250-Speed 3358.84 samples/sec   Loss 3.9805   LearningRate 0.0359   Epoch: 8   Global Step: 99510   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:19:39,364-Speed 3289.12 samples/sec   Loss 3.9907   LearningRate 0.0359   Epoch: 8   Global Step: 99520   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:19:42,508-Speed 3258.27 samples/sec   Loss 3.9445   LearningRate 0.0359   Epoch: 8   Global Step: 99530   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:19:45,607-Speed 3304.83 samples/sec   Loss 3.9679   LearningRate 0.0359   Epoch: 8   Global Step: 99540   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:19:48,670-Speed 3344.49 samples/sec   Loss 3.8933   LearningRate 0.0359   Epoch: 8   Global Step: 99550   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:19:51,867-Speed 3204.03 samples/sec   Loss 4.0291   LearningRate 0.0359   Epoch: 8   Global Step: 99560   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:19:54,923-Speed 3352.49 samples/sec   Loss 4.1155   LearningRate 0.0359   Epoch: 8   Global Step: 99570   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:19:57,988-Speed 3342.01 samples/sec   Loss 3.9871   LearningRate 0.0359   Epoch: 8   Global Step: 99580   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:20:01,078-Speed 3314.81 samples/sec   Loss 4.0706   LearningRate 0.0359   Epoch: 8   Global Step: 99590   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:20:04,268-Speed 3210.76 samples/sec   Loss 4.1223   LearningRate 0.0359   Epoch: 8   Global Step: 99600   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:20:07,434-Speed 3235.47 samples/sec   Loss 3.9885   LearningRate 0.0359   Epoch: 8   Global Step: 99610   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:20:10,503-Speed 3337.56 samples/sec   Loss 3.9901   LearningRate 0.0359   Epoch: 8   Global Step: 99620   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:20:13,580-Speed 3328.32 samples/sec   Loss 4.0306   LearningRate 0.0359   Epoch: 8   Global Step: 99630   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:20:16,651-Speed 3336.13 samples/sec   Loss 4.0501   LearningRate 0.0359   Epoch: 8   Global Step: 99640   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:20:19,723-Speed 3334.74 samples/sec   Loss 4.0325   LearningRate 0.0359   Epoch: 8   Global Step: 99650   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:20:22,804-Speed 3324.33 samples/sec   Loss 3.9564   LearningRate 0.0359   Epoch: 8   Global Step: 99660   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:20:25,878-Speed 3332.14 samples/sec   Loss 4.1094   LearningRate 0.0359   Epoch: 8   Global Step: 99670   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:20:28,973-Speed 3309.12 samples/sec   Loss 4.0401   LearningRate 0.0358   Epoch: 8   Global Step: 99680   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:20:32,095-Speed 3281.21 samples/sec   Loss 4.0979   LearningRate 0.0358   Epoch: 8   Global Step: 99690   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:20:35,228-Speed 3269.72 samples/sec   Loss 4.0431   LearningRate 0.0358   Epoch: 8   Global Step: 99700   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:20:38,318-Speed 3314.84 samples/sec   Loss 4.0330   LearningRate 0.0358   Epoch: 8   Global Step: 99710   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:20:41,408-Speed 3314.81 samples/sec   Loss 4.0791   LearningRate 0.0358   Epoch: 8   Global Step: 99720   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:20:44,517-Speed 3295.11 samples/sec   Loss 4.1065   LearningRate 0.0358   Epoch: 8   Global Step: 99730   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:20:47,641-Speed 3279.39 samples/sec   Loss 3.9645   LearningRate 0.0358   Epoch: 8   Global Step: 99740   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:20:50,771-Speed 3272.56 samples/sec   Loss 4.0449   LearningRate 0.0358   Epoch: 8   Global Step: 99750   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:20:53,918-Speed 3254.44 samples/sec   Loss 4.0712   LearningRate 0.0358   Epoch: 8   Global Step: 99760   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:20:57,007-Speed 3315.93 samples/sec   Loss 4.0807   LearningRate 0.0358   Epoch: 8   Global Step: 99770   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:21:00,135-Speed 3275.27 samples/sec   Loss 4.0892   LearningRate 0.0358   Epoch: 8   Global Step: 99780   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:21:03,295-Speed 3242.01 samples/sec   Loss 4.0776   LearningRate 0.0358   Epoch: 8   Global Step: 99790   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:21:06,406-Speed 3291.88 samples/sec   Loss 4.0081   LearningRate 0.0358   Epoch: 8   Global Step: 99800   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:21:09,476-Speed 3337.50 samples/sec   Loss 4.1015   LearningRate 0.0358   Epoch: 8   Global Step: 99810   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:21:12,534-Speed 3349.27 samples/sec   Loss 4.0634   LearningRate 0.0358   Epoch: 8   Global Step: 99820   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:21:15,624-Speed 3314.47 samples/sec   Loss 4.0823   LearningRate 0.0358   Epoch: 8   Global Step: 99830   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:21:18,735-Speed 3292.20 samples/sec   Loss 4.1284   LearningRate 0.0358   Epoch: 8   Global Step: 99840   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:21:21,803-Speed 3339.20 samples/sec   Loss 4.1181   LearningRate 0.0358   Epoch: 8   Global Step: 99850   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:21:24,878-Speed 3331.35 samples/sec   Loss 4.0977   LearningRate 0.0358   Epoch: 8   Global Step: 99860   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:21:27,982-Speed 3299.49 samples/sec   Loss 4.1276   LearningRate 0.0358   Epoch: 8   Global Step: 99870   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:21:31,088-Speed 3298.12 samples/sec   Loss 4.0295   LearningRate 0.0358   Epoch: 8   Global Step: 99880   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:21:34,214-Speed 3277.53 samples/sec   Loss 4.0496   LearningRate 0.0357   Epoch: 8   Global Step: 99890   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:21:37,328-Speed 3288.49 samples/sec   Loss 4.1453   LearningRate 0.0357   Epoch: 8   Global Step: 99900   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:21:40,461-Speed 3270.08 samples/sec   Loss 4.1311   LearningRate 0.0357   Epoch: 8   Global Step: 99910   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:21:43,650-Speed 3212.05 samples/sec   Loss 3.9862   LearningRate 0.0357   Epoch: 8   Global Step: 99920   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:21:46,746-Speed 3308.81 samples/sec   Loss 4.1288   LearningRate 0.0357   Epoch: 8   Global Step: 99930   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:21:49,819-Speed 3332.90 samples/sec   Loss 4.1109   LearningRate 0.0357   Epoch: 8   Global Step: 99940   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:21:52,918-Speed 3305.47 samples/sec   Loss 4.0365   LearningRate 0.0357   Epoch: 8   Global Step: 99950   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:21:56,018-Speed 3304.35 samples/sec   Loss 4.1290   LearningRate 0.0357   Epoch: 8   Global Step: 99960   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:21:59,074-Speed 3351.12 samples/sec   Loss 4.2394   LearningRate 0.0357   Epoch: 8   Global Step: 99970   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:22:02,198-Speed 3279.07 samples/sec   Loss 4.1392   LearningRate 0.0357   Epoch: 8   Global Step: 99980   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:22:05,301-Speed 3301.52 samples/sec   Loss 4.1410   LearningRate 0.0357   Epoch: 8   Global Step: 99990   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:22:08,397-Speed 3308.77 samples/sec   Loss 4.1510   LearningRate 0.0357   Epoch: 8   Global Step: 100000   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:22:11,530-Speed 3269.60 samples/sec   Loss 4.1776   LearningRate 0.0357   Epoch: 8   Global Step: 100010   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:22:14,716-Speed 3215.35 samples/sec   Loss 4.1108   LearningRate 0.0357   Epoch: 8   Global Step: 100020   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:22:17,857-Speed 3260.33 samples/sec   Loss 4.1873   LearningRate 0.0357   Epoch: 8   Global Step: 100030   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:22:20,937-Speed 3325.73 samples/sec   Loss 4.1305   LearningRate 0.0357   Epoch: 8   Global Step: 100040   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:22:24,028-Speed 3314.17 samples/sec   Loss 4.0995   LearningRate 0.0357   Epoch: 8   Global Step: 100050   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:22:27,133-Speed 3298.88 samples/sec   Loss 4.1247   LearningRate 0.0357   Epoch: 8   Global Step: 100060   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:22:30,211-Speed 3327.98 samples/sec   Loss 4.1709   LearningRate 0.0357   Epoch: 8   Global Step: 100070   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:22:33,283-Speed 3334.42 samples/sec   Loss 4.2496   LearningRate 0.0357   Epoch: 8   Global Step: 100080   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:22:36,415-Speed 3270.69 samples/sec   Loss 4.1849   LearningRate 0.0357   Epoch: 8   Global Step: 100090   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:22:39,582-Speed 3234.30 samples/sec   Loss 4.1339   LearningRate 0.0356   Epoch: 8   Global Step: 100100   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:22:42,786-Speed 3197.31 samples/sec   Loss 4.0925   LearningRate 0.0356   Epoch: 8   Global Step: 100110   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:22:45,878-Speed 3312.81 samples/sec   Loss 4.1587   LearningRate 0.0356   Epoch: 8   Global Step: 100120   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:22:49,032-Speed 3247.15 samples/sec   Loss 4.1364   LearningRate 0.0356   Epoch: 8   Global Step: 100130   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:22:52,124-Speed 3313.23 samples/sec   Loss 4.0141   LearningRate 0.0356   Epoch: 8   Global Step: 100140   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:22:55,285-Speed 3240.67 samples/sec   Loss 4.1857   LearningRate 0.0356   Epoch: 8   Global Step: 100150   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:22:58,407-Speed 3281.51 samples/sec   Loss 4.1863   LearningRate 0.0356   Epoch: 8   Global Step: 100160   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:23:01,485-Speed 3327.49 samples/sec   Loss 4.2034   LearningRate 0.0356   Epoch: 8   Global Step: 100170   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:23:04,557-Speed 3334.67 samples/sec   Loss 4.2512   LearningRate 0.0356   Epoch: 8   Global Step: 100180   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:23:07,713-Speed 3245.77 samples/sec   Loss 4.1665   LearningRate 0.0356   Epoch: 8   Global Step: 100190   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:23:10,805-Speed 3312.76 samples/sec   Loss 4.1044   LearningRate 0.0356   Epoch: 8   Global Step: 100200   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:23:13,959-Speed 3247.12 samples/sec   Loss 4.2928   LearningRate 0.0356   Epoch: 8   Global Step: 100210   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:23:17,144-Speed 3216.86 samples/sec   Loss 4.2474   LearningRate 0.0356   Epoch: 8   Global Step: 100220   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:23:20,227-Speed 3321.60 samples/sec   Loss 4.1552   LearningRate 0.0356   Epoch: 8   Global Step: 100230   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:23:23,312-Speed 3320.15 samples/sec   Loss 4.1870   LearningRate 0.0356   Epoch: 8   Global Step: 100240   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:23:26,460-Speed 3254.82 samples/sec   Loss 4.3176   LearningRate 0.0356   Epoch: 8   Global Step: 100250   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:23:29,566-Speed 3298.11 samples/sec   Loss 4.1442   LearningRate 0.0356   Epoch: 8   Global Step: 100260   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:23:32,640-Speed 3331.61 samples/sec   Loss 4.0792   LearningRate 0.0356   Epoch: 8   Global Step: 100270   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:23:35,800-Speed 3242.17 samples/sec   Loss 4.1793   LearningRate 0.0356   Epoch: 8   Global Step: 100280   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:23:38,889-Speed 3315.67 samples/sec   Loss 4.1931   LearningRate 0.0356   Epoch: 8   Global Step: 100290   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:23:42,018-Speed 3273.61 samples/sec   Loss 4.1499   LearningRate 0.0356   Epoch: 8   Global Step: 100300   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:23:45,093-Speed 3331.63 samples/sec   Loss 4.2114   LearningRate 0.0355   Epoch: 8   Global Step: 100310   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:23:48,148-Speed 3352.50 samples/sec   Loss 4.3432   LearningRate 0.0355   Epoch: 8   Global Step: 100320   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:23:51,213-Speed 3342.16 samples/sec   Loss 4.3057   LearningRate 0.0355   Epoch: 8   Global Step: 100330   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:23:54,354-Speed 3261.15 samples/sec   Loss 4.2114   LearningRate 0.0355   Epoch: 8   Global Step: 100340   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:23:57,444-Speed 3315.04 samples/sec   Loss 4.2507   LearningRate 0.0355   Epoch: 8   Global Step: 100350   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:24:00,504-Speed 3348.07 samples/sec   Loss 4.2276   LearningRate 0.0355   Epoch: 8   Global Step: 100360   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:24:03,596-Speed 3312.81 samples/sec   Loss 4.2397   LearningRate 0.0355   Epoch: 8   Global Step: 100370   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:24:06,702-Speed 3297.49 samples/sec   Loss 4.1527   LearningRate 0.0355   Epoch: 8   Global Step: 100380   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:24:09,794-Speed 3313.62 samples/sec   Loss 4.1830   LearningRate 0.0355   Epoch: 8   Global Step: 100390   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:24:12,878-Speed 3320.88 samples/sec   Loss 4.2589   LearningRate 0.0355   Epoch: 8   Global Step: 100400   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:24:15,997-Speed 3284.01 samples/sec   Loss 4.2895   LearningRate 0.0355   Epoch: 8   Global Step: 100410   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:24:19,109-Speed 3291.26 samples/sec   Loss 4.2595   LearningRate 0.0355   Epoch: 8   Global Step: 100420   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:24:22,200-Speed 3314.13 samples/sec   Loss 4.2148   LearningRate 0.0355   Epoch: 8   Global Step: 100430   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:24:25,368-Speed 3233.32 samples/sec   Loss 4.2618   LearningRate 0.0355   Epoch: 8   Global Step: 100440   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:24:28,480-Speed 3291.27 samples/sec   Loss 4.1622   LearningRate 0.0355   Epoch: 8   Global Step: 100450   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:24:31,550-Speed 3336.73 samples/sec   Loss 4.1782   LearningRate 0.0355   Epoch: 8   Global Step: 100460   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:24:34,607-Speed 3350.99 samples/sec   Loss 4.2292   LearningRate 0.0355   Epoch: 8   Global Step: 100470   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:24:37,762-Speed 3246.71 samples/sec   Loss 4.1567   LearningRate 0.0355   Epoch: 8   Global Step: 100480   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:24:40,837-Speed 3330.80 samples/sec   Loss 4.2252   LearningRate 0.0355   Epoch: 8   Global Step: 100490   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-27 10:24:43,939-Speed 3301.98 samples/sec   Loss 4.4533   LearningRate 0.0355   Epoch: 8   Global Step: 100500   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-27 10:24:47,098-Speed 3242.97 samples/sec   Loss 4.2820   LearningRate 0.0355   Epoch: 8   Global Step: 100510   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-27 10:24:50,168-Speed 3336.20 samples/sec   Loss 4.2151   LearningRate 0.0354   Epoch: 8   Global Step: 100520   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-27 10:24:53,353-Speed 3216.40 samples/sec   Loss 4.2765   LearningRate 0.0354   Epoch: 8   Global Step: 100530   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-27 10:24:56,446-Speed 3311.39 samples/sec   Loss 4.2467   LearningRate 0.0354   Epoch: 8   Global Step: 100540   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-27 10:24:59,583-Speed 3265.33 samples/sec   Loss 4.2179   LearningRate 0.0354   Epoch: 8   Global Step: 100550   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-27 10:25:02,714-Speed 3271.81 samples/sec   Loss 4.3416   LearningRate 0.0354   Epoch: 8   Global Step: 100560   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-27 10:25:05,838-Speed 3278.84 samples/sec   Loss 4.2893   LearningRate 0.0354   Epoch: 8   Global Step: 100570   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-27 10:25:08,949-Speed 3293.02 samples/sec   Loss 4.2388   LearningRate 0.0354   Epoch: 8   Global Step: 100580   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-04-27 10:25:12,047-Speed 3305.89 samples/sec   Loss 4.3066   LearningRate 0.0354   Epoch: 8   Global Step: 100590   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:25:15,148-Speed 3304.45 samples/sec   Loss 4.2569   LearningRate 0.0354   Epoch: 8   Global Step: 100600   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:25:18,209-Speed 3345.94 samples/sec   Loss 4.2806   LearningRate 0.0354   Epoch: 8   Global Step: 100610   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:25:21,325-Speed 3287.23 samples/sec   Loss 4.3225   LearningRate 0.0354   Epoch: 8   Global Step: 100620   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:25:24,532-Speed 3193.75 samples/sec   Loss 4.2211   LearningRate 0.0354   Epoch: 8   Global Step: 100630   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:25:27,708-Speed 3225.71 samples/sec   Loss 4.2047   LearningRate 0.0354   Epoch: 8   Global Step: 100640   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:25:30,858-Speed 3251.23 samples/sec   Loss 4.2328   LearningRate 0.0354   Epoch: 8   Global Step: 100650   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:25:33,938-Speed 3325.34 samples/sec   Loss 4.2526   LearningRate 0.0354   Epoch: 8   Global Step: 100660   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:25:37,094-Speed 3246.23 samples/sec   Loss 4.2625   LearningRate 0.0354   Epoch: 8   Global Step: 100670   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:25:40,227-Speed 3269.72 samples/sec   Loss 4.3428   LearningRate 0.0354   Epoch: 8   Global Step: 100680   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:25:43,419-Speed 3208.73 samples/sec   Loss 4.2791   LearningRate 0.0354   Epoch: 8   Global Step: 100690   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:25:46,510-Speed 3314.26 samples/sec   Loss 4.3020   LearningRate 0.0354   Epoch: 8   Global Step: 100700   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:25:49,652-Speed 3259.96 samples/sec   Loss 4.2187   LearningRate 0.0354   Epoch: 8   Global Step: 100710   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:25:52,803-Speed 3250.11 samples/sec   Loss 4.3909   LearningRate 0.0353   Epoch: 8   Global Step: 100720   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:25:55,908-Speed 3299.43 samples/sec   Loss 4.2882   LearningRate 0.0353   Epoch: 8   Global Step: 100730   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:25:58,984-Speed 3330.62 samples/sec   Loss 4.2967   LearningRate 0.0353   Epoch: 8   Global Step: 100740   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:26:02,140-Speed 3245.67 samples/sec   Loss 4.2902   LearningRate 0.0353   Epoch: 8   Global Step: 100750   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:26:05,225-Speed 3319.91 samples/sec   Loss 4.3019   LearningRate 0.0353   Epoch: 8   Global Step: 100760   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:26:08,288-Speed 3344.31 samples/sec   Loss 4.2648   LearningRate 0.0353   Epoch: 8   Global Step: 100770   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:26:11,377-Speed 3315.95 samples/sec   Loss 4.2737   LearningRate 0.0353   Epoch: 8   Global Step: 100780   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:26:14,472-Speed 3310.19 samples/sec   Loss 4.3572   LearningRate 0.0353   Epoch: 8   Global Step: 100790   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:26:17,567-Speed 3309.49 samples/sec   Loss 4.2817   LearningRate 0.0353   Epoch: 8   Global Step: 100800   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:26:20,647-Speed 3324.79 samples/sec   Loss 4.3646   LearningRate 0.0353   Epoch: 8   Global Step: 100810   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:26:23,713-Speed 3341.15 samples/sec   Loss 4.2804   LearningRate 0.0353   Epoch: 8   Global Step: 100820   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:26:26,804-Speed 3314.56 samples/sec   Loss 4.3192   LearningRate 0.0353   Epoch: 8   Global Step: 100830   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:26:29,968-Speed 3237.20 samples/sec   Loss 4.2754   LearningRate 0.0353   Epoch: 8   Global Step: 100840   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:26:33,057-Speed 3315.94 samples/sec   Loss 4.2997   LearningRate 0.0353   Epoch: 8   Global Step: 100850   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:26:36,248-Speed 3209.91 samples/sec   Loss 4.3510   LearningRate 0.0353   Epoch: 8   Global Step: 100860   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:26:39,418-Speed 3230.66 samples/sec   Loss 4.3395   LearningRate 0.0353   Epoch: 8   Global Step: 100870   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:26:42,539-Speed 3282.33 samples/sec   Loss 4.2751   LearningRate 0.0353   Epoch: 8   Global Step: 100880   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:26:45,624-Speed 3320.53 samples/sec   Loss 4.3408   LearningRate 0.0353   Epoch: 8   Global Step: 100890   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:26:48,793-Speed 3231.67 samples/sec   Loss 4.3584   LearningRate 0.0353   Epoch: 8   Global Step: 100900   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:26:51,909-Speed 3287.74 samples/sec   Loss 4.3743   LearningRate 0.0353   Epoch: 8   Global Step: 100910   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:26:55,092-Speed 3218.53 samples/sec   Loss 4.3693   LearningRate 0.0353   Epoch: 8   Global Step: 100920   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:26:58,130-Speed 3370.90 samples/sec   Loss 4.3171   LearningRate 0.0352   Epoch: 8   Global Step: 100930   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:27:01,252-Speed 3280.70 samples/sec   Loss 4.2990   LearningRate 0.0352   Epoch: 8   Global Step: 100940   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:27:04,336-Speed 3322.07 samples/sec   Loss 4.3385   LearningRate 0.0352   Epoch: 8   Global Step: 100950   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:27:07,518-Speed 3218.81 samples/sec   Loss 4.3060   LearningRate 0.0352   Epoch: 8   Global Step: 100960   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:27:10,625-Speed 3296.24 samples/sec   Loss 4.4000   LearningRate 0.0352   Epoch: 8   Global Step: 100970   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:27:13,772-Speed 3254.85 samples/sec   Loss 4.2863   LearningRate 0.0352   Epoch: 8   Global Step: 100980   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:27:16,868-Speed 3308.72 samples/sec   Loss 4.3171   LearningRate 0.0352   Epoch: 8   Global Step: 100990   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:27:19,956-Speed 3317.65 samples/sec   Loss 4.4145   LearningRate 0.0352   Epoch: 8   Global Step: 101000   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:27:23,018-Speed 3344.99 samples/sec   Loss 4.3590   LearningRate 0.0352   Epoch: 8   Global Step: 101010   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:27:26,092-Speed 3332.05 samples/sec   Loss 4.3036   LearningRate 0.0352   Epoch: 8   Global Step: 101020   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:27:29,169-Speed 3328.85 samples/sec   Loss 4.4496   LearningRate 0.0352   Epoch: 8   Global Step: 101030   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:27:32,251-Speed 3324.20 samples/sec   Loss 4.3785   LearningRate 0.0352   Epoch: 8   Global Step: 101040   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:27:35,406-Speed 3246.08 samples/sec   Loss 4.2895   LearningRate 0.0352   Epoch: 8   Global Step: 101050   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:27:38,552-Speed 3255.75 samples/sec   Loss 4.3030   LearningRate 0.0352   Epoch: 8   Global Step: 101060   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:27:41,649-Speed 3308.23 samples/sec   Loss 4.3187   LearningRate 0.0352   Epoch: 8   Global Step: 101070   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:27:44,731-Speed 3323.57 samples/sec   Loss 4.3490   LearningRate 0.0352   Epoch: 8   Global Step: 101080   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:27:47,851-Speed 3282.50 samples/sec   Loss 4.3840   LearningRate 0.0352   Epoch: 8   Global Step: 101090   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:27:51,092-Speed 3160.94 samples/sec   Loss 4.3801   LearningRate 0.0352   Epoch: 8   Global Step: 101100   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:27:54,246-Speed 3247.46 samples/sec   Loss 4.3385   LearningRate 0.0352   Epoch: 8   Global Step: 101110   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:27:57,337-Speed 3313.96 samples/sec   Loss 4.3365   LearningRate 0.0352   Epoch: 8   Global Step: 101120   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:28:00,514-Speed 3224.07 samples/sec   Loss 4.3487   LearningRate 0.0352   Epoch: 8   Global Step: 101130   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:28:03,692-Speed 3223.74 samples/sec   Loss 4.3464   LearningRate 0.0351   Epoch: 8   Global Step: 101140   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:28:06,886-Speed 3206.88 samples/sec   Loss 4.3792   LearningRate 0.0351   Epoch: 8   Global Step: 101150   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:28:09,955-Speed 3337.77 samples/sec   Loss 4.3660   LearningRate 0.0351   Epoch: 8   Global Step: 101160   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:28:13,011-Speed 3351.69 samples/sec   Loss 4.4209   LearningRate 0.0351   Epoch: 8   Global Step: 101170   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:28:16,211-Speed 3201.11 samples/sec   Loss 4.4275   LearningRate 0.0351   Epoch: 8   Global Step: 101180   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:28:19,316-Speed 3298.83 samples/sec   Loss 4.2504   LearningRate 0.0351   Epoch: 8   Global Step: 101190   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:28:22,412-Speed 3308.83 samples/sec   Loss 4.3485   LearningRate 0.0351   Epoch: 8   Global Step: 101200   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:28:25,529-Speed 3286.44 samples/sec   Loss 4.3731   LearningRate 0.0351   Epoch: 8   Global Step: 101210   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:28:28,728-Speed 3201.79 samples/sec   Loss 4.3516   LearningRate 0.0351   Epoch: 8   Global Step: 101220   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:28:31,818-Speed 3315.33 samples/sec   Loss 4.3552   LearningRate 0.0351   Epoch: 8   Global Step: 101230   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:28:34,928-Speed 3294.14 samples/sec   Loss 4.3796   LearningRate 0.0351   Epoch: 8   Global Step: 101240   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:28:38,023-Speed 3309.20 samples/sec   Loss 4.2667   LearningRate 0.0351   Epoch: 8   Global Step: 101250   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:28:41,105-Speed 3323.63 samples/sec   Loss 4.4227   LearningRate 0.0351   Epoch: 8   Global Step: 101260   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:28:44,208-Speed 3301.61 samples/sec   Loss 4.3903   LearningRate 0.0351   Epoch: 8   Global Step: 101270   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:28:47,277-Speed 3336.64 samples/sec   Loss 4.3816   LearningRate 0.0351   Epoch: 8   Global Step: 101280   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:28:50,367-Speed 3316.04 samples/sec   Loss 4.4185   LearningRate 0.0351   Epoch: 8   Global Step: 101290   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:28:53,434-Speed 3339.60 samples/sec   Loss 4.5434   LearningRate 0.0351   Epoch: 8   Global Step: 101300   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:28:56,500-Speed 3340.84 samples/sec   Loss 4.4285   LearningRate 0.0351   Epoch: 8   Global Step: 101310   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:28:59,651-Speed 3252.18 samples/sec   Loss 4.4279   LearningRate 0.0351   Epoch: 8   Global Step: 101320   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:29:02,781-Speed 3272.47 samples/sec   Loss 4.4453   LearningRate 0.0351   Epoch: 8   Global Step: 101330   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:29:05,879-Speed 3306.73 samples/sec   Loss 4.4307   LearningRate 0.0351   Epoch: 8   Global Step: 101340   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:29:08,942-Speed 3344.17 samples/sec   Loss 4.3622   LearningRate 0.0350   Epoch: 8   Global Step: 101350   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:29:12,025-Speed 3323.15 samples/sec   Loss 4.5010   LearningRate 0.0350   Epoch: 8   Global Step: 101360   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:29:15,095-Speed 3336.56 samples/sec   Loss 4.3219   LearningRate 0.0350   Epoch: 8   Global Step: 101370   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:29:18,255-Speed 3241.55 samples/sec   Loss 4.4356   LearningRate 0.0350   Epoch: 8   Global Step: 101380   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:29:21,309-Speed 3353.66 samples/sec   Loss 4.4041   LearningRate 0.0350   Epoch: 8   Global Step: 101390   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:29:24,461-Speed 3249.42 samples/sec   Loss 4.5370   LearningRate 0.0350   Epoch: 8   Global Step: 101400   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:29:27,602-Speed 3261.48 samples/sec   Loss 4.3870   LearningRate 0.0350   Epoch: 8   Global Step: 101410   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:29:30,789-Speed 3213.74 samples/sec   Loss 4.4095   LearningRate 0.0350   Epoch: 8   Global Step: 101420   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:29:33,900-Speed 3293.05 samples/sec   Loss 4.4756   LearningRate 0.0350   Epoch: 8   Global Step: 101430   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:29:36,993-Speed 3311.80 samples/sec   Loss 4.3050   LearningRate 0.0350   Epoch: 8   Global Step: 101440   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:29:40,069-Speed 3330.34 samples/sec   Loss 4.4669   LearningRate 0.0350   Epoch: 8   Global Step: 101450   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:29:43,162-Speed 3311.53 samples/sec   Loss 4.6072   LearningRate 0.0350   Epoch: 8   Global Step: 101460   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:29:46,260-Speed 3306.63 samples/sec   Loss 4.4922   LearningRate 0.0350   Epoch: 8   Global Step: 101470   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:29:49,358-Speed 3305.95 samples/sec   Loss 4.3451   LearningRate 0.0350   Epoch: 8   Global Step: 101480   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:29:52,531-Speed 3228.33 samples/sec   Loss 4.4352   LearningRate 0.0350   Epoch: 8   Global Step: 101490   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:29:55,594-Speed 3344.55 samples/sec   Loss 4.4883   LearningRate 0.0350   Epoch: 8   Global Step: 101500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:29:58,701-Speed 3296.69 samples/sec   Loss 4.4866   LearningRate 0.0350   Epoch: 8   Global Step: 101510   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:30:01,783-Speed 3323.51 samples/sec   Loss 4.4344   LearningRate 0.0350   Epoch: 8   Global Step: 101520   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:30:04,872-Speed 3316.66 samples/sec   Loss 4.4024   LearningRate 0.0350   Epoch: 8   Global Step: 101530   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:30:08,058-Speed 3214.80 samples/sec   Loss 4.4893   LearningRate 0.0350   Epoch: 8   Global Step: 101540   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:30:11,170-Speed 3291.35 samples/sec   Loss 4.4274   LearningRate 0.0350   Epoch: 8   Global Step: 101550   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:30:14,303-Speed 3269.70 samples/sec   Loss 4.5283   LearningRate 0.0349   Epoch: 8   Global Step: 101560   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:30:17,423-Speed 3283.10 samples/sec   Loss 4.3755   LearningRate 0.0349   Epoch: 8   Global Step: 101570   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:30:20,504-Speed 3325.37 samples/sec   Loss 4.3768   LearningRate 0.0349   Epoch: 8   Global Step: 101580   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:30:23,592-Speed 3317.13 samples/sec   Loss 4.4450   LearningRate 0.0349   Epoch: 8   Global Step: 101590   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:30:26,694-Speed 3302.22 samples/sec   Loss 4.4091   LearningRate 0.0349   Epoch: 8   Global Step: 101600   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:30:29,793-Speed 3305.16 samples/sec   Loss 4.4275   LearningRate 0.0349   Epoch: 8   Global Step: 101610   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:30:32,894-Speed 3303.31 samples/sec   Loss 4.4021   LearningRate 0.0349   Epoch: 8   Global Step: 101620   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:30:36,057-Speed 3238.18 samples/sec   Loss 4.5256   LearningRate 0.0349   Epoch: 8   Global Step: 101630   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:30:39,184-Speed 3275.90 samples/sec   Loss 4.3146   LearningRate 0.0349   Epoch: 8   Global Step: 101640   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:30:42,296-Speed 3291.65 samples/sec   Loss 4.4896   LearningRate 0.0349   Epoch: 8   Global Step: 101650   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:30:45,402-Speed 3298.34 samples/sec   Loss 4.5004   LearningRate 0.0349   Epoch: 8   Global Step: 101660   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:30:48,534-Speed 3270.81 samples/sec   Loss 4.4322   LearningRate 0.0349   Epoch: 8   Global Step: 101670   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:30:51,651-Speed 3285.57 samples/sec   Loss 4.4949   LearningRate 0.0349   Epoch: 8   Global Step: 101680   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:30:54,696-Speed 3364.18 samples/sec   Loss 4.4620   LearningRate 0.0349   Epoch: 8   Global Step: 101690   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:30:57,783-Speed 3318.02 samples/sec   Loss 4.4583   LearningRate 0.0349   Epoch: 8   Global Step: 101700   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:31:01,001-Speed 3183.25 samples/sec   Loss 4.4989   LearningRate 0.0349   Epoch: 8   Global Step: 101710   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:31:04,116-Speed 3288.74 samples/sec   Loss 4.4996   LearningRate 0.0349   Epoch: 8   Global Step: 101720   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:31:07,189-Speed 3333.71 samples/sec   Loss 4.5533   LearningRate 0.0349   Epoch: 8   Global Step: 101730   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:31:10,258-Speed 3337.18 samples/sec   Loss 4.5453   LearningRate 0.0349   Epoch: 8   Global Step: 101740   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:31:13,374-Speed 3287.27 samples/sec   Loss 4.4911   LearningRate 0.0349   Epoch: 8   Global Step: 101750   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:31:16,486-Speed 3292.14 samples/sec   Loss 4.5375   LearningRate 0.0349   Epoch: 8   Global Step: 101760   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:31:19,610-Speed 3278.33 samples/sec   Loss 4.3993   LearningRate 0.0348   Epoch: 8   Global Step: 101770   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:31:22,684-Speed 3331.50 samples/sec   Loss 4.5237   LearningRate 0.0348   Epoch: 8   Global Step: 101780   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:31:25,807-Speed 3280.90 samples/sec   Loss 4.5179   LearningRate 0.0348   Epoch: 8   Global Step: 101790   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:31:28,959-Speed 3249.98 samples/sec   Loss 4.3616   LearningRate 0.0348   Epoch: 8   Global Step: 101800   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:31:32,059-Speed 3304.33 samples/sec   Loss 4.4223   LearningRate 0.0348   Epoch: 8   Global Step: 101810   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:31:35,158-Speed 3305.07 samples/sec   Loss 4.5534   LearningRate 0.0348   Epoch: 8   Global Step: 101820   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:31:38,238-Speed 3326.00 samples/sec   Loss 4.4696   LearningRate 0.0348   Epoch: 8   Global Step: 101830   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:31:41,315-Speed 3328.37 samples/sec   Loss 4.5882   LearningRate 0.0348   Epoch: 8   Global Step: 101840   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:31:44,398-Speed 3322.50 samples/sec   Loss 4.5067   LearningRate 0.0348   Epoch: 8   Global Step: 101850   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:31:47,578-Speed 3220.95 samples/sec   Loss 4.5549   LearningRate 0.0348   Epoch: 8   Global Step: 101860   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:31:50,733-Speed 3246.86 samples/sec   Loss 4.5152   LearningRate 0.0348   Epoch: 8   Global Step: 101870   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:31:53,887-Speed 3247.45 samples/sec   Loss 4.5022   LearningRate 0.0348   Epoch: 8   Global Step: 101880   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:31:56,984-Speed 3307.62 samples/sec   Loss 4.4392   LearningRate 0.0348   Epoch: 8   Global Step: 101890   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:32:00,069-Speed 3320.62 samples/sec   Loss 4.4415   LearningRate 0.0348   Epoch: 8   Global Step: 101900   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:32:03,177-Speed 3295.54 samples/sec   Loss 4.4657   LearningRate 0.0348   Epoch: 8   Global Step: 101910   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:32:06,283-Speed 3299.06 samples/sec   Loss 4.5275   LearningRate 0.0348   Epoch: 8   Global Step: 101920   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:32:09,400-Speed 3286.46 samples/sec   Loss 4.6317   LearningRate 0.0348   Epoch: 8   Global Step: 101930   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:32:12,553-Speed 3249.11 samples/sec   Loss 4.4299   LearningRate 0.0348   Epoch: 8   Global Step: 101940   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:32:15,643-Speed 3314.30 samples/sec   Loss 4.5406   LearningRate 0.0348   Epoch: 8   Global Step: 101950   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:32:18,755-Speed 3292.24 samples/sec   Loss 4.4639   LearningRate 0.0348   Epoch: 8   Global Step: 101960   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:32:21,827-Speed 3334.26 samples/sec   Loss 4.4982   LearningRate 0.0348   Epoch: 8   Global Step: 101970   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:32:24,920-Speed 3311.82 samples/sec   Loss 4.5486   LearningRate 0.0347   Epoch: 8   Global Step: 101980   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:32:28,047-Speed 3275.97 samples/sec   Loss 4.4800   LearningRate 0.0347   Epoch: 8   Global Step: 101990   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:32:31,155-Speed 3295.99 samples/sec   Loss 4.4989   LearningRate 0.0347   Epoch: 8   Global Step: 102000   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:32:34,230-Speed 3331.14 samples/sec   Loss 4.4168   LearningRate 0.0347   Epoch: 8   Global Step: 102010   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:32:37,326-Speed 3307.97 samples/sec   Loss 4.5291   LearningRate 0.0347   Epoch: 8   Global Step: 102020   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:32:40,409-Speed 3322.34 samples/sec   Loss 4.4602   LearningRate 0.0347   Epoch: 8   Global Step: 102030   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:32:43,471-Speed 3345.84 samples/sec   Loss 4.4666   LearningRate 0.0347   Epoch: 8   Global Step: 102040   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:32:46,568-Speed 3307.41 samples/sec   Loss 4.5392   LearningRate 0.0347   Epoch: 8   Global Step: 102050   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:32:49,672-Speed 3299.95 samples/sec   Loss 4.5503   LearningRate 0.0347   Epoch: 8   Global Step: 102060   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:32:52,781-Speed 3294.83 samples/sec   Loss 4.4417   LearningRate 0.0347   Epoch: 8   Global Step: 102070   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:32:55,952-Speed 3230.30 samples/sec   Loss 4.3787   LearningRate 0.0347   Epoch: 8   Global Step: 102080   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:32:59,046-Speed 3310.65 samples/sec   Loss 4.5295   LearningRate 0.0347   Epoch: 8   Global Step: 102090   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:33:02,140-Speed 3310.99 samples/sec   Loss 4.6324   LearningRate 0.0347   Epoch: 8   Global Step: 102100   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:33:05,243-Speed 3300.59 samples/sec   Loss 4.5079   LearningRate 0.0347   Epoch: 8   Global Step: 102110   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:33:08,380-Speed 3265.80 samples/sec   Loss 4.5894   LearningRate 0.0347   Epoch: 8   Global Step: 102120   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:33:11,543-Speed 3237.56 samples/sec   Loss 4.6007   LearningRate 0.0347   Epoch: 8   Global Step: 102130   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:33:14,637-Speed 3311.44 samples/sec   Loss 4.5649   LearningRate 0.0347   Epoch: 8   Global Step: 102140   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:33:17,756-Speed 3283.26 samples/sec   Loss 4.5704   LearningRate 0.0347   Epoch: 8   Global Step: 102150   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:33:20,858-Speed 3302.57 samples/sec   Loss 4.5439   LearningRate 0.0347   Epoch: 8   Global Step: 102160   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:33:23,995-Speed 3264.59 samples/sec   Loss 4.5744   LearningRate 0.0347   Epoch: 8   Global Step: 102170   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:33:27,094-Speed 3306.06 samples/sec   Loss 4.5224   LearningRate 0.0347   Epoch: 8   Global Step: 102180   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:33:30,233-Speed 3263.54 samples/sec   Loss 4.4865   LearningRate 0.0346   Epoch: 8   Global Step: 102190   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:33:33,375-Speed 3259.57 samples/sec   Loss 4.5636   LearningRate 0.0346   Epoch: 8   Global Step: 102200   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:33:36,526-Speed 3250.90 samples/sec   Loss 4.4601   LearningRate 0.0346   Epoch: 8   Global Step: 102210   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:33:39,665-Speed 3262.55 samples/sec   Loss 4.5376   LearningRate 0.0346   Epoch: 8   Global Step: 102220   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:33:42,773-Speed 3296.24 samples/sec   Loss 4.5342   LearningRate 0.0346   Epoch: 8   Global Step: 102230   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:33:45,876-Speed 3301.52 samples/sec   Loss 4.5486   LearningRate 0.0346   Epoch: 8   Global Step: 102240   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:33:49,044-Speed 3232.82 samples/sec   Loss 4.5963   LearningRate 0.0346   Epoch: 8   Global Step: 102250   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:33:52,154-Speed 3293.68 samples/sec   Loss 4.5941   LearningRate 0.0346   Epoch: 8   Global Step: 102260   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:33:55,307-Speed 3248.78 samples/sec   Loss 4.4866   LearningRate 0.0346   Epoch: 8   Global Step: 102270   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:33:58,379-Speed 3334.42 samples/sec   Loss 4.5843   LearningRate 0.0346   Epoch: 8   Global Step: 102280   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:34:01,519-Speed 3262.89 samples/sec   Loss 4.5485   LearningRate 0.0346   Epoch: 8   Global Step: 102290   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:34:04,581-Speed 3345.13 samples/sec   Loss 4.5498   LearningRate 0.0346   Epoch: 8   Global Step: 102300   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:34:07,652-Speed 3334.96 samples/sec   Loss 4.5574   LearningRate 0.0346   Epoch: 8   Global Step: 102310   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:34:10,725-Speed 3333.39 samples/sec   Loss 4.5327   LearningRate 0.0346   Epoch: 8   Global Step: 102320   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:34:13,933-Speed 3193.03 samples/sec   Loss 4.6138   LearningRate 0.0346   Epoch: 8   Global Step: 102330   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:34:17,120-Speed 3214.03 samples/sec   Loss 4.5973   LearningRate 0.0346   Epoch: 8   Global Step: 102340   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:34:20,227-Speed 3297.19 samples/sec   Loss 4.5240   LearningRate 0.0346   Epoch: 8   Global Step: 102350   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:34:23,374-Speed 3254.50 samples/sec   Loss 4.5334   LearningRate 0.0346   Epoch: 8   Global Step: 102360   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:34:26,436-Speed 3345.17 samples/sec   Loss 4.4757   LearningRate 0.0346   Epoch: 8   Global Step: 102370   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:34:29,613-Speed 3223.85 samples/sec   Loss 4.5604   LearningRate 0.0346   Epoch: 8   Global Step: 102380   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:34:32,705-Speed 3313.36 samples/sec   Loss 4.5553   LearningRate 0.0346   Epoch: 8   Global Step: 102390   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:34:35,883-Speed 3223.79 samples/sec   Loss 4.5469   LearningRate 0.0346   Epoch: 8   Global Step: 102400   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:34:39,003-Speed 3281.92 samples/sec   Loss 4.5401   LearningRate 0.0345   Epoch: 8   Global Step: 102410   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:34:42,112-Speed 3294.67 samples/sec   Loss 4.5302   LearningRate 0.0345   Epoch: 8   Global Step: 102420   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:34:45,192-Speed 3327.24 samples/sec   Loss 4.6033   LearningRate 0.0345   Epoch: 8   Global Step: 102430   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:34:48,348-Speed 3245.18 samples/sec   Loss 4.5392   LearningRate 0.0345   Epoch: 8   Global Step: 102440   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:34:51,479-Speed 3271.31 samples/sec   Loss 4.5597   LearningRate 0.0345   Epoch: 8   Global Step: 102450   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:34:54,631-Speed 3250.24 samples/sec   Loss 4.5888   LearningRate 0.0345   Epoch: 8   Global Step: 102460   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:34:57,729-Speed 3306.52 samples/sec   Loss 4.5506   LearningRate 0.0345   Epoch: 8   Global Step: 102470   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:35:00,869-Speed 3262.10 samples/sec   Loss 4.6134   LearningRate 0.0345   Epoch: 8   Global Step: 102480   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:35:03,959-Speed 3314.44 samples/sec   Loss 4.5439   LearningRate 0.0345   Epoch: 8   Global Step: 102490   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:35:07,039-Speed 3326.66 samples/sec   Loss 4.5673   LearningRate 0.0345   Epoch: 8   Global Step: 102500   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:35:10,127-Speed 3316.82 samples/sec   Loss 4.6349   LearningRate 0.0345   Epoch: 8   Global Step: 102510   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:35:13,217-Speed 3314.90 samples/sec   Loss 4.6472   LearningRate 0.0345   Epoch: 8   Global Step: 102520   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:35:16,398-Speed 3220.83 samples/sec   Loss 4.5491   LearningRate 0.0345   Epoch: 8   Global Step: 102530   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:35:19,473-Speed 3330.75 samples/sec   Loss 4.6338   LearningRate 0.0345   Epoch: 8   Global Step: 102540   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:35:22,552-Speed 3326.89 samples/sec   Loss 4.5110   LearningRate 0.0345   Epoch: 8   Global Step: 102550   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:35:25,623-Speed 3335.54 samples/sec   Loss 4.5941   LearningRate 0.0345   Epoch: 8   Global Step: 102560   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:35:28,763-Speed 3262.52 samples/sec   Loss 4.6598   LearningRate 0.0345   Epoch: 8   Global Step: 102570   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:35:31,904-Speed 3261.82 samples/sec   Loss 4.6708   LearningRate 0.0345   Epoch: 8   Global Step: 102580   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:35:35,049-Speed 3256.85 samples/sec   Loss 4.6129   LearningRate 0.0345   Epoch: 8   Global Step: 102590   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:35:38,140-Speed 3313.22 samples/sec   Loss 4.5855   LearningRate 0.0345   Epoch: 8   Global Step: 102600   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:35:41,255-Speed 3288.27 samples/sec   Loss 4.6240   LearningRate 0.0345   Epoch: 8   Global Step: 102610   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:35:44,332-Speed 3329.12 samples/sec   Loss 4.5366   LearningRate 0.0344   Epoch: 8   Global Step: 102620   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:35:47,398-Speed 3341.62 samples/sec   Loss 4.5494   LearningRate 0.0344   Epoch: 8   Global Step: 102630   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:35:50,544-Speed 3255.70 samples/sec   Loss 4.5203   LearningRate 0.0344   Epoch: 8   Global Step: 102640   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:35:53,639-Speed 3310.06 samples/sec   Loss 4.6153   LearningRate 0.0344   Epoch: 8   Global Step: 102650   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:35:56,716-Speed 3328.27 samples/sec   Loss 4.6203   LearningRate 0.0344   Epoch: 8   Global Step: 102660   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:35:59,808-Speed 3313.26 samples/sec   Loss 4.5645   LearningRate 0.0344   Epoch: 8   Global Step: 102670   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:36:02,925-Speed 3286.58 samples/sec   Loss 4.5845   LearningRate 0.0344   Epoch: 8   Global Step: 102680   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:36:06,035-Speed 3293.52 samples/sec   Loss 4.5166   LearningRate 0.0344   Epoch: 8   Global Step: 102690   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:36:09,128-Speed 3311.50 samples/sec   Loss 4.6140   LearningRate 0.0344   Epoch: 8   Global Step: 102700   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-04-27 10:36:12,199-Speed 3335.33 samples/sec   Loss 4.6680   LearningRate 0.0344   Epoch: 8   Global Step: 102710   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:36:15,335-Speed 3266.55 samples/sec   Loss 4.6267   LearningRate 0.0344   Epoch: 8   Global Step: 102720   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:36:18,419-Speed 3321.40 samples/sec   Loss 4.5571   LearningRate 0.0344   Epoch: 8   Global Step: 102730   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:36:21,479-Speed 3347.81 samples/sec   Loss 4.5262   LearningRate 0.0344   Epoch: 8   Global Step: 102740   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:36:24,583-Speed 3299.54 samples/sec   Loss 4.5477   LearningRate 0.0344   Epoch: 8   Global Step: 102750   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:36:27,659-Speed 3330.40 samples/sec   Loss 4.6393   LearningRate 0.0344   Epoch: 8   Global Step: 102760   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:36:30,748-Speed 3316.02 samples/sec   Loss 4.5375   LearningRate 0.0344   Epoch: 8   Global Step: 102770   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:36:33,837-Speed 3315.69 samples/sec   Loss 4.6641   LearningRate 0.0344   Epoch: 8   Global Step: 102780   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:36:36,961-Speed 3279.25 samples/sec   Loss 4.5856   LearningRate 0.0344   Epoch: 8   Global Step: 102790   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:36:40,087-Speed 3276.94 samples/sec   Loss 4.4936   LearningRate 0.0344   Epoch: 8   Global Step: 102800   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:36:43,200-Speed 3289.61 samples/sec   Loss 4.6028   LearningRate 0.0344   Epoch: 8   Global Step: 102810   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-04-27 10:36:46,250-Speed 3359.70 samples/sec   Loss 4.5381   LearningRate 0.0344   Epoch: 8   Global Step: 102820   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:36:49,330-Speed 3325.11 samples/sec   Loss 4.5618   LearningRate 0.0343   Epoch: 8   Global Step: 102830   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:36:52,461-Speed 3271.52 samples/sec   Loss 4.5938   LearningRate 0.0343   Epoch: 8   Global Step: 102840   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:36:55,533-Speed 3334.15 samples/sec   Loss 4.7107   LearningRate 0.0343   Epoch: 8   Global Step: 102850   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:36:58,616-Speed 3322.85 samples/sec   Loss 4.6234   LearningRate 0.0343   Epoch: 8   Global Step: 102860   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:37:01,761-Speed 3256.97 samples/sec   Loss 4.5711   LearningRate 0.0343   Epoch: 8   Global Step: 102870   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:37:04,852-Speed 3314.10 samples/sec   Loss 4.5956   LearningRate 0.0343   Epoch: 8   Global Step: 102880   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-04-27 10:37:08,001-Speed 3253.12 samples/sec   Loss 4.6073   LearningRate 0.0343   Epoch: 8   Global Step: 102890   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:37:11,073-Speed 3334.30 samples/sec   Loss 4.5781   LearningRate 0.0343   Epoch: 8   Global Step: 102900   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:37:14,271-Speed 3202.90 samples/sec   Loss 4.6293   LearningRate 0.0343   Epoch: 8   Global Step: 102910   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:37:17,427-Speed 3245.84 samples/sec   Loss 4.6210   LearningRate 0.0343   Epoch: 8   Global Step: 102920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:37:20,517-Speed 3314.52 samples/sec   Loss 4.5752   LearningRate 0.0343   Epoch: 8   Global Step: 102930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:37:23,631-Speed 3289.49 samples/sec   Loss 4.7203   LearningRate 0.0343   Epoch: 8   Global Step: 102940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:37:26,786-Speed 3247.04 samples/sec   Loss 4.4796   LearningRate 0.0343   Epoch: 8   Global Step: 102950   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:37:29,873-Speed 3318.53 samples/sec   Loss 4.6123   LearningRate 0.0343   Epoch: 8   Global Step: 102960   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:37:32,945-Speed 3333.72 samples/sec   Loss 4.6654   LearningRate 0.0343   Epoch: 8   Global Step: 102970   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:37:36,008-Speed 3344.44 samples/sec   Loss 4.6680   LearningRate 0.0343   Epoch: 8   Global Step: 102980   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:37:39,105-Speed 3307.93 samples/sec   Loss 4.5884   LearningRate 0.0343   Epoch: 8   Global Step: 102990   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:37:42,256-Speed 3250.76 samples/sec   Loss 4.5993   LearningRate 0.0343   Epoch: 8   Global Step: 103000   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:37:45,306-Speed 3358.33 samples/sec   Loss 4.6234   LearningRate 0.0343   Epoch: 8   Global Step: 103010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:37:48,403-Speed 3307.43 samples/sec   Loss 4.6321   LearningRate 0.0343   Epoch: 8   Global Step: 103020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:37:51,581-Speed 3223.38 samples/sec   Loss 4.4934   LearningRate 0.0343   Epoch: 8   Global Step: 103030   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:37:54,719-Speed 3265.11 samples/sec   Loss 4.5510   LearningRate 0.0342   Epoch: 8   Global Step: 103040   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:37:57,843-Speed 3278.20 samples/sec   Loss 4.6291   LearningRate 0.0342   Epoch: 8   Global Step: 103050   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:38:00,932-Speed 3316.41 samples/sec   Loss 4.5894   LearningRate 0.0342   Epoch: 8   Global Step: 103060   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:38:04,020-Speed 3316.26 samples/sec   Loss 4.5306   LearningRate 0.0342   Epoch: 8   Global Step: 103070   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:38:07,155-Speed 3268.22 samples/sec   Loss 4.5819   LearningRate 0.0342   Epoch: 8   Global Step: 103080   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:38:10,204-Speed 3358.94 samples/sec   Loss 4.6445   LearningRate 0.0342   Epoch: 8   Global Step: 103090   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:38:13,270-Speed 3341.37 samples/sec   Loss 4.6628   LearningRate 0.0342   Epoch: 8   Global Step: 103100   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:38:16,395-Speed 3277.80 samples/sec   Loss 4.6946   LearningRate 0.0342   Epoch: 8   Global Step: 103110   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:38:19,540-Speed 3256.83 samples/sec   Loss 4.5976   LearningRate 0.0342   Epoch: 8   Global Step: 103120   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:38:22,605-Speed 3341.95 samples/sec   Loss 4.6095   LearningRate 0.0342   Epoch: 8   Global Step: 103130   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:38:25,744-Speed 3263.73 samples/sec   Loss 4.6328   LearningRate 0.0342   Epoch: 8   Global Step: 103140   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:38:28,828-Speed 3321.15 samples/sec   Loss 4.6141   LearningRate 0.0342   Epoch: 8   Global Step: 103150   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:38:31,963-Speed 3267.38 samples/sec   Loss 4.6852   LearningRate 0.0342   Epoch: 8   Global Step: 103160   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:38:35,038-Speed 3330.40 samples/sec   Loss 4.7425   LearningRate 0.0342   Epoch: 8   Global Step: 103170   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:38:38,115-Speed 3328.86 samples/sec   Loss 4.6568   LearningRate 0.0342   Epoch: 8   Global Step: 103180   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:38:41,254-Speed 3263.52 samples/sec   Loss 4.6022   LearningRate 0.0342   Epoch: 8   Global Step: 103190   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:38:44,352-Speed 3306.70 samples/sec   Loss 4.6089   LearningRate 0.0342   Epoch: 8   Global Step: 103200   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:38:47,456-Speed 3299.93 samples/sec   Loss 4.6360   LearningRate 0.0342   Epoch: 8   Global Step: 103210   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:38:50,548-Speed 3312.77 samples/sec   Loss 4.6309   LearningRate 0.0342   Epoch: 8   Global Step: 103220   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:38:53,638-Speed 3314.51 samples/sec   Loss 4.7416   LearningRate 0.0342   Epoch: 8   Global Step: 103230   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:38:56,720-Speed 3323.82 samples/sec   Loss 4.6063   LearningRate 0.0342   Epoch: 8   Global Step: 103240   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:38:59,824-Speed 3300.42 samples/sec   Loss 4.6465   LearningRate 0.0341   Epoch: 8   Global Step: 103250   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:39:02,990-Speed 3234.54 samples/sec   Loss 4.6552   LearningRate 0.0341   Epoch: 8   Global Step: 103260   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:39:06,169-Speed 3223.49 samples/sec   Loss 4.6063   LearningRate 0.0341   Epoch: 8   Global Step: 103270   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:39:09,235-Speed 3340.76 samples/sec   Loss 4.5641   LearningRate 0.0341   Epoch: 8   Global Step: 103280   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:39:12,363-Speed 3274.18 samples/sec   Loss 4.6278   LearningRate 0.0341   Epoch: 8   Global Step: 103290   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:39:15,492-Speed 3273.77 samples/sec   Loss 4.6141   LearningRate 0.0341   Epoch: 8   Global Step: 103300   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:39:18,645-Speed 3249.21 samples/sec   Loss 4.7499   LearningRate 0.0341   Epoch: 8   Global Step: 103310   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:39:21,696-Speed 3357.81 samples/sec   Loss 4.6273   LearningRate 0.0341   Epoch: 8   Global Step: 103320   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:39:24,881-Speed 3215.86 samples/sec   Loss 4.5904   LearningRate 0.0341   Epoch: 8   Global Step: 103330   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:39:28,012-Speed 3270.85 samples/sec   Loss 4.6815   LearningRate 0.0341   Epoch: 8   Global Step: 103340   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:39:31,212-Speed 3201.76 samples/sec   Loss 4.5592   LearningRate 0.0341   Epoch: 8   Global Step: 103350   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:39:34,283-Speed 3335.04 samples/sec   Loss 4.5940   LearningRate 0.0341   Epoch: 8   Global Step: 103360   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:39:37,417-Speed 3268.43 samples/sec   Loss 4.6396   LearningRate 0.0341   Epoch: 8   Global Step: 103370   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:39:40,509-Speed 3313.31 samples/sec   Loss 4.7269   LearningRate 0.0341   Epoch: 8   Global Step: 103380   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:39:43,643-Speed 3267.98 samples/sec   Loss 4.6621   LearningRate 0.0341   Epoch: 8   Global Step: 103390   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:39:46,724-Speed 3325.60 samples/sec   Loss 4.6604   LearningRate 0.0341   Epoch: 8   Global Step: 103400   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:39:49,812-Speed 3316.30 samples/sec   Loss 4.6590   LearningRate 0.0341   Epoch: 8   Global Step: 103410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:39:52,923-Speed 3292.85 samples/sec   Loss 4.6391   LearningRate 0.0341   Epoch: 8   Global Step: 103420   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:39:56,014-Speed 3314.41 samples/sec   Loss 4.7823   LearningRate 0.0341   Epoch: 8   Global Step: 103430   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:39:59,082-Speed 3338.50 samples/sec   Loss 4.5466   LearningRate 0.0341   Epoch: 8   Global Step: 103440   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:40:02,145-Speed 3344.59 samples/sec   Loss 4.6998   LearningRate 0.0341   Epoch: 8   Global Step: 103450   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:40:05,282-Speed 3264.93 samples/sec   Loss 4.7091   LearningRate 0.0341   Epoch: 8   Global Step: 103460   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:40:08,342-Speed 3348.17 samples/sec   Loss 4.6479   LearningRate 0.0340   Epoch: 8   Global Step: 103470   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:40:11,402-Speed 3346.73 samples/sec   Loss 4.6166   LearningRate 0.0340   Epoch: 8   Global Step: 103480   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-27 10:40:14,553-Speed 3251.25 samples/sec   Loss 4.6764   LearningRate 0.0340   Epoch: 8   Global Step: 103490   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-27 10:40:17,721-Speed 3233.07 samples/sec   Loss 4.7779   LearningRate 0.0340   Epoch: 8   Global Step: 103500   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-27 10:40:20,837-Speed 3286.84 samples/sec   Loss 4.7347   LearningRate 0.0340   Epoch: 8   Global Step: 103510   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-27 10:40:23,916-Speed 3327.41 samples/sec   Loss 4.6342   LearningRate 0.0340   Epoch: 8   Global Step: 103520   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-27 10:40:27,002-Speed 3319.06 samples/sec   Loss 4.6675   LearningRate 0.0340   Epoch: 8   Global Step: 103530   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-27 10:40:30,210-Speed 3192.84 samples/sec   Loss 4.5405   LearningRate 0.0340   Epoch: 8   Global Step: 103540   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-27 10:40:33,311-Speed 3303.93 samples/sec   Loss 4.6184   LearningRate 0.0340   Epoch: 8   Global Step: 103550   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-27 10:40:36,390-Speed 3326.24 samples/sec   Loss 4.6401   LearningRate 0.0340   Epoch: 8   Global Step: 103560   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-27 10:40:39,525-Speed 3267.37 samples/sec   Loss 4.7023   LearningRate 0.0340   Epoch: 8   Global Step: 103570   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-04-27 10:40:42,689-Speed 3237.56 samples/sec   Loss 4.6492   LearningRate 0.0340   Epoch: 8   Global Step: 103580   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:40:45,742-Speed 3355.08 samples/sec   Loss 4.5446   LearningRate 0.0340   Epoch: 8   Global Step: 103590   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:40:48,871-Speed 3274.23 samples/sec   Loss 4.5393   LearningRate 0.0340   Epoch: 8   Global Step: 103600   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:40:52,070-Speed 3201.62 samples/sec   Loss 4.7183   LearningRate 0.0340   Epoch: 8   Global Step: 103610   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:40:55,198-Speed 3274.60 samples/sec   Loss 4.5902   LearningRate 0.0340   Epoch: 8   Global Step: 103620   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:40:58,290-Speed 3313.37 samples/sec   Loss 4.6535   LearningRate 0.0340   Epoch: 8   Global Step: 103630   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:41:01,385-Speed 3309.69 samples/sec   Loss 4.6611   LearningRate 0.0340   Epoch: 8   Global Step: 103640   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:41:04,469-Speed 3321.09 samples/sec   Loss 4.7352   LearningRate 0.0340   Epoch: 8   Global Step: 103650   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:41:07,639-Speed 3231.51 samples/sec   Loss 4.6954   LearningRate 0.0340   Epoch: 8   Global Step: 103660   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:41:10,719-Speed 3325.62 samples/sec   Loss 4.6521   LearningRate 0.0340   Epoch: 8   Global Step: 103670   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:41:13,880-Speed 3240.24 samples/sec   Loss 4.7244   LearningRate 0.0339   Epoch: 8   Global Step: 103680   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:41:16,958-Speed 3328.27 samples/sec   Loss 4.6300   LearningRate 0.0339   Epoch: 8   Global Step: 103690   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:41:20,103-Speed 3257.06 samples/sec   Loss 4.6252   LearningRate 0.0339   Epoch: 8   Global Step: 103700   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:41:23,232-Speed 3273.95 samples/sec   Loss 4.6848   LearningRate 0.0339   Epoch: 8   Global Step: 103710   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:41:26,355-Speed 3280.20 samples/sec   Loss 4.6633   LearningRate 0.0339   Epoch: 8   Global Step: 103720   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:41:29,472-Speed 3285.77 samples/sec   Loss 4.6397   LearningRate 0.0339   Epoch: 8   Global Step: 103730   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:41:32,583-Speed 3292.36 samples/sec   Loss 4.6248   LearningRate 0.0339   Epoch: 8   Global Step: 103740   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:41:35,686-Speed 3306.89 samples/sec   Loss 4.7233   LearningRate 0.0339   Epoch: 8   Global Step: 103750   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:41:38,848-Speed 3238.79 samples/sec   Loss 4.6778   LearningRate 0.0339   Epoch: 8   Global Step: 103760   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:41:42,006-Speed 3243.66 samples/sec   Loss 4.7101   LearningRate 0.0339   Epoch: 8   Global Step: 103770   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:41:45,814-Speed 2689.73 samples/sec   Loss 4.7342   LearningRate 0.0339   Epoch: 8   Global Step: 103780   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:41:48,959-Speed 3256.91 samples/sec   Loss 4.6651   LearningRate 0.0339   Epoch: 8   Global Step: 103790   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:41:52,142-Speed 3218.48 samples/sec   Loss 4.7795   LearningRate 0.0339   Epoch: 8   Global Step: 103800   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:41:55,251-Speed 3295.04 samples/sec   Loss 4.7637   LearningRate 0.0339   Epoch: 8   Global Step: 103810   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:41:58,346-Speed 3308.80 samples/sec   Loss 4.6943   LearningRate 0.0339   Epoch: 8   Global Step: 103820   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:42:01,507-Speed 3240.82 samples/sec   Loss 4.6744   LearningRate 0.0339   Epoch: 8   Global Step: 103830   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:42:04,582-Speed 3331.00 samples/sec   Loss 4.7194   LearningRate 0.0339   Epoch: 8   Global Step: 103840   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:42:07,761-Speed 3222.48 samples/sec   Loss 4.7296   LearningRate 0.0339   Epoch: 8   Global Step: 103850   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:42:10,845-Speed 3321.30 samples/sec   Loss 4.6932   LearningRate 0.0339   Epoch: 8   Global Step: 103860   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:42:13,941-Speed 3308.48 samples/sec   Loss 4.6594   LearningRate 0.0339   Epoch: 8   Global Step: 103870   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:42:17,033-Speed 3312.74 samples/sec   Loss 4.7419   LearningRate 0.0339   Epoch: 8   Global Step: 103880   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:42:20,095-Speed 3345.09 samples/sec   Loss 4.7079   LearningRate 0.0338   Epoch: 8   Global Step: 103890   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:42:23,175-Speed 3326.43 samples/sec   Loss 4.6337   LearningRate 0.0338   Epoch: 8   Global Step: 103900   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:42:26,317-Speed 3259.44 samples/sec   Loss 4.5966   LearningRate 0.0338   Epoch: 8   Global Step: 103910   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:42:29,457-Speed 3261.86 samples/sec   Loss 4.7009   LearningRate 0.0338   Epoch: 8   Global Step: 103920   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:42:32,539-Speed 3323.89 samples/sec   Loss 4.6810   LearningRate 0.0338   Epoch: 8   Global Step: 103930   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:42:35,604-Speed 3341.56 samples/sec   Loss 4.7269   LearningRate 0.0338   Epoch: 8   Global Step: 103940   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:42:38,680-Speed 3330.64 samples/sec   Loss 4.6540   LearningRate 0.0338   Epoch: 8   Global Step: 103950   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:42:41,763-Speed 3322.28 samples/sec   Loss 4.7984   LearningRate 0.0338   Epoch: 8   Global Step: 103960   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:42:44,826-Speed 3344.58 samples/sec   Loss 4.7024   LearningRate 0.0338   Epoch: 8   Global Step: 103970   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:42:47,960-Speed 3268.09 samples/sec   Loss 4.7230   LearningRate 0.0338   Epoch: 8   Global Step: 103980   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:42:51,061-Speed 3303.01 samples/sec   Loss 4.7333   LearningRate 0.0338   Epoch: 8   Global Step: 103990   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:42:54,152-Speed 3313.74 samples/sec   Loss 4.7268   LearningRate 0.0338   Epoch: 8   Global Step: 104000   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:42:57,260-Speed 3295.97 samples/sec   Loss 4.6654   LearningRate 0.0338   Epoch: 8   Global Step: 104010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:43:00,342-Speed 3322.87 samples/sec   Loss 4.7141   LearningRate 0.0338   Epoch: 8   Global Step: 104020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:43:03,469-Speed 3275.54 samples/sec   Loss 4.6752   LearningRate 0.0338   Epoch: 8   Global Step: 104030   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:43:06,592-Speed 3280.18 samples/sec   Loss 4.6798   LearningRate 0.0338   Epoch: 8   Global Step: 104040   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:43:09,668-Speed 3329.97 samples/sec   Loss 4.7006   LearningRate 0.0338   Epoch: 8   Global Step: 104050   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:43:12,820-Speed 3250.09 samples/sec   Loss 4.7749   LearningRate 0.0338   Epoch: 8   Global Step: 104060   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:43:15,910-Speed 3315.27 samples/sec   Loss 4.6762   LearningRate 0.0338   Epoch: 8   Global Step: 104070   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:43:19,036-Speed 3276.34 samples/sec   Loss 4.7109   LearningRate 0.0338   Epoch: 8   Global Step: 104080   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:43:22,108-Speed 3334.28 samples/sec   Loss 4.7013   LearningRate 0.0338   Epoch: 8   Global Step: 104090   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:43:25,194-Speed 3319.98 samples/sec   Loss 4.7201   LearningRate 0.0338   Epoch: 8   Global Step: 104100   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:43:28,303-Speed 3294.15 samples/sec   Loss 4.7102   LearningRate 0.0337   Epoch: 8   Global Step: 104110   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:43:31,390-Speed 3318.43 samples/sec   Loss 4.6498   LearningRate 0.0337   Epoch: 8   Global Step: 104120   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:43:34,455-Speed 3341.72 samples/sec   Loss 4.6545   LearningRate 0.0337   Epoch: 8   Global Step: 104130   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:43:37,563-Speed 3295.00 samples/sec   Loss 4.6988   LearningRate 0.0337   Epoch: 8   Global Step: 104140   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:43:40,709-Speed 3256.18 samples/sec   Loss 4.6342   LearningRate 0.0337   Epoch: 8   Global Step: 104150   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:43:43,812-Speed 3301.45 samples/sec   Loss 4.7320   LearningRate 0.0337   Epoch: 8   Global Step: 104160   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:43:46,880-Speed 3338.69 samples/sec   Loss 4.6512   LearningRate 0.0337   Epoch: 8   Global Step: 104170   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:43:49,960-Speed 3325.68 samples/sec   Loss 4.7622   LearningRate 0.0337   Epoch: 8   Global Step: 104180   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:43:53,038-Speed 3327.59 samples/sec   Loss 4.6693   LearningRate 0.0337   Epoch: 8   Global Step: 104190   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:43:56,146-Speed 3296.41 samples/sec   Loss 4.6054   LearningRate 0.0337   Epoch: 8   Global Step: 104200   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:43:59,234-Speed 3316.83 samples/sec   Loss 4.5860   LearningRate 0.0337   Epoch: 8   Global Step: 104210   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:44:02,341-Speed 3296.60 samples/sec   Loss 4.5930   LearningRate 0.0337   Epoch: 8   Global Step: 104220   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:44:05,510-Speed 3232.09 samples/sec   Loss 4.7540   LearningRate 0.0337   Epoch: 8   Global Step: 104230   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:44:08,619-Speed 3295.21 samples/sec   Loss 4.7260   LearningRate 0.0337   Epoch: 8   Global Step: 104240   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:44:11,703-Speed 3322.12 samples/sec   Loss 4.7150   LearningRate 0.0337   Epoch: 8   Global Step: 104250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:44:14,795-Speed 3312.15 samples/sec   Loss 4.7436   LearningRate 0.0337   Epoch: 8   Global Step: 104260   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:44:17,894-Speed 3305.98 samples/sec   Loss 4.7123   LearningRate 0.0337   Epoch: 8   Global Step: 104270   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:44:20,988-Speed 3310.78 samples/sec   Loss 4.7423   LearningRate 0.0337   Epoch: 8   Global Step: 104280   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:44:24,058-Speed 3336.65 samples/sec   Loss 4.6466   LearningRate 0.0337   Epoch: 8   Global Step: 104290   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:44:27,182-Speed 3278.67 samples/sec   Loss 4.6750   LearningRate 0.0337   Epoch: 8   Global Step: 104300   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:44:30,354-Speed 3229.09 samples/sec   Loss 4.6964   LearningRate 0.0337   Epoch: 8   Global Step: 104310   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:44:33,444-Speed 3315.14 samples/sec   Loss 4.7234   LearningRate 0.0336   Epoch: 8   Global Step: 104320   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:44:36,615-Speed 3230.33 samples/sec   Loss 4.7154   LearningRate 0.0336   Epoch: 8   Global Step: 104330   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:44:39,716-Speed 3302.50 samples/sec   Loss 4.7099   LearningRate 0.0336   Epoch: 8   Global Step: 104340   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:44:42,771-Speed 3352.80 samples/sec   Loss 4.7182   LearningRate 0.0336   Epoch: 8   Global Step: 104350   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:44:45,855-Speed 3321.64 samples/sec   Loss 4.7380   LearningRate 0.0336   Epoch: 8   Global Step: 104360   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:44:49,013-Speed 3244.22 samples/sec   Loss 4.7833   LearningRate 0.0336   Epoch: 8   Global Step: 104370   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:44:52,169-Speed 3244.50 samples/sec   Loss 4.7037   LearningRate 0.0336   Epoch: 8   Global Step: 104380   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:44:55,329-Speed 3241.96 samples/sec   Loss 4.7299   LearningRate 0.0336   Epoch: 8   Global Step: 104390   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:44:58,392-Speed 3344.28 samples/sec   Loss 4.7947   LearningRate 0.0336   Epoch: 8   Global Step: 104400   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:45:01,507-Speed 3288.37 samples/sec   Loss 4.8041   LearningRate 0.0336   Epoch: 8   Global Step: 104410   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:45:04,668-Speed 3240.41 samples/sec   Loss 4.7458   LearningRate 0.0336   Epoch: 8   Global Step: 104420   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:45:07,860-Speed 3208.67 samples/sec   Loss 4.7263   LearningRate 0.0336   Epoch: 8   Global Step: 104430   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:45:10,986-Speed 3277.11 samples/sec   Loss 4.7609   LearningRate 0.0336   Epoch: 8   Global Step: 104440   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:45:14,211-Speed 3176.22 samples/sec   Loss 4.7268   LearningRate 0.0336   Epoch: 8   Global Step: 104450   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:45:17,386-Speed 3226.80 samples/sec   Loss 4.7085   LearningRate 0.0336   Epoch: 8   Global Step: 104460   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:45:20,553-Speed 3233.95 samples/sec   Loss 4.7346   LearningRate 0.0336   Epoch: 8   Global Step: 104470   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:45:23,653-Speed 3304.35 samples/sec   Loss 4.7237   LearningRate 0.0336   Epoch: 8   Global Step: 104480   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:45:26,787-Speed 3268.31 samples/sec   Loss 4.6661   LearningRate 0.0336   Epoch: 8   Global Step: 104490   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:45:29,873-Speed 3319.64 samples/sec   Loss 4.7742   LearningRate 0.0336   Epoch: 8   Global Step: 104500   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:45:32,959-Speed 3318.95 samples/sec   Loss 4.7061   LearningRate 0.0336   Epoch: 8   Global Step: 104510   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:45:36,164-Speed 3196.33 samples/sec   Loss 4.7381   LearningRate 0.0336   Epoch: 8   Global Step: 104520   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:45:39,264-Speed 3303.52 samples/sec   Loss 4.7155   LearningRate 0.0335   Epoch: 8   Global Step: 104530   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:45:42,322-Speed 3350.62 samples/sec   Loss 4.7431   LearningRate 0.0335   Epoch: 8   Global Step: 104540   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:45:45,384-Speed 3345.06 samples/sec   Loss 4.7420   LearningRate 0.0335   Epoch: 8   Global Step: 104550   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:45:48,559-Speed 3225.72 samples/sec   Loss 4.7741   LearningRate 0.0335   Epoch: 8   Global Step: 104560   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:45:51,688-Speed 3273.65 samples/sec   Loss 4.7533   LearningRate 0.0335   Epoch: 8   Global Step: 104570   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:45:54,789-Speed 3302.92 samples/sec   Loss 4.7233   LearningRate 0.0335   Epoch: 8   Global Step: 104580   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:45:57,886-Speed 3308.28 samples/sec   Loss 4.7895   LearningRate 0.0335   Epoch: 8   Global Step: 104590   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:46:01,002-Speed 3287.66 samples/sec   Loss 4.6707   LearningRate 0.0335   Epoch: 8   Global Step: 104600   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:46:04,134-Speed 3269.91 samples/sec   Loss 4.6198   LearningRate 0.0335   Epoch: 8   Global Step: 104610   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:46:07,247-Speed 3290.95 samples/sec   Loss 4.6690   LearningRate 0.0335   Epoch: 8   Global Step: 104620   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:46:10,337-Speed 3314.25 samples/sec   Loss 4.7410   LearningRate 0.0335   Epoch: 8   Global Step: 104630   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:46:13,427-Speed 3315.56 samples/sec   Loss 4.7670   LearningRate 0.0335   Epoch: 8   Global Step: 104640   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:46:16,511-Speed 3321.39 samples/sec   Loss 4.8185   LearningRate 0.0335   Epoch: 8   Global Step: 104650   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:46:19,627-Speed 3287.06 samples/sec   Loss 4.7573   LearningRate 0.0335   Epoch: 8   Global Step: 104660   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:46:22,730-Speed 3301.03 samples/sec   Loss 4.7011   LearningRate 0.0335   Epoch: 8   Global Step: 104670   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:46:25,892-Speed 3239.20 samples/sec   Loss 4.6925   LearningRate 0.0335   Epoch: 8   Global Step: 104680   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:46:28,991-Speed 3305.52 samples/sec   Loss 4.7832   LearningRate 0.0335   Epoch: 8   Global Step: 104690   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:46:32,069-Speed 3327.64 samples/sec   Loss 4.7274   LearningRate 0.0335   Epoch: 8   Global Step: 104700   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:46:35,163-Speed 3310.84 samples/sec   Loss 4.7418   LearningRate 0.0335   Epoch: 8   Global Step: 104710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:46:38,246-Speed 3322.53 samples/sec   Loss 4.8923   LearningRate 0.0335   Epoch: 8   Global Step: 104720   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:46:41,361-Speed 3287.57 samples/sec   Loss 4.7834   LearningRate 0.0335   Epoch: 8   Global Step: 104730   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:46:44,416-Speed 3353.49 samples/sec   Loss 4.7430   LearningRate 0.0335   Epoch: 8   Global Step: 104740   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:46:47,506-Speed 3314.71 samples/sec   Loss 4.8115   LearningRate 0.0334   Epoch: 8   Global Step: 104750   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:46:50,644-Speed 3265.00 samples/sec   Loss 4.7088   LearningRate 0.0334   Epoch: 8   Global Step: 104760   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:46:53,717-Speed 3333.34 samples/sec   Loss 4.7315   LearningRate 0.0334   Epoch: 8   Global Step: 104770   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:46:56,772-Speed 3352.87 samples/sec   Loss 4.8313   LearningRate 0.0334   Epoch: 8   Global Step: 104780   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:46:59,856-Speed 3320.83 samples/sec   Loss 4.7124   LearningRate 0.0334   Epoch: 8   Global Step: 104790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:47:02,983-Speed 3275.83 samples/sec   Loss 4.7103   LearningRate 0.0334   Epoch: 8   Global Step: 104800   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:47:06,120-Speed 3265.77 samples/sec   Loss 4.6600   LearningRate 0.0334   Epoch: 8   Global Step: 104810   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:47:09,230-Speed 3293.51 samples/sec   Loss 4.7048   LearningRate 0.0334   Epoch: 8   Global Step: 104820   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:47:12,293-Speed 3344.50 samples/sec   Loss 4.7709   LearningRate 0.0334   Epoch: 8   Global Step: 104830   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:47:15,412-Speed 3284.14 samples/sec   Loss 4.7651   LearningRate 0.0334   Epoch: 8   Global Step: 104840   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:47:18,494-Speed 3322.63 samples/sec   Loss 4.7940   LearningRate 0.0334   Epoch: 8   Global Step: 104850   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:47:21,621-Speed 3276.01 samples/sec   Loss 4.8275   LearningRate 0.0334   Epoch: 8   Global Step: 104860   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:47:24,779-Speed 3243.10 samples/sec   Loss 4.6991   LearningRate 0.0334   Epoch: 8   Global Step: 104870   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:47:27,946-Speed 3234.07 samples/sec   Loss 4.7090   LearningRate 0.0334   Epoch: 8   Global Step: 104880   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:47:31,145-Speed 3203.00 samples/sec   Loss 4.8091   LearningRate 0.0334   Epoch: 8   Global Step: 104890   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:47:34,370-Speed 3175.99 samples/sec   Loss 4.7425   LearningRate 0.0334   Epoch: 8   Global Step: 104900   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:47:37,508-Speed 3263.89 samples/sec   Loss 4.7574   LearningRate 0.0334   Epoch: 8   Global Step: 104910   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:47:40,641-Speed 3269.19 samples/sec   Loss 4.8816   LearningRate 0.0334   Epoch: 8   Global Step: 104920   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:47:43,711-Speed 3336.76 samples/sec   Loss 4.7588   LearningRate 0.0334   Epoch: 8   Global Step: 104930   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:47:46,783-Speed 3334.82 samples/sec   Loss 4.7807   LearningRate 0.0334   Epoch: 8   Global Step: 104940   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:47:49,908-Speed 3276.92 samples/sec   Loss 4.7136   LearningRate 0.0334   Epoch: 8   Global Step: 104950   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:47:53,046-Speed 3265.23 samples/sec   Loss 4.6302   LearningRate 0.0333   Epoch: 8   Global Step: 104960   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:47:56,131-Speed 3320.04 samples/sec   Loss 4.7783   LearningRate 0.0333   Epoch: 8   Global Step: 104970   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:47:59,211-Speed 3325.29 samples/sec   Loss 4.6934   LearningRate 0.0333   Epoch: 8   Global Step: 104980   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:48:02,342-Speed 3271.66 samples/sec   Loss 4.7024   LearningRate 0.0333   Epoch: 8   Global Step: 104990   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:48:05,524-Speed 3219.12 samples/sec   Loss 4.8311   LearningRate 0.0333   Epoch: 8   Global Step: 105000   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:48:08,581-Speed 3350.86 samples/sec   Loss 4.7594   LearningRate 0.0333   Epoch: 8   Global Step: 105010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:48:11,674-Speed 3312.06 samples/sec   Loss 4.8230   LearningRate 0.0333   Epoch: 8   Global Step: 105020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:48:14,765-Speed 3313.68 samples/sec   Loss 4.6642   LearningRate 0.0333   Epoch: 8   Global Step: 105030   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:48:17,885-Speed 3282.66 samples/sec   Loss 4.6767   LearningRate 0.0333   Epoch: 8   Global Step: 105040   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:48:21,027-Speed 3260.32 samples/sec   Loss 4.7872   LearningRate 0.0333   Epoch: 8   Global Step: 105050   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:48:24,102-Speed 3330.71 samples/sec   Loss 4.6630   LearningRate 0.0333   Epoch: 8   Global Step: 105060   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:48:27,230-Speed 3274.52 samples/sec   Loss 4.7033   LearningRate 0.0333   Epoch: 8   Global Step: 105070   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:48:30,390-Speed 3241.93 samples/sec   Loss 4.8720   LearningRate 0.0333   Epoch: 8   Global Step: 105080   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:48:33,475-Speed 3319.87 samples/sec   Loss 4.7758   LearningRate 0.0333   Epoch: 8   Global Step: 105090   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:48:36,638-Speed 3239.28 samples/sec   Loss 4.6204   LearningRate 0.0333   Epoch: 8   Global Step: 105100   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:48:39,789-Speed 3250.83 samples/sec   Loss 4.7878   LearningRate 0.0333   Epoch: 8   Global Step: 105110   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:48:42,857-Speed 3338.13 samples/sec   Loss 4.8174   LearningRate 0.0333   Epoch: 8   Global Step: 105120   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:48:45,914-Speed 3351.69 samples/sec   Loss 4.7610   LearningRate 0.0333   Epoch: 8   Global Step: 105130   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:48:49,104-Speed 3210.62 samples/sec   Loss 4.8452   LearningRate 0.0333   Epoch: 8   Global Step: 105140   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:48:52,224-Speed 3283.72 samples/sec   Loss 4.7876   LearningRate 0.0333   Epoch: 8   Global Step: 105150   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:48:55,282-Speed 3349.44 samples/sec   Loss 4.6869   LearningRate 0.0333   Epoch: 8   Global Step: 105160   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:48:58,331-Speed 3359.05 samples/sec   Loss 4.8600   LearningRate 0.0333   Epoch: 8   Global Step: 105170   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:49:01,413-Speed 3323.74 samples/sec   Loss 4.8673   LearningRate 0.0332   Epoch: 8   Global Step: 105180   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:49:04,561-Speed 3254.34 samples/sec   Loss 4.6991   LearningRate 0.0332   Epoch: 8   Global Step: 105190   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:49:07,624-Speed 3344.23 samples/sec   Loss 4.7507   LearningRate 0.0332   Epoch: 8   Global Step: 105200   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:49:10,743-Speed 3283.90 samples/sec   Loss 4.7860   LearningRate 0.0332   Epoch: 8   Global Step: 105210   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:49:13,834-Speed 3313.61 samples/sec   Loss 4.7407   LearningRate 0.0332   Epoch: 8   Global Step: 105220   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:49:16,952-Speed 3285.78 samples/sec   Loss 4.6603   LearningRate 0.0332   Epoch: 8   Global Step: 105230   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:49:20,036-Speed 3320.91 samples/sec   Loss 4.8616   LearningRate 0.0332   Epoch: 8   Global Step: 105240   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:49:23,113-Speed 3329.20 samples/sec   Loss 4.8392   LearningRate 0.0332   Epoch: 8   Global Step: 105250   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:49:26,167-Speed 3353.89 samples/sec   Loss 4.7092   LearningRate 0.0332   Epoch: 8   Global Step: 105260   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:49:29,221-Speed 3354.42 samples/sec   Loss 4.7253   LearningRate 0.0332   Epoch: 8   Global Step: 105270   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:49:32,299-Speed 3327.19 samples/sec   Loss 4.7709   LearningRate 0.0332   Epoch: 8   Global Step: 105280   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:49:35,352-Speed 3356.13 samples/sec   Loss 4.7435   LearningRate 0.0332   Epoch: 8   Global Step: 105290   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:49:38,424-Speed 3334.53 samples/sec   Loss 4.7190   LearningRate 0.0332   Epoch: 8   Global Step: 105300   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:49:41,531-Speed 3295.94 samples/sec   Loss 4.8083   LearningRate 0.0332   Epoch: 8   Global Step: 105310   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:49:44,617-Speed 3319.65 samples/sec   Loss 4.7579   LearningRate 0.0332   Epoch: 8   Global Step: 105320   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:49:47,737-Speed 3283.60 samples/sec   Loss 4.8252   LearningRate 0.0332   Epoch: 8   Global Step: 105330   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:49:50,832-Speed 3309.03 samples/sec   Loss 4.8090   LearningRate 0.0332   Epoch: 8   Global Step: 105340   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:49:53,904-Speed 3334.44 samples/sec   Loss 4.7178   LearningRate 0.0332   Epoch: 8   Global Step: 105350   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:49:56,981-Speed 3329.27 samples/sec   Loss 4.7653   LearningRate 0.0332   Epoch: 8   Global Step: 105360   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:50:00,038-Speed 3351.21 samples/sec   Loss 4.7773   LearningRate 0.0332   Epoch: 8   Global Step: 105370   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:50:03,117-Speed 3326.36 samples/sec   Loss 4.7413   LearningRate 0.0332   Epoch: 8   Global Step: 105380   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:50:06,225-Speed 3295.50 samples/sec   Loss 4.8454   LearningRate 0.0331   Epoch: 8   Global Step: 105390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:50:09,268-Speed 3366.56 samples/sec   Loss 4.7611   LearningRate 0.0331   Epoch: 8   Global Step: 105400   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:50:12,481-Speed 3187.26 samples/sec   Loss 4.8121   LearningRate 0.0331   Epoch: 8   Global Step: 105410   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:50:15,666-Speed 3216.28 samples/sec   Loss 4.7008   LearningRate 0.0331   Epoch: 8   Global Step: 105420   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:50:18,911-Speed 3156.88 samples/sec   Loss 4.8150   LearningRate 0.0331   Epoch: 8   Global Step: 105430   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:50:21,964-Speed 3354.76 samples/sec   Loss 4.7284   LearningRate 0.0331   Epoch: 8   Global Step: 105440   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:50:25,054-Speed 3315.79 samples/sec   Loss 4.8403   LearningRate 0.0331   Epoch: 8   Global Step: 105450   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:50:28,141-Speed 3317.66 samples/sec   Loss 4.7737   LearningRate 0.0331   Epoch: 8   Global Step: 105460   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:50:31,260-Speed 3284.47 samples/sec   Loss 4.8006   LearningRate 0.0331   Epoch: 8   Global Step: 105470   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:50:34,315-Speed 3352.56 samples/sec   Loss 4.8910   LearningRate 0.0331   Epoch: 8   Global Step: 105480   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:50:37,403-Speed 3316.92 samples/sec   Loss 4.7654   LearningRate 0.0331   Epoch: 8   Global Step: 105490   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:50:40,495-Speed 3313.36 samples/sec   Loss 4.7760   LearningRate 0.0331   Epoch: 8   Global Step: 105500   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:50:43,545-Speed 3358.72 samples/sec   Loss 4.7644   LearningRate 0.0331   Epoch: 8   Global Step: 105510   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:50:46,683-Speed 3263.50 samples/sec   Loss 4.7437   LearningRate 0.0331   Epoch: 8   Global Step: 105520   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:50:49,854-Speed 3230.28 samples/sec   Loss 4.7202   LearningRate 0.0331   Epoch: 8   Global Step: 105530   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:50:52,973-Speed 3283.97 samples/sec   Loss 4.7140   LearningRate 0.0331   Epoch: 8   Global Step: 105540   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:50:56,148-Speed 3226.71 samples/sec   Loss 4.7077   LearningRate 0.0331   Epoch: 8   Global Step: 105550   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:50:59,200-Speed 3356.06 samples/sec   Loss 4.7843   LearningRate 0.0331   Epoch: 8   Global Step: 105560   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:51:02,333-Speed 3268.87 samples/sec   Loss 4.8938   LearningRate 0.0331   Epoch: 8   Global Step: 105570   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:51:05,460-Speed 3276.55 samples/sec   Loss 4.7814   LearningRate 0.0331   Epoch: 8   Global Step: 105580   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:51:08,582-Speed 3280.25 samples/sec   Loss 4.7207   LearningRate 0.0331   Epoch: 8   Global Step: 105590   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:51:11,685-Speed 3300.84 samples/sec   Loss 4.8849   LearningRate 0.0331   Epoch: 8   Global Step: 105600   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:51:14,770-Speed 3320.94 samples/sec   Loss 4.8406   LearningRate 0.0330   Epoch: 8   Global Step: 105610   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:51:17,838-Speed 3339.09 samples/sec   Loss 4.7168   LearningRate 0.0330   Epoch: 8   Global Step: 105620   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:51:20,889-Speed 3356.47 samples/sec   Loss 4.7588   LearningRate 0.0330   Epoch: 8   Global Step: 105630   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:51:24,006-Speed 3286.29 samples/sec   Loss 4.7737   LearningRate 0.0330   Epoch: 8   Global Step: 105640   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:51:27,209-Speed 3198.39 samples/sec   Loss 4.8687   LearningRate 0.0330   Epoch: 8   Global Step: 105650   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:51:30,324-Speed 3288.89 samples/sec   Loss 4.8462   LearningRate 0.0330   Epoch: 8   Global Step: 105660   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:51:33,391-Speed 3338.97 samples/sec   Loss 4.7764   LearningRate 0.0330   Epoch: 8   Global Step: 105670   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:51:36,511-Speed 3282.78 samples/sec   Loss 4.8479   LearningRate 0.0330   Epoch: 8   Global Step: 105680   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:51:39,597-Speed 3319.56 samples/sec   Loss 4.8630   LearningRate 0.0330   Epoch: 8   Global Step: 105690   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:51:42,718-Speed 3282.53 samples/sec   Loss 4.8491   LearningRate 0.0330   Epoch: 8   Global Step: 105700   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:51:45,782-Speed 3343.09 samples/sec   Loss 4.7066   LearningRate 0.0330   Epoch: 8   Global Step: 105710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:51:48,869-Speed 3317.78 samples/sec   Loss 4.7783   LearningRate 0.0330   Epoch: 8   Global Step: 105720   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:51:51,924-Speed 3354.23 samples/sec   Loss 4.8782   LearningRate 0.0330   Epoch: 8   Global Step: 105730   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:51:54,989-Speed 3341.43 samples/sec   Loss 4.8698   LearningRate 0.0330   Epoch: 8   Global Step: 105740   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:51:58,049-Speed 3348.30 samples/sec   Loss 4.9494   LearningRate 0.0330   Epoch: 8   Global Step: 105750   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:52:01,130-Speed 3324.29 samples/sec   Loss 4.7985   LearningRate 0.0330   Epoch: 8   Global Step: 105760   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:52:04,204-Speed 3331.74 samples/sec   Loss 4.8049   LearningRate 0.0330   Epoch: 8   Global Step: 105770   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:52:07,258-Speed 3354.49 samples/sec   Loss 4.7977   LearningRate 0.0330   Epoch: 8   Global Step: 105780   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:52:10,340-Speed 3323.47 samples/sec   Loss 4.8246   LearningRate 0.0330   Epoch: 8   Global Step: 105790   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:52:13,490-Speed 3251.95 samples/sec   Loss 4.8126   LearningRate 0.0330   Epoch: 8   Global Step: 105800   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:52:16,659-Speed 3232.71 samples/sec   Loss 4.7642   LearningRate 0.0330   Epoch: 8   Global Step: 105810   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:52:19,716-Speed 3350.05 samples/sec   Loss 4.7742   LearningRate 0.0330   Epoch: 8   Global Step: 105820   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:52:22,782-Speed 3340.81 samples/sec   Loss 4.8657   LearningRate 0.0329   Epoch: 8   Global Step: 105830   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:52:25,908-Speed 3277.32 samples/sec   Loss 4.8771   LearningRate 0.0329   Epoch: 8   Global Step: 105840   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:52:29,071-Speed 3238.72 samples/sec   Loss 4.8456   LearningRate 0.0329   Epoch: 8   Global Step: 105850   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:52:32,170-Speed 3304.91 samples/sec   Loss 4.8459   LearningRate 0.0329   Epoch: 8   Global Step: 105860   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:52:35,241-Speed 3336.03 samples/sec   Loss 4.8931   LearningRate 0.0329   Epoch: 8   Global Step: 105870   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:52:38,328-Speed 3317.14 samples/sec   Loss 4.9690   LearningRate 0.0329   Epoch: 8   Global Step: 105880   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:52:41,406-Speed 3328.83 samples/sec   Loss 4.8932   LearningRate 0.0329   Epoch: 8   Global Step: 105890   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:52:44,468-Speed 3344.93 samples/sec   Loss 4.8377   LearningRate 0.0329   Epoch: 8   Global Step: 105900   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:52:47,533-Speed 3342.76 samples/sec   Loss 4.9214   LearningRate 0.0329   Epoch: 8   Global Step: 105910   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:52:50,595-Speed 3344.83 samples/sec   Loss 4.7995   LearningRate 0.0329   Epoch: 8   Global Step: 105920   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:52:53,671-Speed 3330.39 samples/sec   Loss 4.8949   LearningRate 0.0329   Epoch: 8   Global Step: 105930   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:52:56,737-Speed 3340.82 samples/sec   Loss 4.7976   LearningRate 0.0329   Epoch: 8   Global Step: 105940   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:52:59,814-Speed 3329.28 samples/sec   Loss 4.7893   LearningRate 0.0329   Epoch: 8   Global Step: 105950   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:53:02,928-Speed 3288.77 samples/sec   Loss 4.8005   LearningRate 0.0329   Epoch: 8   Global Step: 105960   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:53:06,013-Speed 3320.00 samples/sec   Loss 4.8835   LearningRate 0.0329   Epoch: 8   Global Step: 105970   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:53:09,090-Speed 3329.27 samples/sec   Loss 4.8095   LearningRate 0.0329   Epoch: 8   Global Step: 105980   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:53:12,172-Speed 3323.93 samples/sec   Loss 4.8425   LearningRate 0.0329   Epoch: 8   Global Step: 105990   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:53:15,305-Speed 3269.17 samples/sec   Loss 4.7856   LearningRate 0.0329   Epoch: 8   Global Step: 106000   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:53:18,417-Speed 3291.24 samples/sec   Loss 4.7714   LearningRate 0.0329   Epoch: 8   Global Step: 106010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:53:21,486-Speed 3338.10 samples/sec   Loss 4.7656   LearningRate 0.0329   Epoch: 8   Global Step: 106020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:53:24,631-Speed 3256.86 samples/sec   Loss 4.7583   LearningRate 0.0329   Epoch: 8   Global Step: 106030   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:53:27,703-Speed 3334.07 samples/sec   Loss 4.7427   LearningRate 0.0328   Epoch: 8   Global Step: 106040   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:53:30,830-Speed 3275.91 samples/sec   Loss 4.8180   LearningRate 0.0328   Epoch: 8   Global Step: 106050   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:53:33,967-Speed 3265.17 samples/sec   Loss 4.7648   LearningRate 0.0328   Epoch: 8   Global Step: 106060   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:53:37,080-Speed 3290.55 samples/sec   Loss 4.8795   LearningRate 0.0328   Epoch: 8   Global Step: 106070   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:53:40,245-Speed 3236.84 samples/sec   Loss 4.9245   LearningRate 0.0328   Epoch: 8   Global Step: 106080   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:53:43,369-Speed 3278.72 samples/sec   Loss 4.9360   LearningRate 0.0328   Epoch: 8   Global Step: 106090   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:53:46,432-Speed 3344.92 samples/sec   Loss 4.8000   LearningRate 0.0328   Epoch: 8   Global Step: 106100   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:53:49,548-Speed 3286.70 samples/sec   Loss 4.7651   LearningRate 0.0328   Epoch: 8   Global Step: 106110   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:53:52,758-Speed 3190.62 samples/sec   Loss 4.8605   LearningRate 0.0328   Epoch: 8   Global Step: 106120   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:53:55,863-Speed 3299.24 samples/sec   Loss 4.7843   LearningRate 0.0328   Epoch: 8   Global Step: 106130   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:53:58,981-Speed 3285.16 samples/sec   Loss 4.8086   LearningRate 0.0328   Epoch: 8   Global Step: 106140   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:54:02,082-Speed 3303.34 samples/sec   Loss 4.8501   LearningRate 0.0328   Epoch: 8   Global Step: 106150   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:54:05,154-Speed 3335.43 samples/sec   Loss 4.7818   LearningRate 0.0328   Epoch: 8   Global Step: 106160   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:54:08,204-Speed 3357.79 samples/sec   Loss 4.9124   LearningRate 0.0328   Epoch: 8   Global Step: 106170   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:54:11,302-Speed 3306.03 samples/sec   Loss 4.9020   LearningRate 0.0328   Epoch: 8   Global Step: 106180   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:54:14,397-Speed 3310.35 samples/sec   Loss 4.8806   LearningRate 0.0328   Epoch: 8   Global Step: 106190   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:54:17,524-Speed 3275.39 samples/sec   Loss 4.8565   LearningRate 0.0328   Epoch: 8   Global Step: 106200   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:54:20,607-Speed 3322.77 samples/sec   Loss 4.6871   LearningRate 0.0328   Epoch: 8   Global Step: 106210   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:54:23,700-Speed 3312.11 samples/sec   Loss 4.8540   LearningRate 0.0328   Epoch: 8   Global Step: 106220   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:54:26,798-Speed 3305.54 samples/sec   Loss 4.8057   LearningRate 0.0328   Epoch: 8   Global Step: 106230   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:54:29,876-Speed 3327.76 samples/sec   Loss 4.8091   LearningRate 0.0328   Epoch: 8   Global Step: 106240   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:54:32,970-Speed 3311.12 samples/sec   Loss 4.8888   LearningRate 0.0328   Epoch: 8   Global Step: 106250   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:54:36,090-Speed 3283.83 samples/sec   Loss 4.8282   LearningRate 0.0327   Epoch: 8   Global Step: 106260   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:54:39,163-Speed 3332.65 samples/sec   Loss 4.7770   LearningRate 0.0327   Epoch: 8   Global Step: 106270   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:54:42,349-Speed 3215.03 samples/sec   Loss 4.7948   LearningRate 0.0327   Epoch: 8   Global Step: 106280   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:54:45,431-Speed 3324.39 samples/sec   Loss 4.8189   LearningRate 0.0327   Epoch: 8   Global Step: 106290   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:54:48,588-Speed 3243.74 samples/sec   Loss 4.6973   LearningRate 0.0327   Epoch: 8   Global Step: 106300   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:54:51,694-Speed 3298.51 samples/sec   Loss 4.7824   LearningRate 0.0327   Epoch: 8   Global Step: 106310   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:54:54,827-Speed 3268.94 samples/sec   Loss 4.9043   LearningRate 0.0327   Epoch: 8   Global Step: 106320   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:54:57,902-Speed 3331.51 samples/sec   Loss 4.8665   LearningRate 0.0327   Epoch: 8   Global Step: 106330   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:55:00,954-Speed 3357.60 samples/sec   Loss 4.7920   LearningRate 0.0327   Epoch: 8   Global Step: 106340   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:55:04,091-Speed 3264.50 samples/sec   Loss 4.8696   LearningRate 0.0327   Epoch: 8   Global Step: 106350   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:55:07,252-Speed 3240.53 samples/sec   Loss 4.8448   LearningRate 0.0327   Epoch: 8   Global Step: 106360   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:55:10,317-Speed 3342.52 samples/sec   Loss 4.7975   LearningRate 0.0327   Epoch: 8   Global Step: 106370   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:55:13,391-Speed 3332.13 samples/sec   Loss 4.7741   LearningRate 0.0327   Epoch: 8   Global Step: 106380   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:55:16,511-Speed 3283.10 samples/sec   Loss 4.7806   LearningRate 0.0327   Epoch: 8   Global Step: 106390   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:55:19,655-Speed 3258.10 samples/sec   Loss 4.7931   LearningRate 0.0327   Epoch: 8   Global Step: 106400   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:55:22,726-Speed 3335.60 samples/sec   Loss 4.8262   LearningRate 0.0327   Epoch: 8   Global Step: 106410   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:55:25,796-Speed 3336.97 samples/sec   Loss 4.8325   LearningRate 0.0327   Epoch: 8   Global Step: 106420   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:55:28,912-Speed 3287.45 samples/sec   Loss 4.8299   LearningRate 0.0327   Epoch: 8   Global Step: 106430   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:55:32,030-Speed 3284.28 samples/sec   Loss 4.7746   LearningRate 0.0327   Epoch: 8   Global Step: 106440   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:55:35,150-Speed 3283.54 samples/sec   Loss 4.8029   LearningRate 0.0327   Epoch: 8   Global Step: 106450   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:55:38,252-Speed 3301.90 samples/sec   Loss 4.8406   LearningRate 0.0327   Epoch: 8   Global Step: 106460   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:55:41,412-Speed 3242.05 samples/sec   Loss 4.7305   LearningRate 0.0327   Epoch: 8   Global Step: 106470   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:55:44,466-Speed 3353.67 samples/sec   Loss 4.7035   LearningRate 0.0326   Epoch: 8   Global Step: 106480   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:55:47,577-Speed 3292.42 samples/sec   Loss 4.7789   LearningRate 0.0326   Epoch: 8   Global Step: 106490   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:55:50,634-Speed 3350.72 samples/sec   Loss 4.8198   LearningRate 0.0326   Epoch: 8   Global Step: 106500   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:55:53,749-Speed 3288.78 samples/sec   Loss 4.8841   LearningRate 0.0326   Epoch: 8   Global Step: 106510   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:55:56,847-Speed 3306.66 samples/sec   Loss 4.9072   LearningRate 0.0326   Epoch: 8   Global Step: 106520   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:55:59,980-Speed 3269.02 samples/sec   Loss 4.9244   LearningRate 0.0326   Epoch: 8   Global Step: 106530   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:56:03,145-Speed 3236.41 samples/sec   Loss 4.7711   LearningRate 0.0326   Epoch: 8   Global Step: 106540   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:56:06,315-Speed 3231.25 samples/sec   Loss 4.7675   LearningRate 0.0326   Epoch: 8   Global Step: 106550   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:56:09,426-Speed 3292.88 samples/sec   Loss 4.8622   LearningRate 0.0326   Epoch: 8   Global Step: 106560   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:56:12,506-Speed 3325.20 samples/sec   Loss 4.8474   LearningRate 0.0326   Epoch: 8   Global Step: 106570   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:56:15,605-Speed 3305.69 samples/sec   Loss 4.9255   LearningRate 0.0326   Epoch: 8   Global Step: 106580   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:56:18,699-Speed 3310.66 samples/sec   Loss 4.7894   LearningRate 0.0326   Epoch: 8   Global Step: 106590   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:56:21,758-Speed 3348.36 samples/sec   Loss 4.7611   LearningRate 0.0326   Epoch: 8   Global Step: 106600   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:56:24,872-Speed 3289.52 samples/sec   Loss 4.9594   LearningRate 0.0326   Epoch: 8   Global Step: 106610   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:56:27,974-Speed 3301.46 samples/sec   Loss 4.9121   LearningRate 0.0326   Epoch: 8   Global Step: 106620   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:56:31,123-Speed 3253.17 samples/sec   Loss 4.8450   LearningRate 0.0326   Epoch: 8   Global Step: 106630   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:56:34,203-Speed 3325.81 samples/sec   Loss 4.8468   LearningRate 0.0326   Epoch: 8   Global Step: 106640   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:56:37,322-Speed 3284.14 samples/sec   Loss 4.6997   LearningRate 0.0326   Epoch: 8   Global Step: 106650   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:56:40,383-Speed 3346.06 samples/sec   Loss 4.8231   LearningRate 0.0326   Epoch: 8   Global Step: 106660   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:56:43,480-Speed 3308.09 samples/sec   Loss 4.7079   LearningRate 0.0326   Epoch: 8   Global Step: 106670   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:56:46,560-Speed 3325.66 samples/sec   Loss 4.8262   LearningRate 0.0326   Epoch: 8   Global Step: 106680   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:56:49,653-Speed 3311.63 samples/sec   Loss 4.8390   LearningRate 0.0325   Epoch: 8   Global Step: 106690   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:56:52,760-Speed 3296.48 samples/sec   Loss 4.8330   LearningRate 0.0325   Epoch: 8   Global Step: 106700   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:56:55,848-Speed 3317.36 samples/sec   Loss 4.8898   LearningRate 0.0325   Epoch: 8   Global Step: 106710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:56:58,943-Speed 3309.46 samples/sec   Loss 4.8143   LearningRate 0.0325   Epoch: 8   Global Step: 106720   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:57:02,060-Speed 3286.52 samples/sec   Loss 4.7698   LearningRate 0.0325   Epoch: 8   Global Step: 106730   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:57:05,145-Speed 3320.71 samples/sec   Loss 4.7654   LearningRate 0.0325   Epoch: 8   Global Step: 106740   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:57:08,291-Speed 3255.84 samples/sec   Loss 4.7617   LearningRate 0.0325   Epoch: 8   Global Step: 106750   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:57:11,402-Speed 3292.39 samples/sec   Loss 4.7814   LearningRate 0.0325   Epoch: 8   Global Step: 106760   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:57:14,508-Speed 3297.84 samples/sec   Loss 4.8154   LearningRate 0.0325   Epoch: 8   Global Step: 106770   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:57:17,660-Speed 3250.17 samples/sec   Loss 4.8252   LearningRate 0.0325   Epoch: 8   Global Step: 106780   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:57:20,731-Speed 3335.41 samples/sec   Loss 4.8761   LearningRate 0.0325   Epoch: 8   Global Step: 106790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:57:23,820-Speed 3316.22 samples/sec   Loss 4.8446   LearningRate 0.0325   Epoch: 8   Global Step: 106800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 10:57:26,947-Speed 3275.43 samples/sec   Loss 4.8290   LearningRate 0.0325   Epoch: 8   Global Step: 106810   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:57:30,137-Speed 3211.25 samples/sec   Loss 4.8296   LearningRate 0.0325   Epoch: 8   Global Step: 106820   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:57:33,219-Speed 3323.63 samples/sec   Loss 4.8747   LearningRate 0.0325   Epoch: 8   Global Step: 106830   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:57:36,364-Speed 3257.42 samples/sec   Loss 4.8401   LearningRate 0.0325   Epoch: 8   Global Step: 106840   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:57:39,532-Speed 3233.57 samples/sec   Loss 4.9261   LearningRate 0.0325   Epoch: 8   Global Step: 106850   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:57:42,615-Speed 3322.11 samples/sec   Loss 4.7870   LearningRate 0.0325   Epoch: 8   Global Step: 106860   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:57:45,691-Speed 3329.59 samples/sec   Loss 4.8261   LearningRate 0.0325   Epoch: 8   Global Step: 106870   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:57:48,775-Speed 3321.78 samples/sec   Loss 4.7404   LearningRate 0.0325   Epoch: 8   Global Step: 106880   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:57:51,900-Speed 3277.12 samples/sec   Loss 4.7702   LearningRate 0.0325   Epoch: 8   Global Step: 106890   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:57:55,065-Speed 3237.50 samples/sec   Loss 4.9020   LearningRate 0.0325   Epoch: 8   Global Step: 106900   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:57:58,112-Speed 3361.70 samples/sec   Loss 4.7854   LearningRate 0.0324   Epoch: 8   Global Step: 106910   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:58:01,212-Speed 3304.21 samples/sec   Loss 4.8262   LearningRate 0.0324   Epoch: 8   Global Step: 106920   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:58:04,315-Speed 3300.08 samples/sec   Loss 4.8014   LearningRate 0.0324   Epoch: 8   Global Step: 106930   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:58:07,447-Speed 3271.03 samples/sec   Loss 4.7938   LearningRate 0.0324   Epoch: 8   Global Step: 106940   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:58:10,536-Speed 3315.82 samples/sec   Loss 4.7565   LearningRate 0.0324   Epoch: 8   Global Step: 106950   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:58:13,623-Speed 3318.75 samples/sec   Loss 4.8191   LearningRate 0.0324   Epoch: 8   Global Step: 106960   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:58:16,751-Speed 3274.24 samples/sec   Loss 4.7797   LearningRate 0.0324   Epoch: 8   Global Step: 106970   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:58:19,837-Speed 3319.41 samples/sec   Loss 4.8657   LearningRate 0.0324   Epoch: 8   Global Step: 106980   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:58:22,964-Speed 3276.20 samples/sec   Loss 4.8462   LearningRate 0.0324   Epoch: 8   Global Step: 106990   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:58:26,116-Speed 3249.27 samples/sec   Loss 4.7261   LearningRate 0.0324   Epoch: 8   Global Step: 107000   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:58:29,219-Speed 3301.08 samples/sec   Loss 4.9493   LearningRate 0.0324   Epoch: 8   Global Step: 107010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:58:32,323-Speed 3300.14 samples/sec   Loss 4.7657   LearningRate 0.0324   Epoch: 8   Global Step: 107020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:58:35,419-Speed 3307.97 samples/sec   Loss 4.8538   LearningRate 0.0324   Epoch: 8   Global Step: 107030   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:58:38,506-Speed 3318.80 samples/sec   Loss 4.7991   LearningRate 0.0324   Epoch: 8   Global Step: 107040   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:58:41,658-Speed 3249.24 samples/sec   Loss 4.6994   LearningRate 0.0324   Epoch: 8   Global Step: 107050   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:58:44,756-Speed 3306.90 samples/sec   Loss 4.7720   LearningRate 0.0324   Epoch: 8   Global Step: 107060   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:58:47,899-Speed 3258.77 samples/sec   Loss 4.8552   LearningRate 0.0324   Epoch: 8   Global Step: 107070   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:58:50,988-Speed 3315.70 samples/sec   Loss 4.9093   LearningRate 0.0324   Epoch: 8   Global Step: 107080   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:58:54,091-Speed 3300.44 samples/sec   Loss 4.8095   LearningRate 0.0324   Epoch: 8   Global Step: 107090   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:58:57,171-Speed 3326.41 samples/sec   Loss 4.8527   LearningRate 0.0324   Epoch: 8   Global Step: 107100   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:59:00,277-Speed 3297.94 samples/sec   Loss 4.8346   LearningRate 0.0324   Epoch: 8   Global Step: 107110   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:59:03,351-Speed 3332.36 samples/sec   Loss 4.8781   LearningRate 0.0324   Epoch: 8   Global Step: 107120   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:59:06,409-Speed 3350.80 samples/sec   Loss 4.8335   LearningRate 0.0323   Epoch: 8   Global Step: 107130   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:59:09,463-Speed 3353.94 samples/sec   Loss 4.8290   LearningRate 0.0323   Epoch: 8   Global Step: 107140   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:59:12,528-Speed 3342.45 samples/sec   Loss 4.8763   LearningRate 0.0323   Epoch: 8   Global Step: 107150   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:59:15,676-Speed 3253.14 samples/sec   Loss 4.8631   LearningRate 0.0323   Epoch: 8   Global Step: 107160   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:59:18,750-Speed 3333.23 samples/sec   Loss 4.8997   LearningRate 0.0323   Epoch: 8   Global Step: 107170   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:59:21,810-Speed 3347.36 samples/sec   Loss 4.9055   LearningRate 0.0323   Epoch: 8   Global Step: 107180   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:59:24,944-Speed 3267.86 samples/sec   Loss 4.8684   LearningRate 0.0323   Epoch: 8   Global Step: 107190   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:59:28,014-Speed 3336.92 samples/sec   Loss 4.6725   LearningRate 0.0323   Epoch: 8   Global Step: 107200   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:59:31,102-Speed 3317.25 samples/sec   Loss 4.8659   LearningRate 0.0323   Epoch: 8   Global Step: 107210   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:59:34,270-Speed 3233.51 samples/sec   Loss 4.8271   LearningRate 0.0323   Epoch: 8   Global Step: 107220   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 10:59:37,374-Speed 3300.41 samples/sec   Loss 4.8638   LearningRate 0.0323   Epoch: 8   Global Step: 107230   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:59:40,464-Speed 3314.65 samples/sec   Loss 4.8902   LearningRate 0.0323   Epoch: 8   Global Step: 107240   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:59:43,536-Speed 3333.92 samples/sec   Loss 4.8445   LearningRate 0.0323   Epoch: 8   Global Step: 107250   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:59:46,615-Speed 3327.30 samples/sec   Loss 4.9319   LearningRate 0.0323   Epoch: 8   Global Step: 107260   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:59:49,757-Speed 3259.98 samples/sec   Loss 4.8343   LearningRate 0.0323   Epoch: 8   Global Step: 107270   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:59:52,937-Speed 3220.78 samples/sec   Loss 4.7934   LearningRate 0.0323   Epoch: 8   Global Step: 107280   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:59:56,069-Speed 3270.67 samples/sec   Loss 4.8419   LearningRate 0.0323   Epoch: 8   Global Step: 107290   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 10:59:59,150-Speed 3324.87 samples/sec   Loss 4.8235   LearningRate 0.0323   Epoch: 8   Global Step: 107300   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:00:02,257-Speed 3297.11 samples/sec   Loss 4.8376   LearningRate 0.0323   Epoch: 8   Global Step: 107310   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:00:05,372-Speed 3287.86 samples/sec   Loss 4.9231   LearningRate 0.0323   Epoch: 8   Global Step: 107320   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:00:08,446-Speed 3331.71 samples/sec   Loss 4.9120   LearningRate 0.0323   Epoch: 8   Global Step: 107330   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:00:11,518-Speed 3335.60 samples/sec   Loss 4.8804   LearningRate 0.0323   Epoch: 8   Global Step: 107340   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:00:14,613-Speed 3309.81 samples/sec   Loss 4.8261   LearningRate 0.0322   Epoch: 8   Global Step: 107350   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:00:17,733-Speed 3282.43 samples/sec   Loss 4.8864   LearningRate 0.0322   Epoch: 8   Global Step: 107360   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:00:20,791-Speed 3350.02 samples/sec   Loss 4.8649   LearningRate 0.0322   Epoch: 8   Global Step: 107370   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:00:23,867-Speed 3329.64 samples/sec   Loss 4.9316   LearningRate 0.0322   Epoch: 8   Global Step: 107380   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:00:26,954-Speed 3319.21 samples/sec   Loss 4.8380   LearningRate 0.0322   Epoch: 8   Global Step: 107390   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:00:30,074-Speed 3282.88 samples/sec   Loss 4.9101   LearningRate 0.0322   Epoch: 8   Global Step: 107400   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:00:33,150-Speed 3329.54 samples/sec   Loss 4.9034   LearningRate 0.0322   Epoch: 8   Global Step: 107410   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:00:36,230-Speed 3326.26 samples/sec   Loss 4.9464   LearningRate 0.0322   Epoch: 8   Global Step: 107420   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:00:39,346-Speed 3287.27 samples/sec   Loss 4.7782   LearningRate 0.0322   Epoch: 8   Global Step: 107430   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:00:42,464-Speed 3284.31 samples/sec   Loss 4.8956   LearningRate 0.0322   Epoch: 8   Global Step: 107440   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:00:45,550-Speed 3319.86 samples/sec   Loss 4.8377   LearningRate 0.0322   Epoch: 8   Global Step: 107450   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:00:48,695-Speed 3256.39 samples/sec   Loss 4.8332   LearningRate 0.0322   Epoch: 8   Global Step: 107460   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:00:51,813-Speed 3285.50 samples/sec   Loss 4.9071   LearningRate 0.0322   Epoch: 8   Global Step: 107470   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:00:54,908-Speed 3309.35 samples/sec   Loss 4.8197   LearningRate 0.0322   Epoch: 8   Global Step: 107480   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:00:57,996-Speed 3317.16 samples/sec   Loss 4.9211   LearningRate 0.0322   Epoch: 8   Global Step: 107490   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:01:01,120-Speed 3278.57 samples/sec   Loss 4.8157   LearningRate 0.0322   Epoch: 8   Global Step: 107500   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:01:04,211-Speed 3313.98 samples/sec   Loss 4.8115   LearningRate 0.0322   Epoch: 8   Global Step: 107510   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:01:07,302-Speed 3314.62 samples/sec   Loss 4.8730   LearningRate 0.0322   Epoch: 8   Global Step: 107520   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:01:10,375-Speed 3333.04 samples/sec   Loss 4.8576   LearningRate 0.0322   Epoch: 8   Global Step: 107530   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:01:13,491-Speed 3287.03 samples/sec   Loss 4.8859   LearningRate 0.0322   Epoch: 8   Global Step: 107540   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:01:16,568-Speed 3329.91 samples/sec   Loss 4.8496   LearningRate 0.0322   Epoch: 8   Global Step: 107550   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:01:19,744-Speed 3224.24 samples/sec   Loss 4.8569   LearningRate 0.0322   Epoch: 8   Global Step: 107560   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:01:22,824-Speed 3326.49 samples/sec   Loss 4.8432   LearningRate 0.0321   Epoch: 8   Global Step: 107570   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:01:26,013-Speed 3212.16 samples/sec   Loss 4.9072   LearningRate 0.0321   Epoch: 8   Global Step: 107580   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:01:29,229-Speed 3184.55 samples/sec   Loss 4.8568   LearningRate 0.0321   Epoch: 8   Global Step: 107590   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:01:32,327-Speed 3305.86 samples/sec   Loss 4.8374   LearningRate 0.0321   Epoch: 8   Global Step: 107600   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:01:35,384-Speed 3352.01 samples/sec   Loss 4.8775   LearningRate 0.0321   Epoch: 8   Global Step: 107610   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:01:38,498-Speed 3288.63 samples/sec   Loss 4.7904   LearningRate 0.0321   Epoch: 8   Global Step: 107620   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:01:41,593-Speed 3310.06 samples/sec   Loss 4.8600   LearningRate 0.0321   Epoch: 8   Global Step: 107630   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:01:44,713-Speed 3283.05 samples/sec   Loss 4.8743   LearningRate 0.0321   Epoch: 8   Global Step: 107640   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:01:47,841-Speed 3275.11 samples/sec   Loss 4.9563   LearningRate 0.0321   Epoch: 8   Global Step: 107650   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:01:50,977-Speed 3267.49 samples/sec   Loss 4.8087   LearningRate 0.0321   Epoch: 8   Global Step: 107660   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:01:54,107-Speed 3273.39 samples/sec   Loss 4.8351   LearningRate 0.0321   Epoch: 8   Global Step: 107670   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:01:57,180-Speed 3332.64 samples/sec   Loss 4.8937   LearningRate 0.0321   Epoch: 8   Global Step: 107680   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:02:00,284-Speed 3299.87 samples/sec   Loss 4.8285   LearningRate 0.0321   Epoch: 8   Global Step: 107690   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:02:03,421-Speed 3265.86 samples/sec   Loss 4.7419   LearningRate 0.0321   Epoch: 8   Global Step: 107700   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:02:06,555-Speed 3268.65 samples/sec   Loss 4.8427   LearningRate 0.0321   Epoch: 8   Global Step: 107710   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:02:09,645-Speed 3314.45 samples/sec   Loss 4.7997   LearningRate 0.0321   Epoch: 8   Global Step: 107720   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:02:12,733-Speed 3317.08 samples/sec   Loss 4.8704   LearningRate 0.0321   Epoch: 8   Global Step: 107730   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:02:15,793-Speed 3347.78 samples/sec   Loss 4.8975   LearningRate 0.0321   Epoch: 8   Global Step: 107740   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:02:18,877-Speed 3320.71 samples/sec   Loss 4.8396   LearningRate 0.0321   Epoch: 8   Global Step: 107750   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:02:21,928-Speed 3357.45 samples/sec   Loss 4.8082   LearningRate 0.0321   Epoch: 8   Global Step: 107760   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:02:25,129-Speed 3200.37 samples/sec   Loss 4.9223   LearningRate 0.0321   Epoch: 8   Global Step: 107770   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:02:28,286-Speed 3245.02 samples/sec   Loss 4.8393   LearningRate 0.0321   Epoch: 8   Global Step: 107780   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:02:31,389-Speed 3300.53 samples/sec   Loss 4.9053   LearningRate 0.0320   Epoch: 8   Global Step: 107790   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:02:34,451-Speed 3345.69 samples/sec   Loss 4.7827   LearningRate 0.0320   Epoch: 8   Global Step: 107800   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:02:37,584-Speed 3269.33 samples/sec   Loss 4.8844   LearningRate 0.0320   Epoch: 8   Global Step: 107810   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:02:40,655-Speed 3335.24 samples/sec   Loss 4.8967   LearningRate 0.0320   Epoch: 8   Global Step: 107820   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:02:43,822-Speed 3234.74 samples/sec   Loss 4.8927   LearningRate 0.0320   Epoch: 8   Global Step: 107830   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:02:46,897-Speed 3330.53 samples/sec   Loss 4.8184   LearningRate 0.0320   Epoch: 8   Global Step: 107840   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:02:50,059-Speed 3240.14 samples/sec   Loss 4.8927   LearningRate 0.0320   Epoch: 8   Global Step: 107850   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:02:53,224-Speed 3235.96 samples/sec   Loss 4.9321   LearningRate 0.0320   Epoch: 8   Global Step: 107860   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:02:56,285-Speed 3346.41 samples/sec   Loss 4.8954   LearningRate 0.0320   Epoch: 8   Global Step: 107870   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:02:59,363-Speed 3327.67 samples/sec   Loss 4.8163   LearningRate 0.0320   Epoch: 8   Global Step: 107880   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:03:02,501-Speed 3264.13 samples/sec   Loss 4.8008   LearningRate 0.0320   Epoch: 8   Global Step: 107890   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:03:05,613-Speed 3292.59 samples/sec   Loss 4.7146   LearningRate 0.0320   Epoch: 8   Global Step: 107900   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:03:08,711-Speed 3306.03 samples/sec   Loss 4.8080   LearningRate 0.0320   Epoch: 8   Global Step: 107910   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:03:11,844-Speed 3269.60 samples/sec   Loss 4.8615   LearningRate 0.0320   Epoch: 8   Global Step: 107920   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:03:14,952-Speed 3294.87 samples/sec   Loss 4.8252   LearningRate 0.0320   Epoch: 8   Global Step: 107930   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:03:18,045-Speed 3312.35 samples/sec   Loss 4.8937   LearningRate 0.0320   Epoch: 8   Global Step: 107940   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:03:21,129-Speed 3321.33 samples/sec   Loss 4.8889   LearningRate 0.0320   Epoch: 8   Global Step: 107950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:03:24,189-Speed 3348.10 samples/sec   Loss 4.8617   LearningRate 0.0320   Epoch: 8   Global Step: 107960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:03:27,305-Speed 3287.00 samples/sec   Loss 4.8285   LearningRate 0.0320   Epoch: 8   Global Step: 107970   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:03:30,413-Speed 3295.57 samples/sec   Loss 4.7938   LearningRate 0.0320   Epoch: 8   Global Step: 107980   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:03:33,508-Speed 3309.91 samples/sec   Loss 4.8971   LearningRate 0.0320   Epoch: 8   Global Step: 107990   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:03:36,650-Speed 3259.38 samples/sec   Loss 4.8383   LearningRate 0.0320   Epoch: 8   Global Step: 108000   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:03:39,717-Speed 3339.81 samples/sec   Loss 4.9204   LearningRate 0.0319   Epoch: 8   Global Step: 108010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:03:42,818-Speed 3303.46 samples/sec   Loss 4.9242   LearningRate 0.0319   Epoch: 8   Global Step: 108020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:03:45,886-Speed 3339.53 samples/sec   Loss 4.7549   LearningRate 0.0319   Epoch: 8   Global Step: 108030   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:03:48,966-Speed 3325.29 samples/sec   Loss 4.8359   LearningRate 0.0319   Epoch: 8   Global Step: 108040   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:03:52,135-Speed 3232.99 samples/sec   Loss 4.8664   LearningRate 0.0319   Epoch: 8   Global Step: 108050   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:03:55,256-Speed 3281.80 samples/sec   Loss 4.8885   LearningRate 0.0319   Epoch: 8   Global Step: 108060   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:03:58,318-Speed 3345.14 samples/sec   Loss 4.8362   LearningRate 0.0319   Epoch: 8   Global Step: 108070   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:04:01,456-Speed 3264.23 samples/sec   Loss 4.7950   LearningRate 0.0319   Epoch: 8   Global Step: 108080   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:04:04,564-Speed 3296.52 samples/sec   Loss 4.8015   LearningRate 0.0319   Epoch: 8   Global Step: 108090   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:04:07,712-Speed 3253.75 samples/sec   Loss 4.8908   LearningRate 0.0319   Epoch: 8   Global Step: 108100   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:04:10,822-Speed 3293.08 samples/sec   Loss 4.8852   LearningRate 0.0319   Epoch: 8   Global Step: 108110   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:04:13,881-Speed 3348.54 samples/sec   Loss 4.9081   LearningRate 0.0319   Epoch: 8   Global Step: 108120   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:04:17,003-Speed 3281.08 samples/sec   Loss 4.8186   LearningRate 0.0319   Epoch: 8   Global Step: 108130   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:04:20,153-Speed 3251.84 samples/sec   Loss 4.8346   LearningRate 0.0319   Epoch: 8   Global Step: 108140   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:04:23,225-Speed 3334.18 samples/sec   Loss 4.8573   LearningRate 0.0319   Epoch: 8   Global Step: 108150   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:04:26,360-Speed 3267.55 samples/sec   Loss 4.8803   LearningRate 0.0319   Epoch: 8   Global Step: 108160   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:04:29,430-Speed 3337.34 samples/sec   Loss 4.8097   LearningRate 0.0319   Epoch: 8   Global Step: 108170   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:04:32,501-Speed 3335.07 samples/sec   Loss 4.8559   LearningRate 0.0319   Epoch: 8   Global Step: 108180   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:04:35,650-Speed 3253.18 samples/sec   Loss 4.7822   LearningRate 0.0319   Epoch: 8   Global Step: 108190   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:04:38,785-Speed 3266.82 samples/sec   Loss 4.7164   LearningRate 0.0319   Epoch: 8   Global Step: 108200   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:04:41,866-Speed 3325.05 samples/sec   Loss 4.8631   LearningRate 0.0319   Epoch: 8   Global Step: 108210   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:04:44,972-Speed 3297.52 samples/sec   Loss 4.8293   LearningRate 0.0319   Epoch: 8   Global Step: 108220   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:04:48,148-Speed 3225.38 samples/sec   Loss 4.9706   LearningRate 0.0318   Epoch: 8   Global Step: 108230   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:04:51,216-Speed 3338.73 samples/sec   Loss 4.9352   LearningRate 0.0318   Epoch: 8   Global Step: 108240   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:04:54,301-Speed 3321.20 samples/sec   Loss 4.7464   LearningRate 0.0318   Epoch: 8   Global Step: 108250   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:04:57,371-Speed 3336.47 samples/sec   Loss 4.8077   LearningRate 0.0318   Epoch: 8   Global Step: 108260   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:05:00,463-Speed 3312.97 samples/sec   Loss 4.7966   LearningRate 0.0318   Epoch: 8   Global Step: 108270   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:05:03,553-Speed 3314.77 samples/sec   Loss 4.8364   LearningRate 0.0318   Epoch: 8   Global Step: 108280   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:05:06,677-Speed 3278.78 samples/sec   Loss 4.9654   LearningRate 0.0318   Epoch: 8   Global Step: 108290   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:05:09,779-Speed 3302.04 samples/sec   Loss 4.8717   LearningRate 0.0318   Epoch: 8   Global Step: 108300   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:05:12,953-Speed 3226.73 samples/sec   Loss 4.8568   LearningRate 0.0318   Epoch: 8   Global Step: 108310   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:05:16,092-Speed 3263.49 samples/sec   Loss 4.8968   LearningRate 0.0318   Epoch: 8   Global Step: 108320   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:05:19,206-Speed 3288.90 samples/sec   Loss 4.8819   LearningRate 0.0318   Epoch: 8   Global Step: 108330   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:05:22,302-Speed 3309.29 samples/sec   Loss 4.8550   LearningRate 0.0318   Epoch: 8   Global Step: 108340   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:05:25,458-Speed 3244.93 samples/sec   Loss 4.7535   LearningRate 0.0318   Epoch: 8   Global Step: 108350   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:05:28,585-Speed 3276.20 samples/sec   Loss 4.8732   LearningRate 0.0318   Epoch: 8   Global Step: 108360   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:05:31,701-Speed 3286.78 samples/sec   Loss 4.8899   LearningRate 0.0318   Epoch: 8   Global Step: 108370   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:05:34,818-Speed 3286.40 samples/sec   Loss 4.8988   LearningRate 0.0318   Epoch: 8   Global Step: 108380   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:05:37,908-Speed 3315.04 samples/sec   Loss 4.8581   LearningRate 0.0318   Epoch: 8   Global Step: 108390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:05:40,950-Speed 3367.40 samples/sec   Loss 4.9179   LearningRate 0.0318   Epoch: 8   Global Step: 108400   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:05:44,000-Speed 3358.23 samples/sec   Loss 4.8447   LearningRate 0.0318   Epoch: 8   Global Step: 108410   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:05:47,057-Speed 3351.78 samples/sec   Loss 4.8945   LearningRate 0.0318   Epoch: 8   Global Step: 108420   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:05:50,210-Speed 3248.06 samples/sec   Loss 4.8509   LearningRate 0.0318   Epoch: 8   Global Step: 108430   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:05:53,341-Speed 3271.20 samples/sec   Loss 4.9069   LearningRate 0.0318   Epoch: 8   Global Step: 108440   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:05:57,032-Speed 2775.40 samples/sec   Loss 4.7967   LearningRate 0.0317   Epoch: 8   Global Step: 108450   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:06:00,155-Speed 3280.09 samples/sec   Loss 4.8557   LearningRate 0.0317   Epoch: 8   Global Step: 108460   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:06:03,271-Speed 3287.22 samples/sec   Loss 4.8913   LearningRate 0.0317   Epoch: 8   Global Step: 108470   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:06:06,385-Speed 3290.17 samples/sec   Loss 4.8136   LearningRate 0.0317   Epoch: 8   Global Step: 108480   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:06:09,418-Speed 3376.54 samples/sec   Loss 4.8466   LearningRate 0.0317   Epoch: 8   Global Step: 108490   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:06:12,565-Speed 3255.51 samples/sec   Loss 4.8118   LearningRate 0.0317   Epoch: 8   Global Step: 108500   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:06:15,653-Speed 3316.75 samples/sec   Loss 4.8239   LearningRate 0.0317   Epoch: 8   Global Step: 108510   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:06:18,746-Speed 3312.28 samples/sec   Loss 4.8367   LearningRate 0.0317   Epoch: 8   Global Step: 108520   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:06:21,816-Speed 3335.87 samples/sec   Loss 4.9848   LearningRate 0.0317   Epoch: 8   Global Step: 108530   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:06:24,957-Speed 3260.72 samples/sec   Loss 4.8344   LearningRate 0.0317   Epoch: 8   Global Step: 108540   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:06:28,068-Speed 3293.62 samples/sec   Loss 4.8835   LearningRate 0.0317   Epoch: 8   Global Step: 108550   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:06:31,142-Speed 3331.90 samples/sec   Loss 4.8451   LearningRate 0.0317   Epoch: 8   Global Step: 108560   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:06:34,249-Speed 3296.79 samples/sec   Loss 4.8171   LearningRate 0.0317   Epoch: 8   Global Step: 108570   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:06:37,458-Speed 3192.47 samples/sec   Loss 4.8881   LearningRate 0.0317   Epoch: 8   Global Step: 108580   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:06:40,604-Speed 3255.25 samples/sec   Loss 4.8687   LearningRate 0.0317   Epoch: 8   Global Step: 108590   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:06:43,703-Speed 3305.51 samples/sec   Loss 4.9300   LearningRate 0.0317   Epoch: 8   Global Step: 108600   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:06:46,801-Speed 3308.22 samples/sec   Loss 4.8384   LearningRate 0.0317   Epoch: 8   Global Step: 108610   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:06:49,892-Speed 3313.67 samples/sec   Loss 4.8274   LearningRate 0.0317   Epoch: 8   Global Step: 108620   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:06:53,023-Speed 3271.34 samples/sec   Loss 4.6870   LearningRate 0.0317   Epoch: 8   Global Step: 108630   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:06:56,172-Speed 3252.78 samples/sec   Loss 4.9044   LearningRate 0.0317   Epoch: 8   Global Step: 108640   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:06:59,268-Speed 3308.59 samples/sec   Loss 4.8379   LearningRate 0.0317   Epoch: 8   Global Step: 108650   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:07:02,358-Speed 3314.63 samples/sec   Loss 4.7970   LearningRate 0.0317   Epoch: 8   Global Step: 108660   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:07:05,436-Speed 3328.01 samples/sec   Loss 4.7917   LearningRate 0.0316   Epoch: 8   Global Step: 108670   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:07:08,543-Speed 3296.94 samples/sec   Loss 4.8224   LearningRate 0.0316   Epoch: 8   Global Step: 108680   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:07:11,698-Speed 3246.07 samples/sec   Loss 4.8257   LearningRate 0.0316   Epoch: 8   Global Step: 108690   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:07:14,842-Speed 3257.99 samples/sec   Loss 4.7919   LearningRate 0.0316   Epoch: 8   Global Step: 108700   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:07:17,971-Speed 3274.07 samples/sec   Loss 4.8600   LearningRate 0.0316   Epoch: 8   Global Step: 108710   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:07:21,093-Speed 3281.55 samples/sec   Loss 4.8405   LearningRate 0.0316   Epoch: 8   Global Step: 108720   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:07:24,196-Speed 3300.54 samples/sec   Loss 4.7873   LearningRate 0.0316   Epoch: 8   Global Step: 108730   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:07:27,297-Speed 3302.61 samples/sec   Loss 4.8026   LearningRate 0.0316   Epoch: 8   Global Step: 108740   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:07:30,406-Speed 3295.41 samples/sec   Loss 4.8515   LearningRate 0.0316   Epoch: 8   Global Step: 108750   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:07:34,811-Speed 2325.03 samples/sec   Loss 4.9120   LearningRate 0.0316   Epoch: 8   Global Step: 108760   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:07:39,673-Speed 2106.79 samples/sec   Loss 4.8782   LearningRate 0.0316   Epoch: 8   Global Step: 108770   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:07:42,786-Speed 3290.49 samples/sec   Loss 4.9277   LearningRate 0.0316   Epoch: 8   Global Step: 108780   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:07:45,857-Speed 3335.27 samples/sec   Loss 4.8952   LearningRate 0.0316   Epoch: 8   Global Step: 108790   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:07:48,978-Speed 3282.32 samples/sec   Loss 4.8400   LearningRate 0.0316   Epoch: 8   Global Step: 108800   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:07:52,116-Speed 3264.14 samples/sec   Loss 4.8153   LearningRate 0.0316   Epoch: 8   Global Step: 108810   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:07:55,226-Speed 3292.58 samples/sec   Loss 4.8793   LearningRate 0.0316   Epoch: 8   Global Step: 108820   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:07:58,319-Speed 3313.00 samples/sec   Loss 4.8119   LearningRate 0.0316   Epoch: 8   Global Step: 108830   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:08:01,508-Speed 3211.17 samples/sec   Loss 4.7919   LearningRate 0.0316   Epoch: 8   Global Step: 108840   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:08:04,658-Speed 3252.05 samples/sec   Loss 4.8467   LearningRate 0.0316   Epoch: 8   Global Step: 108850   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:08:07,835-Speed 3224.14 samples/sec   Loss 4.8847   LearningRate 0.0316   Epoch: 8   Global Step: 108860   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:08:10,906-Speed 3335.40 samples/sec   Loss 4.8588   LearningRate 0.0316   Epoch: 8   Global Step: 108870   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:08:14,056-Speed 3251.98 samples/sec   Loss 4.8263   LearningRate 0.0316   Epoch: 8   Global Step: 108880   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:08:17,164-Speed 3296.16 samples/sec   Loss 4.8821   LearningRate 0.0315   Epoch: 8   Global Step: 108890   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:08:20,230-Speed 3341.20 samples/sec   Loss 4.7699   LearningRate 0.0315   Epoch: 8   Global Step: 108900   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:08:23,325-Speed 3308.90 samples/sec   Loss 4.9519   LearningRate 0.0315   Epoch: 8   Global Step: 108910   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:08:26,479-Speed 3247.20 samples/sec   Loss 4.7888   LearningRate 0.0315   Epoch: 8   Global Step: 108920   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:08:29,562-Speed 3323.04 samples/sec   Loss 4.9239   LearningRate 0.0315   Epoch: 8   Global Step: 108930   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:08:32,643-Speed 3324.88 samples/sec   Loss 4.9050   LearningRate 0.0315   Epoch: 8   Global Step: 108940   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:08:35,799-Speed 3245.92 samples/sec   Loss 4.9304   LearningRate 0.0315   Epoch: 8   Global Step: 108950   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:08:38,910-Speed 3292.10 samples/sec   Loss 4.8497   LearningRate 0.0315   Epoch: 8   Global Step: 108960   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:08:41,997-Speed 3318.43 samples/sec   Loss 4.8669   LearningRate 0.0315   Epoch: 8   Global Step: 108970   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:08:45,088-Speed 3313.90 samples/sec   Loss 4.7133   LearningRate 0.0315   Epoch: 8   Global Step: 108980   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:08:48,181-Speed 3311.41 samples/sec   Loss 4.8797   LearningRate 0.0315   Epoch: 8   Global Step: 108990   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:08:51,273-Speed 3312.91 samples/sec   Loss 4.8218   LearningRate 0.0315   Epoch: 8   Global Step: 109000   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:08:54,364-Speed 3313.74 samples/sec   Loss 4.9640   LearningRate 0.0315   Epoch: 8   Global Step: 109010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:08:57,418-Speed 3354.31 samples/sec   Loss 4.8582   LearningRate 0.0315   Epoch: 8   Global Step: 109020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:09:00,471-Speed 3355.02 samples/sec   Loss 4.9551   LearningRate 0.0315   Epoch: 8   Global Step: 109030   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:09:03,574-Speed 3301.02 samples/sec   Loss 4.9109   LearningRate 0.0315   Epoch: 8   Global Step: 109040   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:09:06,675-Speed 3303.31 samples/sec   Loss 4.9102   LearningRate 0.0315   Epoch: 8   Global Step: 109050   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:09:09,793-Speed 3285.33 samples/sec   Loss 4.7866   LearningRate 0.0315   Epoch: 8   Global Step: 109060   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:09:12,924-Speed 3272.29 samples/sec   Loss 4.8805   LearningRate 0.0315   Epoch: 8   Global Step: 109070   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:09:16,041-Speed 3286.06 samples/sec   Loss 4.8465   LearningRate 0.0315   Epoch: 8   Global Step: 109080   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:09:19,149-Speed 3294.99 samples/sec   Loss 4.9119   LearningRate 0.0315   Epoch: 8   Global Step: 109090   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:09:22,246-Speed 3308.01 samples/sec   Loss 4.8312   LearningRate 0.0315   Epoch: 8   Global Step: 109100   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:09:25,428-Speed 3218.76 samples/sec   Loss 4.8415   LearningRate 0.0314   Epoch: 8   Global Step: 109110   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:09:28,515-Speed 3318.51 samples/sec   Loss 4.8716   LearningRate 0.0314   Epoch: 8   Global Step: 109120   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:09:31,641-Speed 3276.63 samples/sec   Loss 4.8902   LearningRate 0.0314   Epoch: 8   Global Step: 109130   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:09:34,714-Speed 3333.77 samples/sec   Loss 4.7684   LearningRate 0.0314   Epoch: 8   Global Step: 109140   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:09:37,796-Speed 3325.22 samples/sec   Loss 4.8697   LearningRate 0.0314   Epoch: 8   Global Step: 109150   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:09:40,874-Speed 3327.60 samples/sec   Loss 4.9367   LearningRate 0.0314   Epoch: 8   Global Step: 109160   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:09:43,952-Speed 3327.58 samples/sec   Loss 5.0028   LearningRate 0.0314   Epoch: 8   Global Step: 109170   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:09:47,048-Speed 3308.44 samples/sec   Loss 4.8870   LearningRate 0.0314   Epoch: 8   Global Step: 109180   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:09:50,161-Speed 3290.47 samples/sec   Loss 4.7661   LearningRate 0.0314   Epoch: 8   Global Step: 109190   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:09:53,290-Speed 3274.04 samples/sec   Loss 4.8870   LearningRate 0.0314   Epoch: 8   Global Step: 109200   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:09:56,356-Speed 3341.16 samples/sec   Loss 4.8366   LearningRate 0.0314   Epoch: 8   Global Step: 109210   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:09:59,437-Speed 3324.45 samples/sec   Loss 4.8214   LearningRate 0.0314   Epoch: 8   Global Step: 109220   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:10:02,500-Speed 3344.05 samples/sec   Loss 4.7812   LearningRate 0.0314   Epoch: 8   Global Step: 109230   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:10:05,617-Speed 3286.45 samples/sec   Loss 4.9447   LearningRate 0.0314   Epoch: 8   Global Step: 109240   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:10:08,688-Speed 3334.75 samples/sec   Loss 4.7968   LearningRate 0.0314   Epoch: 8   Global Step: 109250   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:10:11,793-Speed 3299.55 samples/sec   Loss 4.8940   LearningRate 0.0314   Epoch: 8   Global Step: 109260   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:10:14,908-Speed 3288.55 samples/sec   Loss 4.9357   LearningRate 0.0314   Epoch: 8   Global Step: 109270   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:10:18,032-Speed 3279.42 samples/sec   Loss 4.9612   LearningRate 0.0314   Epoch: 8   Global Step: 109280   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:10:21,103-Speed 3334.81 samples/sec   Loss 4.9177   LearningRate 0.0314   Epoch: 8   Global Step: 109290   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:10:24,179-Speed 3330.79 samples/sec   Loss 4.8322   LearningRate 0.0314   Epoch: 8   Global Step: 109300   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:10:27,282-Speed 3300.97 samples/sec   Loss 4.8393   LearningRate 0.0314   Epoch: 8   Global Step: 109310   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:10:30,393-Speed 3291.73 samples/sec   Loss 4.8203   LearningRate 0.0314   Epoch: 8   Global Step: 109320   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:10:33,453-Speed 3347.24 samples/sec   Loss 4.8197   LearningRate 0.0313   Epoch: 8   Global Step: 109330   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:10:36,633-Speed 3221.37 samples/sec   Loss 4.7626   LearningRate 0.0313   Epoch: 8   Global Step: 109340   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:10:39,755-Speed 3280.85 samples/sec   Loss 4.9078   LearningRate 0.0313   Epoch: 8   Global Step: 109350   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:10:42,913-Speed 3244.40 samples/sec   Loss 4.9136   LearningRate 0.0313   Epoch: 8   Global Step: 109360   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:10:45,977-Speed 3342.69 samples/sec   Loss 4.8885   LearningRate 0.0313   Epoch: 8   Global Step: 109370   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:10:49,076-Speed 3305.72 samples/sec   Loss 4.7976   LearningRate 0.0313   Epoch: 8   Global Step: 109380   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:10:52,141-Speed 3341.03 samples/sec   Loss 4.9283   LearningRate 0.0313   Epoch: 8   Global Step: 109390   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:10:55,182-Speed 3368.82 samples/sec   Loss 4.9291   LearningRate 0.0313   Epoch: 8   Global Step: 109400   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:10:58,260-Speed 3327.78 samples/sec   Loss 4.9097   LearningRate 0.0313   Epoch: 8   Global Step: 109410   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:11:01,346-Speed 3319.66 samples/sec   Loss 4.8746   LearningRate 0.0313   Epoch: 8   Global Step: 109420   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:11:04,464-Speed 3284.34 samples/sec   Loss 4.8869   LearningRate 0.0313   Epoch: 8   Global Step: 109430   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:11:07,576-Speed 3292.45 samples/sec   Loss 4.8071   LearningRate 0.0313   Epoch: 8   Global Step: 109440   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:11:10,672-Speed 3308.04 samples/sec   Loss 4.9304   LearningRate 0.0313   Epoch: 8   Global Step: 109450   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:11:13,835-Speed 3239.02 samples/sec   Loss 4.9796   LearningRate 0.0313   Epoch: 8   Global Step: 109460   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:11:17,029-Speed 3206.51 samples/sec   Loss 4.9136   LearningRate 0.0313   Epoch: 8   Global Step: 109470   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:11:20,144-Speed 3288.34 samples/sec   Loss 4.8204   LearningRate 0.0313   Epoch: 8   Global Step: 109480   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:11:23,229-Speed 3320.21 samples/sec   Loss 4.8548   LearningRate 0.0313   Epoch: 8   Global Step: 109490   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:11:26,336-Speed 3297.15 samples/sec   Loss 4.8944   LearningRate 0.0313   Epoch: 8   Global Step: 109500   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:11:29,409-Speed 3333.26 samples/sec   Loss 4.8143   LearningRate 0.0313   Epoch: 8   Global Step: 109510   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:11:32,508-Speed 3305.69 samples/sec   Loss 4.9208   LearningRate 0.0313   Epoch: 8   Global Step: 109520   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:11:35,675-Speed 3233.94 samples/sec   Loss 4.8427   LearningRate 0.0313   Epoch: 8   Global Step: 109530   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:11:38,842-Speed 3234.66 samples/sec   Loss 4.7879   LearningRate 0.0313   Epoch: 8   Global Step: 109540   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:11:41,974-Speed 3270.45 samples/sec   Loss 4.8389   LearningRate 0.0312   Epoch: 8   Global Step: 109550   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:11:45,053-Speed 3326.46 samples/sec   Loss 4.9299   LearningRate 0.0312   Epoch: 8   Global Step: 109560   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:11:48,208-Speed 3246.78 samples/sec   Loss 4.8063   LearningRate 0.0312   Epoch: 8   Global Step: 109570   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:11:51,364-Speed 3245.99 samples/sec   Loss 4.9092   LearningRate 0.0312   Epoch: 8   Global Step: 109580   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:11:54,503-Speed 3263.21 samples/sec   Loss 4.8123   LearningRate 0.0312   Epoch: 8   Global Step: 109590   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:11:57,640-Speed 3265.51 samples/sec   Loss 4.8874   LearningRate 0.0312   Epoch: 8   Global Step: 109600   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:12:00,829-Speed 3211.80 samples/sec   Loss 4.8779   LearningRate 0.0312   Epoch: 8   Global Step: 109610   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:12:03,922-Speed 3311.24 samples/sec   Loss 4.8397   LearningRate 0.0312   Epoch: 8   Global Step: 109620   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:12:06,996-Speed 3333.01 samples/sec   Loss 4.8437   LearningRate 0.0312   Epoch: 8   Global Step: 109630   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:12:10,074-Speed 3327.34 samples/sec   Loss 4.9163   LearningRate 0.0312   Epoch: 8   Global Step: 109640   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:12:13,202-Speed 3274.34 samples/sec   Loss 4.8494   LearningRate 0.0312   Epoch: 8   Global Step: 109650   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:12:16,273-Speed 3335.87 samples/sec   Loss 4.8122   LearningRate 0.0312   Epoch: 8   Global Step: 109660   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:12:19,385-Speed 3292.11 samples/sec   Loss 4.9407   LearningRate 0.0312   Epoch: 8   Global Step: 109670   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:12:22,476-Speed 3313.21 samples/sec   Loss 4.9312   LearningRate 0.0312   Epoch: 8   Global Step: 109680   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:12:25,613-Speed 3265.17 samples/sec   Loss 4.9931   LearningRate 0.0312   Epoch: 8   Global Step: 109690   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:12:28,803-Speed 3211.95 samples/sec   Loss 4.8598   LearningRate 0.0312   Epoch: 8   Global Step: 109700   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:12:31,903-Speed 3304.06 samples/sec   Loss 4.9365   LearningRate 0.0312   Epoch: 8   Global Step: 109710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:12:35,012-Speed 3294.13 samples/sec   Loss 4.9373   LearningRate 0.0312   Epoch: 8   Global Step: 109720   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:12:38,184-Speed 3229.42 samples/sec   Loss 4.8793   LearningRate 0.0312   Epoch: 8   Global Step: 109730   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:12:41,399-Speed 3186.15 samples/sec   Loss 4.9069   LearningRate 0.0312   Epoch: 8   Global Step: 109740   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:12:44,551-Speed 3249.95 samples/sec   Loss 4.6950   LearningRate 0.0312   Epoch: 8   Global Step: 109750   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:12:47,685-Speed 3267.76 samples/sec   Loss 4.9282   LearningRate 0.0312   Epoch: 8   Global Step: 109760   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:12:50,829-Speed 3258.60 samples/sec   Loss 4.8251   LearningRate 0.0312   Epoch: 8   Global Step: 109770   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:12:53,933-Speed 3300.52 samples/sec   Loss 4.7805   LearningRate 0.0311   Epoch: 8   Global Step: 109780   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:12:57,026-Speed 3311.12 samples/sec   Loss 4.8537   LearningRate 0.0311   Epoch: 8   Global Step: 109790   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:13:00,153-Speed 3276.20 samples/sec   Loss 4.8828   LearningRate 0.0311   Epoch: 8   Global Step: 109800   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:13:03,343-Speed 3211.44 samples/sec   Loss 4.8896   LearningRate 0.0311   Epoch: 8   Global Step: 109810   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:13:06,533-Speed 3210.98 samples/sec   Loss 4.9266   LearningRate 0.0311   Epoch: 8   Global Step: 109820   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:13:09,629-Speed 3308.98 samples/sec   Loss 4.8841   LearningRate 0.0311   Epoch: 8   Global Step: 109830   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:13:12,822-Speed 3207.14 samples/sec   Loss 4.8338   LearningRate 0.0311   Epoch: 8   Global Step: 109840   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:13:16,030-Speed 3193.32 samples/sec   Loss 4.8732   LearningRate 0.0311   Epoch: 8   Global Step: 109850   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:13:19,165-Speed 3267.55 samples/sec   Loss 4.9820   LearningRate 0.0311   Epoch: 8   Global Step: 109860   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:13:22,239-Speed 3332.09 samples/sec   Loss 4.8228   LearningRate 0.0311   Epoch: 8   Global Step: 109870   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:13:25,357-Speed 3285.17 samples/sec   Loss 4.8476   LearningRate 0.0311   Epoch: 8   Global Step: 109880   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:13:28,450-Speed 3312.35 samples/sec   Loss 4.8011   LearningRate 0.0311   Epoch: 8   Global Step: 109890   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:13:31,589-Speed 3262.18 samples/sec   Loss 4.9359   LearningRate 0.0311   Epoch: 8   Global Step: 109900   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:13:34,707-Speed 3285.32 samples/sec   Loss 4.9028   LearningRate 0.0311   Epoch: 8   Global Step: 109910   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:13:37,841-Speed 3268.82 samples/sec   Loss 4.9013   LearningRate 0.0311   Epoch: 8   Global Step: 109920   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:13:41,039-Speed 3202.77 samples/sec   Loss 4.8976   LearningRate 0.0311   Epoch: 8   Global Step: 109930   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:13:44,171-Speed 3271.68 samples/sec   Loss 4.8970   LearningRate 0.0311   Epoch: 8   Global Step: 109940   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:13:47,317-Speed 3255.18 samples/sec   Loss 4.8067   LearningRate 0.0311   Epoch: 8   Global Step: 109950   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:13:50,479-Speed 3239.27 samples/sec   Loss 4.8594   LearningRate 0.0311   Epoch: 8   Global Step: 109960   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:13:53,600-Speed 3281.94 samples/sec   Loss 4.8191   LearningRate 0.0311   Epoch: 8   Global Step: 109970   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:13:56,669-Speed 3337.60 samples/sec   Loss 4.9167   LearningRate 0.0311   Epoch: 8   Global Step: 109980   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:13:59,740-Speed 3335.90 samples/sec   Loss 4.9054   LearningRate 0.0311   Epoch: 8   Global Step: 109990   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:14:02,865-Speed 3277.95 samples/sec   Loss 4.8000   LearningRate 0.0310   Epoch: 8   Global Step: 110000   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:14:05,926-Speed 3345.84 samples/sec   Loss 4.8714   LearningRate 0.0310   Epoch: 8   Global Step: 110010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:14:08,971-Speed 3364.05 samples/sec   Loss 4.9177   LearningRate 0.0310   Epoch: 8   Global Step: 110020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:14:12,091-Speed 3283.31 samples/sec   Loss 4.9894   LearningRate 0.0310   Epoch: 8   Global Step: 110030   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:14:15,199-Speed 3295.98 samples/sec   Loss 4.7432   LearningRate 0.0310   Epoch: 8   Global Step: 110040   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:14:18,343-Speed 3257.79 samples/sec   Loss 4.9296   LearningRate 0.0310   Epoch: 8   Global Step: 110050   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:14:21,464-Speed 3281.63 samples/sec   Loss 4.9153   LearningRate 0.0310   Epoch: 8   Global Step: 110060   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:14:24,612-Speed 3254.95 samples/sec   Loss 4.8876   LearningRate 0.0310   Epoch: 8   Global Step: 110070   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:14:27,693-Speed 3324.56 samples/sec   Loss 4.7754   LearningRate 0.0310   Epoch: 8   Global Step: 110080   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:14:30,818-Speed 3277.98 samples/sec   Loss 4.8851   LearningRate 0.0310   Epoch: 8   Global Step: 110090   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:14:33,952-Speed 3268.32 samples/sec   Loss 4.8181   LearningRate 0.0310   Epoch: 8   Global Step: 110100   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:14:37,078-Speed 3277.35 samples/sec   Loss 4.9152   LearningRate 0.0310   Epoch: 8   Global Step: 110110   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:14:40,171-Speed 3310.77 samples/sec   Loss 4.8840   LearningRate 0.0310   Epoch: 8   Global Step: 110120   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:14:43,292-Speed 3283.12 samples/sec   Loss 4.8480   LearningRate 0.0310   Epoch: 8   Global Step: 110130   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:14:46,338-Speed 3362.37 samples/sec   Loss 4.8332   LearningRate 0.0310   Epoch: 8   Global Step: 110140   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:14:49,403-Speed 3341.52 samples/sec   Loss 4.8346   LearningRate 0.0310   Epoch: 8   Global Step: 110150   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:14:52,547-Speed 3258.49 samples/sec   Loss 4.8345   LearningRate 0.0310   Epoch: 8   Global Step: 110160   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:14:55,708-Speed 3241.19 samples/sec   Loss 4.8050   LearningRate 0.0310   Epoch: 8   Global Step: 110170   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:14:58,757-Speed 3359.36 samples/sec   Loss 4.8962   LearningRate 0.0310   Epoch: 8   Global Step: 110180   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:15:01,916-Speed 3242.28 samples/sec   Loss 4.9190   LearningRate 0.0310   Epoch: 8   Global Step: 110190   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:15:05,077-Speed 3240.60 samples/sec   Loss 4.7583   LearningRate 0.0310   Epoch: 8   Global Step: 110200   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:15:08,194-Speed 3286.33 samples/sec   Loss 4.8167   LearningRate 0.0310   Epoch: 8   Global Step: 110210   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:15:11,257-Speed 3344.66 samples/sec   Loss 4.8635   LearningRate 0.0309   Epoch: 8   Global Step: 110220   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:15:14,441-Speed 3216.77 samples/sec   Loss 4.7840   LearningRate 0.0309   Epoch: 8   Global Step: 110230   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:15:17,577-Speed 3265.91 samples/sec   Loss 4.7223   LearningRate 0.0309   Epoch: 8   Global Step: 110240   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:15:20,626-Speed 3359.60 samples/sec   Loss 4.9596   LearningRate 0.0309   Epoch: 8   Global Step: 110250   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:15:23,734-Speed 3295.98 samples/sec   Loss 4.8214   LearningRate 0.0309   Epoch: 8   Global Step: 110260   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:15:26,920-Speed 3214.85 samples/sec   Loss 4.8129   LearningRate 0.0309   Epoch: 8   Global Step: 110270   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:15:29,982-Speed 3345.42 samples/sec   Loss 4.8319   LearningRate 0.0309   Epoch: 8   Global Step: 110280   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:15:33,071-Speed 3315.59 samples/sec   Loss 4.8311   LearningRate 0.0309   Epoch: 8   Global Step: 110290   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:15:36,175-Speed 3300.41 samples/sec   Loss 4.9273   LearningRate 0.0309   Epoch: 8   Global Step: 110300   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:15:39,259-Speed 3321.03 samples/sec   Loss 4.9200   LearningRate 0.0309   Epoch: 8   Global Step: 110310   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:15:42,331-Speed 3334.18 samples/sec   Loss 4.9367   LearningRate 0.0309   Epoch: 8   Global Step: 110320   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:15:45,423-Speed 3313.35 samples/sec   Loss 4.8206   LearningRate 0.0309   Epoch: 8   Global Step: 110330   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:15:48,489-Speed 3340.90 samples/sec   Loss 4.9025   LearningRate 0.0309   Epoch: 8   Global Step: 110340   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:15:51,562-Speed 3332.97 samples/sec   Loss 4.9965   LearningRate 0.0309   Epoch: 8   Global Step: 110350   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:15:54,645-Speed 3323.69 samples/sec   Loss 4.8321   LearningRate 0.0309   Epoch: 8   Global Step: 110360   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:15:57,697-Speed 3355.46 samples/sec   Loss 4.8645   LearningRate 0.0309   Epoch: 8   Global Step: 110370   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:16:00,816-Speed 3284.66 samples/sec   Loss 4.8632   LearningRate 0.0309   Epoch: 8   Global Step: 110380   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:16:03,878-Speed 3344.66 samples/sec   Loss 4.9392   LearningRate 0.0309   Epoch: 8   Global Step: 110390   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:16:06,934-Speed 3351.92 samples/sec   Loss 4.8665   LearningRate 0.0309   Epoch: 8   Global Step: 110400   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:16:10,008-Speed 3332.46 samples/sec   Loss 4.8643   LearningRate 0.0309   Epoch: 8   Global Step: 110410   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:16:13,130-Speed 3280.90 samples/sec   Loss 4.9398   LearningRate 0.0309   Epoch: 8   Global Step: 110420   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:16:16,243-Speed 3290.06 samples/sec   Loss 4.8466   LearningRate 0.0309   Epoch: 8   Global Step: 110430   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:16:19,367-Speed 3279.39 samples/sec   Loss 4.8568   LearningRate 0.0309   Epoch: 8   Global Step: 110440   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:16:22,443-Speed 3330.06 samples/sec   Loss 4.9552   LearningRate 0.0308   Epoch: 8   Global Step: 110450   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:16:25,528-Speed 3320.69 samples/sec   Loss 4.8332   LearningRate 0.0308   Epoch: 8   Global Step: 110460   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:16:28,632-Speed 3300.62 samples/sec   Loss 4.9016   LearningRate 0.0308   Epoch: 8   Global Step: 110470   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:16:31,698-Speed 3340.32 samples/sec   Loss 4.8442   LearningRate 0.0308   Epoch: 8   Global Step: 110480   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:16:34,828-Speed 3272.31 samples/sec   Loss 4.9040   LearningRate 0.0308   Epoch: 8   Global Step: 110490   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:16:37,900-Speed 3334.37 samples/sec   Loss 4.8602   LearningRate 0.0308   Epoch: 8   Global Step: 110500   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:16:41,010-Speed 3294.13 samples/sec   Loss 4.9266   LearningRate 0.0308   Epoch: 8   Global Step: 110510   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:16:44,087-Speed 3328.49 samples/sec   Loss 4.9094   LearningRate 0.0308   Epoch: 8   Global Step: 110520   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:16:47,135-Speed 3361.02 samples/sec   Loss 4.8984   LearningRate 0.0308   Epoch: 8   Global Step: 110530   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:16:50,202-Speed 3340.34 samples/sec   Loss 4.8206   LearningRate 0.0308   Epoch: 8   Global Step: 110540   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:16:53,310-Speed 3295.16 samples/sec   Loss 4.7813   LearningRate 0.0308   Epoch: 8   Global Step: 110550   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:16:56,406-Speed 3308.99 samples/sec   Loss 4.8329   LearningRate 0.0308   Epoch: 8   Global Step: 110560   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:16:59,459-Speed 3354.67 samples/sec   Loss 4.8085   LearningRate 0.0308   Epoch: 8   Global Step: 110570   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:17:02,542-Speed 3323.02 samples/sec   Loss 4.8496   LearningRate 0.0308   Epoch: 8   Global Step: 110580   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:17:05,661-Speed 3283.78 samples/sec   Loss 4.9385   LearningRate 0.0308   Epoch: 8   Global Step: 110590   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:17:08,727-Speed 3340.86 samples/sec   Loss 4.8589   LearningRate 0.0308   Epoch: 8   Global Step: 110600   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:17:11,787-Speed 3347.60 samples/sec   Loss 4.8908   LearningRate 0.0308   Epoch: 8   Global Step: 110610   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:17:14,894-Speed 3297.42 samples/sec   Loss 4.8827   LearningRate 0.0308   Epoch: 8   Global Step: 110620   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:17:17,950-Speed 3351.27 samples/sec   Loss 4.8161   LearningRate 0.0308   Epoch: 8   Global Step: 110630   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:17:21,009-Speed 3348.38 samples/sec   Loss 4.8450   LearningRate 0.0308   Epoch: 8   Global Step: 110640   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:17:24,070-Speed 3347.04 samples/sec   Loss 4.8813   LearningRate 0.0308   Epoch: 8   Global Step: 110650   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:17:27,257-Speed 3214.27 samples/sec   Loss 4.8568   LearningRate 0.0308   Epoch: 8   Global Step: 110660   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:17:30,344-Speed 3317.87 samples/sec   Loss 4.8678   LearningRate 0.0307   Epoch: 8   Global Step: 110670   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:17:33,402-Speed 3349.83 samples/sec   Loss 4.8195   LearningRate 0.0307   Epoch: 8   Global Step: 110680   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:17:36,535-Speed 3269.67 samples/sec   Loss 4.8288   LearningRate 0.0307   Epoch: 8   Global Step: 110690   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:17:39,651-Speed 3286.33 samples/sec   Loss 4.9065   LearningRate 0.0307   Epoch: 8   Global Step: 110700   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:17:42,742-Speed 3314.58 samples/sec   Loss 4.8973   LearningRate 0.0307   Epoch: 8   Global Step: 110710   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:17:45,818-Speed 3329.94 samples/sec   Loss 4.8529   LearningRate 0.0307   Epoch: 8   Global Step: 110720   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:17:48,960-Speed 3259.53 samples/sec   Loss 4.7904   LearningRate 0.0307   Epoch: 8   Global Step: 110730   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:17:52,124-Speed 3237.77 samples/sec   Loss 4.9121   LearningRate 0.0307   Epoch: 8   Global Step: 110740   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:17:55,275-Speed 3250.23 samples/sec   Loss 4.8186   LearningRate 0.0307   Epoch: 8   Global Step: 110750   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:17:58,332-Speed 3350.74 samples/sec   Loss 4.9068   LearningRate 0.0307   Epoch: 8   Global Step: 110760   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:18:01,505-Speed 3228.48 samples/sec   Loss 4.8012   LearningRate 0.0307   Epoch: 8   Global Step: 110770   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:18:04,600-Speed 3309.40 samples/sec   Loss 4.9246   LearningRate 0.0307   Epoch: 8   Global Step: 110780   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:18:07,773-Speed 3228.23 samples/sec   Loss 4.9864   LearningRate 0.0307   Epoch: 8   Global Step: 110790   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:18:10,894-Speed 3282.89 samples/sec   Loss 4.8039   LearningRate 0.0307   Epoch: 8   Global Step: 110800   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:18:14,027-Speed 3268.61 samples/sec   Loss 4.7266   LearningRate 0.0307   Epoch: 8   Global Step: 110810   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:18:17,181-Speed 3248.02 samples/sec   Loss 4.8282   LearningRate 0.0307   Epoch: 8   Global Step: 110820   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:18:20,232-Speed 3357.24 samples/sec   Loss 4.8675   LearningRate 0.0307   Epoch: 8   Global Step: 110830   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:18:23,321-Speed 3316.78 samples/sec   Loss 4.8520   LearningRate 0.0307   Epoch: 8   Global Step: 110840   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:18:26,405-Speed 3321.37 samples/sec   Loss 4.8072   LearningRate 0.0307   Epoch: 8   Global Step: 110850   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:18:29,530-Speed 3277.59 samples/sec   Loss 4.8539   LearningRate 0.0307   Epoch: 8   Global Step: 110860   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:18:32,615-Speed 3320.36 samples/sec   Loss 4.9055   LearningRate 0.0307   Epoch: 8   Global Step: 110870   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:18:35,770-Speed 3246.09 samples/sec   Loss 4.8244   LearningRate 0.0307   Epoch: 8   Global Step: 110880   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:18:38,878-Speed 3295.35 samples/sec   Loss 4.9179   LearningRate 0.0306   Epoch: 8   Global Step: 110890   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:18:41,924-Speed 3363.96 samples/sec   Loss 4.8404   LearningRate 0.0306   Epoch: 8   Global Step: 110900   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:18:45,012-Speed 3317.44 samples/sec   Loss 4.8836   LearningRate 0.0306   Epoch: 8   Global Step: 110910   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:18:48,168-Speed 3245.12 samples/sec   Loss 4.8822   LearningRate 0.0306   Epoch: 8   Global Step: 110920   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:18:51,240-Speed 3335.29 samples/sec   Loss 4.8425   LearningRate 0.0306   Epoch: 8   Global Step: 110930   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:18:54,350-Speed 3293.30 samples/sec   Loss 4.9117   LearningRate 0.0306   Epoch: 8   Global Step: 110940   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:18:57,409-Speed 3349.33 samples/sec   Loss 4.9605   LearningRate 0.0306   Epoch: 8   Global Step: 110950   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:19:00,545-Speed 3265.57 samples/sec   Loss 4.9548   LearningRate 0.0306   Epoch: 8   Global Step: 110960   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:19:03,646-Speed 3303.14 samples/sec   Loss 4.9286   LearningRate 0.0306   Epoch: 8   Global Step: 110970   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:19:06,716-Speed 3337.26 samples/sec   Loss 4.9421   LearningRate 0.0306   Epoch: 8   Global Step: 110980   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:19:09,790-Speed 3331.57 samples/sec   Loss 4.8316   LearningRate 0.0306   Epoch: 8   Global Step: 110990   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:19:12,934-Speed 3259.03 samples/sec   Loss 4.8707   LearningRate 0.0306   Epoch: 8   Global Step: 111000   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:19:16,115-Speed 3220.03 samples/sec   Loss 4.8678   LearningRate 0.0306   Epoch: 8   Global Step: 111010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:19:19,235-Speed 3283.16 samples/sec   Loss 4.8176   LearningRate 0.0306   Epoch: 8   Global Step: 111020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:19:22,297-Speed 3345.12 samples/sec   Loss 4.8533   LearningRate 0.0306   Epoch: 8   Global Step: 111030   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:19:25,379-Speed 3324.31 samples/sec   Loss 4.8167   LearningRate 0.0306   Epoch: 8   Global Step: 111040   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:19:28,496-Speed 3285.91 samples/sec   Loss 4.7346   LearningRate 0.0306   Epoch: 8   Global Step: 111050   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:19:31,591-Speed 3309.88 samples/sec   Loss 4.9524   LearningRate 0.0306   Epoch: 8   Global Step: 111060   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:19:34,693-Speed 3302.68 samples/sec   Loss 4.9317   LearningRate 0.0306   Epoch: 8   Global Step: 111070   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:19:37,832-Speed 3263.20 samples/sec   Loss 4.8770   LearningRate 0.0306   Epoch: 8   Global Step: 111080   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:19:40,890-Speed 3349.82 samples/sec   Loss 4.8515   LearningRate 0.0306   Epoch: 8   Global Step: 111090   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:19:44,000-Speed 3293.39 samples/sec   Loss 4.8076   LearningRate 0.0306   Epoch: 8   Global Step: 111100   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:19:47,114-Speed 3290.05 samples/sec   Loss 4.9090   LearningRate 0.0306   Epoch: 8   Global Step: 111110   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:19:50,215-Speed 3302.62 samples/sec   Loss 4.7631   LearningRate 0.0305   Epoch: 8   Global Step: 111120   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:19:53,278-Speed 3343.89 samples/sec   Loss 4.8488   LearningRate 0.0305   Epoch: 8   Global Step: 111130   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:19:56,330-Speed 3356.27 samples/sec   Loss 4.8829   LearningRate 0.0305   Epoch: 8   Global Step: 111140   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:19:59,402-Speed 3334.82 samples/sec   Loss 4.9359   LearningRate 0.0305   Epoch: 8   Global Step: 111150   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:20:02,511-Speed 3295.32 samples/sec   Loss 4.9313   LearningRate 0.0305   Epoch: 8   Global Step: 111160   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:20:05,630-Speed 3283.51 samples/sec   Loss 4.8267   LearningRate 0.0305   Epoch: 8   Global Step: 111170   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:20:08,719-Speed 3316.08 samples/sec   Loss 4.8911   LearningRate 0.0305   Epoch: 8   Global Step: 111180   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:20:11,844-Speed 3278.73 samples/sec   Loss 4.9124   LearningRate 0.0305   Epoch: 8   Global Step: 111190   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:20:14,942-Speed 3306.24 samples/sec   Loss 4.7897   LearningRate 0.0305   Epoch: 8   Global Step: 111200   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:20:18,042-Speed 3303.98 samples/sec   Loss 4.8933   LearningRate 0.0305   Epoch: 8   Global Step: 111210   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:20:21,130-Speed 3316.95 samples/sec   Loss 4.8675   LearningRate 0.0305   Epoch: 8   Global Step: 111220   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:20:24,207-Speed 3328.90 samples/sec   Loss 4.9389   LearningRate 0.0305   Epoch: 8   Global Step: 111230   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:20:27,312-Speed 3299.57 samples/sec   Loss 4.8758   LearningRate 0.0305   Epoch: 8   Global Step: 111240   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:20:30,413-Speed 3303.26 samples/sec   Loss 4.8372   LearningRate 0.0305   Epoch: 8   Global Step: 111250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:20:33,482-Speed 3337.85 samples/sec   Loss 4.8396   LearningRate 0.0305   Epoch: 8   Global Step: 111260   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:20:36,586-Speed 3300.37 samples/sec   Loss 4.8864   LearningRate 0.0305   Epoch: 8   Global Step: 111270   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:20:39,693-Speed 3296.55 samples/sec   Loss 4.7759   LearningRate 0.0305   Epoch: 8   Global Step: 111280   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:20:42,806-Speed 3290.61 samples/sec   Loss 4.9399   LearningRate 0.0305   Epoch: 8   Global Step: 111290   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:20:45,881-Speed 3331.08 samples/sec   Loss 4.8677   LearningRate 0.0305   Epoch: 8   Global Step: 111300   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:20:48,980-Speed 3305.55 samples/sec   Loss 4.8813   LearningRate 0.0305   Epoch: 8   Global Step: 111310   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:20:52,047-Speed 3339.71 samples/sec   Loss 4.7510   LearningRate 0.0305   Epoch: 8   Global Step: 111320   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:20:55,202-Speed 3246.32 samples/sec   Loss 4.8894   LearningRate 0.0305   Epoch: 8   Global Step: 111330   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:20:58,270-Speed 3338.65 samples/sec   Loss 4.8954   LearningRate 0.0304   Epoch: 8   Global Step: 111340   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:21:01,349-Speed 3328.83 samples/sec   Loss 4.9992   LearningRate 0.0304   Epoch: 8   Global Step: 111350   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:21:04,454-Speed 3299.40 samples/sec   Loss 4.8144   LearningRate 0.0304   Epoch: 8   Global Step: 111360   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:21:07,557-Speed 3301.40 samples/sec   Loss 4.7974   LearningRate 0.0304   Epoch: 8   Global Step: 111370   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:21:10,609-Speed 3355.66 samples/sec   Loss 4.8407   LearningRate 0.0304   Epoch: 8   Global Step: 111380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:21:13,737-Speed 3274.79 samples/sec   Loss 4.9411   LearningRate 0.0304   Epoch: 8   Global Step: 111390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:21:16,831-Speed 3311.19 samples/sec   Loss 4.8349   LearningRate 0.0304   Epoch: 8   Global Step: 111400   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:21:19,913-Speed 3322.88 samples/sec   Loss 4.8716   LearningRate 0.0304   Epoch: 8   Global Step: 111410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:21:22,978-Speed 3342.02 samples/sec   Loss 4.8353   LearningRate 0.0304   Epoch: 8   Global Step: 111420   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:21:26,091-Speed 3291.18 samples/sec   Loss 4.8993   LearningRate 0.0304   Epoch: 8   Global Step: 111430   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:21:29,176-Speed 3319.84 samples/sec   Loss 4.7774   LearningRate 0.0304   Epoch: 8   Global Step: 111440   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:21:32,302-Speed 3277.25 samples/sec   Loss 4.8774   LearningRate 0.0304   Epoch: 8   Global Step: 111450   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:21:35,441-Speed 3263.36 samples/sec   Loss 4.9025   LearningRate 0.0304   Epoch: 8   Global Step: 111460   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:21:38,542-Speed 3303.41 samples/sec   Loss 4.9497   LearningRate 0.0304   Epoch: 8   Global Step: 111470   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:21:41,678-Speed 3266.04 samples/sec   Loss 4.9077   LearningRate 0.0304   Epoch: 8   Global Step: 111480   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:21:44,748-Speed 3336.20 samples/sec   Loss 4.8924   LearningRate 0.0304   Epoch: 8   Global Step: 111490   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:21:47,810-Speed 3345.56 samples/sec   Loss 4.8325   LearningRate 0.0304   Epoch: 8   Global Step: 111500   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:21:50,893-Speed 3322.24 samples/sec   Loss 4.9085   LearningRate 0.0304   Epoch: 8   Global Step: 111510   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:21:54,011-Speed 3285.06 samples/sec   Loss 4.8429   LearningRate 0.0304   Epoch: 8   Global Step: 111520   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:21:57,140-Speed 3273.40 samples/sec   Loss 4.8908   LearningRate 0.0304   Epoch: 8   Global Step: 111530   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:22:00,226-Speed 3320.03 samples/sec   Loss 4.7954   LearningRate 0.0304   Epoch: 8   Global Step: 111540   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:22:03,338-Speed 3290.58 samples/sec   Loss 4.8789   LearningRate 0.0304   Epoch: 8   Global Step: 111550   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:22:06,421-Speed 3323.13 samples/sec   Loss 4.8727   LearningRate 0.0304   Epoch: 8   Global Step: 111560   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:22:09,535-Speed 3289.28 samples/sec   Loss 4.8084   LearningRate 0.0303   Epoch: 8   Global Step: 111570   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:22:12,658-Speed 3279.38 samples/sec   Loss 4.8583   LearningRate 0.0303   Epoch: 8   Global Step: 111580   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:22:15,757-Speed 3305.66 samples/sec   Loss 4.9671   LearningRate 0.0303   Epoch: 8   Global Step: 111590   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:22:18,857-Speed 3304.30 samples/sec   Loss 4.7442   LearningRate 0.0303   Epoch: 8   Global Step: 111600   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:22:21,961-Speed 3299.54 samples/sec   Loss 4.8057   LearningRate 0.0303   Epoch: 8   Global Step: 111610   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:22:25,060-Speed 3305.52 samples/sec   Loss 4.7932   LearningRate 0.0303   Epoch: 8   Global Step: 111620   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:22:28,178-Speed 3284.77 samples/sec   Loss 4.7099   LearningRate 0.0303   Epoch: 8   Global Step: 111630   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:22:31,299-Speed 3282.77 samples/sec   Loss 4.9246   LearningRate 0.0303   Epoch: 8   Global Step: 111640   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:22:34,417-Speed 3284.96 samples/sec   Loss 4.9088   LearningRate 0.0303   Epoch: 8   Global Step: 111650   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:22:37,526-Speed 3294.99 samples/sec   Loss 4.8242   LearningRate 0.0303   Epoch: 8   Global Step: 111660   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:22:40,614-Speed 3317.17 samples/sec   Loss 4.9965   LearningRate 0.0303   Epoch: 8   Global Step: 111670   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:22:43,706-Speed 3313.11 samples/sec   Loss 4.7630   LearningRate 0.0303   Epoch: 8   Global Step: 111680   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:22:46,818-Speed 3291.20 samples/sec   Loss 4.8283   LearningRate 0.0303   Epoch: 8   Global Step: 111690   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:22:49,907-Speed 3315.51 samples/sec   Loss 4.8525   LearningRate 0.0303   Epoch: 8   Global Step: 111700   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:22:53,019-Speed 3291.32 samples/sec   Loss 4.8473   LearningRate 0.0303   Epoch: 8   Global Step: 111710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:22:56,110-Speed 3314.15 samples/sec   Loss 4.8267   LearningRate 0.0303   Epoch: 8   Global Step: 111720   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:22:59,237-Speed 3276.42 samples/sec   Loss 4.8429   LearningRate 0.0303   Epoch: 8   Global Step: 111730   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:23:02,411-Speed 3226.13 samples/sec   Loss 4.8057   LearningRate 0.0303   Epoch: 8   Global Step: 111740   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:23:05,554-Speed 3259.97 samples/sec   Loss 4.7319   LearningRate 0.0303   Epoch: 8   Global Step: 111750   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:23:08,656-Speed 3301.49 samples/sec   Loss 4.8529   LearningRate 0.0303   Epoch: 8   Global Step: 111760   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:23:11,740-Speed 3321.44 samples/sec   Loss 4.7930   LearningRate 0.0303   Epoch: 8   Global Step: 111770   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:23:15,044-Speed 3100.51 samples/sec   Loss 4.9289   LearningRate 0.0303   Epoch: 8   Global Step: 111780   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:23:46,372-Speed 326.87 samples/sec   Loss 4.7892   LearningRate 0.0302   Epoch: 9   Global Step: 111790   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:23:49,871-Speed 2927.71 samples/sec   Loss 3.5898   LearningRate 0.0302   Epoch: 9   Global Step: 111800   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:23:53,055-Speed 3218.04 samples/sec   Loss 3.6624   LearningRate 0.0302   Epoch: 9   Global Step: 111810   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:23:56,142-Speed 3318.83 samples/sec   Loss 3.5538   LearningRate 0.0302   Epoch: 9   Global Step: 111820   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:23:59,229-Speed 3317.87 samples/sec   Loss 3.6540   LearningRate 0.0302   Epoch: 9   Global Step: 111830   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:24:02,381-Speed 3250.21 samples/sec   Loss 3.6090   LearningRate 0.0302   Epoch: 9   Global Step: 111840   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:24:05,715-Speed 3071.89 samples/sec   Loss 3.7103   LearningRate 0.0302   Epoch: 9   Global Step: 111850   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:24:08,814-Speed 3305.39 samples/sec   Loss 3.6201   LearningRate 0.0302   Epoch: 9   Global Step: 111860   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:24:12,053-Speed 3162.97 samples/sec   Loss 3.4516   LearningRate 0.0302   Epoch: 9   Global Step: 111870   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:24:15,227-Speed 3226.78 samples/sec   Loss 3.6424   LearningRate 0.0302   Epoch: 9   Global Step: 111880   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:24:18,380-Speed 3248.91 samples/sec   Loss 3.6586   LearningRate 0.0302   Epoch: 9   Global Step: 111890   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:24:21,449-Speed 3337.26 samples/sec   Loss 3.6319   LearningRate 0.0302   Epoch: 9   Global Step: 111900   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:24:24,610-Speed 3241.14 samples/sec   Loss 3.6270   LearningRate 0.0302   Epoch: 9   Global Step: 111910   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:24:27,692-Speed 3323.18 samples/sec   Loss 3.5789   LearningRate 0.0302   Epoch: 9   Global Step: 111920   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:24:30,781-Speed 3316.81 samples/sec   Loss 3.6072   LearningRate 0.0302   Epoch: 9   Global Step: 111930   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:24:33,888-Speed 3296.27 samples/sec   Loss 3.4913   LearningRate 0.0302   Epoch: 9   Global Step: 111940   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:24:37,001-Speed 3289.95 samples/sec   Loss 3.6041   LearningRate 0.0302   Epoch: 9   Global Step: 111950   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:24:40,082-Speed 3324.95 samples/sec   Loss 3.5902   LearningRate 0.0302   Epoch: 9   Global Step: 111960   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:24:43,224-Speed 3260.18 samples/sec   Loss 3.5982   LearningRate 0.0302   Epoch: 9   Global Step: 111970   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:24:46,296-Speed 3334.78 samples/sec   Loss 3.6634   LearningRate 0.0302   Epoch: 9   Global Step: 111980   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:24:49,505-Speed 3191.87 samples/sec   Loss 3.5467   LearningRate 0.0302   Epoch: 9   Global Step: 111990   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:24:52,570-Speed 3341.37 samples/sec   Loss 3.5299   LearningRate 0.0302   Epoch: 9   Global Step: 112000   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:24:55,699-Speed 3273.67 samples/sec   Loss 3.6116   LearningRate 0.0302   Epoch: 9   Global Step: 112010   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:24:58,809-Speed 3294.24 samples/sec   Loss 3.6673   LearningRate 0.0301   Epoch: 9   Global Step: 112020   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:25:01,942-Speed 3268.90 samples/sec   Loss 3.5575   LearningRate 0.0301   Epoch: 9   Global Step: 112030   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:25:05,065-Speed 3279.56 samples/sec   Loss 3.7011   LearningRate 0.0301   Epoch: 9   Global Step: 112040   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:25:08,146-Speed 3325.06 samples/sec   Loss 3.5918   LearningRate 0.0301   Epoch: 9   Global Step: 112050   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:25:11,248-Speed 3302.28 samples/sec   Loss 3.6412   LearningRate 0.0301   Epoch: 9   Global Step: 112060   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:25:14,398-Speed 3252.29 samples/sec   Loss 3.6569   LearningRate 0.0301   Epoch: 9   Global Step: 112070   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:25:17,469-Speed 3335.18 samples/sec   Loss 3.6177   LearningRate 0.0301   Epoch: 9   Global Step: 112080   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:25:20,530-Speed 3345.73 samples/sec   Loss 3.7815   LearningRate 0.0301   Epoch: 9   Global Step: 112090   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:25:23,604-Speed 3332.46 samples/sec   Loss 3.7385   LearningRate 0.0301   Epoch: 9   Global Step: 112100   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:25:26,681-Speed 3329.49 samples/sec   Loss 3.6278   LearningRate 0.0301   Epoch: 9   Global Step: 112110   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:25:29,801-Speed 3282.80 samples/sec   Loss 3.6212   LearningRate 0.0301   Epoch: 9   Global Step: 112120   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:25:32,913-Speed 3293.90 samples/sec   Loss 3.6330   LearningRate 0.0301   Epoch: 9   Global Step: 112130   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:25:36,035-Speed 3281.30 samples/sec   Loss 3.7392   LearningRate 0.0301   Epoch: 9   Global Step: 112140   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:25:39,138-Speed 3301.07 samples/sec   Loss 3.5948   LearningRate 0.0301   Epoch: 9   Global Step: 112150   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:25:42,276-Speed 3263.92 samples/sec   Loss 3.7531   LearningRate 0.0301   Epoch: 9   Global Step: 112160   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:25:45,338-Speed 3345.26 samples/sec   Loss 3.6484   LearningRate 0.0301   Epoch: 9   Global Step: 112170   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:25:48,367-Speed 3382.25 samples/sec   Loss 3.7776   LearningRate 0.0301   Epoch: 9   Global Step: 112180   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:25:51,487-Speed 3282.99 samples/sec   Loss 3.6300   LearningRate 0.0301   Epoch: 9   Global Step: 112190   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:25:54,636-Speed 3253.12 samples/sec   Loss 3.6549   LearningRate 0.0301   Epoch: 9   Global Step: 112200   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:25:57,690-Speed 3354.51 samples/sec   Loss 3.7363   LearningRate 0.0301   Epoch: 9   Global Step: 112210   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:26:00,797-Speed 3296.28 samples/sec   Loss 3.8035   LearningRate 0.0301   Epoch: 9   Global Step: 112220   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:26:03,868-Speed 3335.76 samples/sec   Loss 3.7230   LearningRate 0.0301   Epoch: 9   Global Step: 112230   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:26:06,954-Speed 3319.04 samples/sec   Loss 3.7054   LearningRate 0.0301   Epoch: 9   Global Step: 112240   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:26:10,014-Speed 3347.31 samples/sec   Loss 3.6187   LearningRate 0.0300   Epoch: 9   Global Step: 112250   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:26:13,141-Speed 3275.71 samples/sec   Loss 3.6766   LearningRate 0.0300   Epoch: 9   Global Step: 112260   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:26:16,197-Speed 3352.37 samples/sec   Loss 3.8018   LearningRate 0.0300   Epoch: 9   Global Step: 112270   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:26:19,246-Speed 3360.24 samples/sec   Loss 3.7643   LearningRate 0.0300   Epoch: 9   Global Step: 112280   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:26:22,301-Speed 3352.21 samples/sec   Loss 3.6792   LearningRate 0.0300   Epoch: 9   Global Step: 112290   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:26:25,405-Speed 3300.59 samples/sec   Loss 3.7004   LearningRate 0.0300   Epoch: 9   Global Step: 112300   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:26:28,576-Speed 3230.35 samples/sec   Loss 3.7032   LearningRate 0.0300   Epoch: 9   Global Step: 112310   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:26:31,707-Speed 3271.76 samples/sec   Loss 3.6839   LearningRate 0.0300   Epoch: 9   Global Step: 112320   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:26:34,784-Speed 3328.14 samples/sec   Loss 3.6765   LearningRate 0.0300   Epoch: 9   Global Step: 112330   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:26:37,861-Speed 3329.77 samples/sec   Loss 3.7271   LearningRate 0.0300   Epoch: 9   Global Step: 112340   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:26:40,971-Speed 3293.26 samples/sec   Loss 3.6695   LearningRate 0.0300   Epoch: 9   Global Step: 112350   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:26:44,046-Speed 3330.98 samples/sec   Loss 3.7807   LearningRate 0.0300   Epoch: 9   Global Step: 112360   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:26:47,149-Speed 3301.21 samples/sec   Loss 3.7834   LearningRate 0.0300   Epoch: 9   Global Step: 112370   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:26:50,255-Speed 3298.11 samples/sec   Loss 3.6717   LearningRate 0.0300   Epoch: 9   Global Step: 112380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:26:53,333-Speed 3327.53 samples/sec   Loss 3.7490   LearningRate 0.0300   Epoch: 9   Global Step: 112390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:26:56,422-Speed 3316.51 samples/sec   Loss 3.7006   LearningRate 0.0300   Epoch: 9   Global Step: 112400   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:26:59,483-Speed 3345.85 samples/sec   Loss 3.7837   LearningRate 0.0300   Epoch: 9   Global Step: 112410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:27:02,639-Speed 3245.70 samples/sec   Loss 3.7294   LearningRate 0.0300   Epoch: 9   Global Step: 112420   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:27:05,802-Speed 3238.42 samples/sec   Loss 3.7013   LearningRate 0.0300   Epoch: 9   Global Step: 112430   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:27:08,911-Speed 3295.43 samples/sec   Loss 3.7547   LearningRate 0.0300   Epoch: 9   Global Step: 112440   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:27:12,009-Speed 3305.90 samples/sec   Loss 3.8122   LearningRate 0.0300   Epoch: 9   Global Step: 112450   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:27:15,181-Speed 3229.25 samples/sec   Loss 3.7609   LearningRate 0.0300   Epoch: 9   Global Step: 112460   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:27:18,364-Speed 3218.19 samples/sec   Loss 3.8248   LearningRate 0.0299   Epoch: 9   Global Step: 112470   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:27:21,438-Speed 3332.64 samples/sec   Loss 3.7349   LearningRate 0.0299   Epoch: 9   Global Step: 112480   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:27:24,507-Speed 3338.03 samples/sec   Loss 3.8082   LearningRate 0.0299   Epoch: 9   Global Step: 112490   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:27:27,615-Speed 3295.64 samples/sec   Loss 3.7977   LearningRate 0.0299   Epoch: 9   Global Step: 112500   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:27:30,665-Speed 3357.98 samples/sec   Loss 3.7189   LearningRate 0.0299   Epoch: 9   Global Step: 112510   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:27:33,786-Speed 3282.64 samples/sec   Loss 3.7214   LearningRate 0.0299   Epoch: 9   Global Step: 112520   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:27:36,943-Speed 3244.44 samples/sec   Loss 3.7197   LearningRate 0.0299   Epoch: 9   Global Step: 112530   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:27:40,090-Speed 3254.27 samples/sec   Loss 3.8397   LearningRate 0.0299   Epoch: 9   Global Step: 112540   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:27:43,204-Speed 3290.40 samples/sec   Loss 3.8423   LearningRate 0.0299   Epoch: 9   Global Step: 112550   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:27:46,325-Speed 3281.83 samples/sec   Loss 3.8114   LearningRate 0.0299   Epoch: 9   Global Step: 112560   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:27:49,416-Speed 3313.58 samples/sec   Loss 3.7632   LearningRate 0.0299   Epoch: 9   Global Step: 112570   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:27:52,588-Speed 3229.48 samples/sec   Loss 3.8106   LearningRate 0.0299   Epoch: 9   Global Step: 112580   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:27:55,671-Speed 3322.27 samples/sec   Loss 3.7790   LearningRate 0.0299   Epoch: 9   Global Step: 112590   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:27:58,732-Speed 3346.64 samples/sec   Loss 3.7684   LearningRate 0.0299   Epoch: 9   Global Step: 112600   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:28:01,792-Speed 3347.23 samples/sec   Loss 3.7950   LearningRate 0.0299   Epoch: 9   Global Step: 112610   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:28:04,846-Speed 3354.29 samples/sec   Loss 3.7842   LearningRate 0.0299   Epoch: 9   Global Step: 112620   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:28:07,909-Speed 3344.85 samples/sec   Loss 3.7781   LearningRate 0.0299   Epoch: 9   Global Step: 112630   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:28:10,982-Speed 3332.85 samples/sec   Loss 3.7829   LearningRate 0.0299   Epoch: 9   Global Step: 112640   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:28:14,146-Speed 3236.99 samples/sec   Loss 3.6777   LearningRate 0.0299   Epoch: 9   Global Step: 112650   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:28:17,283-Speed 3265.14 samples/sec   Loss 3.7862   LearningRate 0.0299   Epoch: 9   Global Step: 112660   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:28:20,385-Speed 3302.72 samples/sec   Loss 3.8396   LearningRate 0.0299   Epoch: 9   Global Step: 112670   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:28:23,500-Speed 3288.24 samples/sec   Loss 3.8416   LearningRate 0.0299   Epoch: 9   Global Step: 112680   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:28:26,637-Speed 3265.11 samples/sec   Loss 3.7523   LearningRate 0.0299   Epoch: 9   Global Step: 112690   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:28:29,811-Speed 3227.77 samples/sec   Loss 3.7861   LearningRate 0.0298   Epoch: 9   Global Step: 112700   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:28:32,868-Speed 3350.10 samples/sec   Loss 3.8585   LearningRate 0.0298   Epoch: 9   Global Step: 112710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:28:35,929-Speed 3347.01 samples/sec   Loss 3.7941   LearningRate 0.0298   Epoch: 9   Global Step: 112720   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:28:39,065-Speed 3266.01 samples/sec   Loss 3.8224   LearningRate 0.0298   Epoch: 9   Global Step: 112730   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:28:42,199-Speed 3268.52 samples/sec   Loss 3.8599   LearningRate 0.0298   Epoch: 9   Global Step: 112740   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:28:45,265-Speed 3340.26 samples/sec   Loss 3.8355   LearningRate 0.0298   Epoch: 9   Global Step: 112750   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:28:48,377-Speed 3291.76 samples/sec   Loss 3.8869   LearningRate 0.0298   Epoch: 9   Global Step: 112760   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:28:51,461-Speed 3322.01 samples/sec   Loss 3.8643   LearningRate 0.0298   Epoch: 9   Global Step: 112770   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:28:54,529-Speed 3338.90 samples/sec   Loss 3.7841   LearningRate 0.0298   Epoch: 9   Global Step: 112780   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:28:57,620-Speed 3314.06 samples/sec   Loss 3.8220   LearningRate 0.0298   Epoch: 9   Global Step: 112790   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:29:00,767-Speed 3254.93 samples/sec   Loss 3.8389   LearningRate 0.0298   Epoch: 9   Global Step: 112800   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:29:03,900-Speed 3269.38 samples/sec   Loss 3.8847   LearningRate 0.0298   Epoch: 9   Global Step: 112810   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:29:07,038-Speed 3264.49 samples/sec   Loss 3.8086   LearningRate 0.0298   Epoch: 9   Global Step: 112820   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:29:10,108-Speed 3336.04 samples/sec   Loss 3.7660   LearningRate 0.0298   Epoch: 9   Global Step: 112830   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:29:13,205-Speed 3306.92 samples/sec   Loss 3.7873   LearningRate 0.0298   Epoch: 9   Global Step: 112840   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:29:16,266-Speed 3346.85 samples/sec   Loss 3.8721   LearningRate 0.0298   Epoch: 9   Global Step: 112850   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:29:19,364-Speed 3306.71 samples/sec   Loss 3.8349   LearningRate 0.0298   Epoch: 9   Global Step: 112860   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:29:22,415-Speed 3357.32 samples/sec   Loss 3.8197   LearningRate 0.0298   Epoch: 9   Global Step: 112870   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:29:25,521-Speed 3297.61 samples/sec   Loss 3.8601   LearningRate 0.0298   Epoch: 9   Global Step: 112880   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:29:28,700-Speed 3221.79 samples/sec   Loss 3.8202   LearningRate 0.0298   Epoch: 9   Global Step: 112890   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:29:31,804-Speed 3300.59 samples/sec   Loss 3.7228   LearningRate 0.0298   Epoch: 9   Global Step: 112900   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:29:34,890-Speed 3319.10 samples/sec   Loss 3.8872   LearningRate 0.0298   Epoch: 9   Global Step: 112910   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:29:37,982-Speed 3313.20 samples/sec   Loss 3.9478   LearningRate 0.0298   Epoch: 9   Global Step: 112920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:29:41,066-Speed 3320.40 samples/sec   Loss 3.8567   LearningRate 0.0297   Epoch: 9   Global Step: 112930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:29:44,155-Speed 3316.28 samples/sec   Loss 3.8311   LearningRate 0.0297   Epoch: 9   Global Step: 112940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:29:47,242-Speed 3318.74 samples/sec   Loss 3.8875   LearningRate 0.0297   Epoch: 9   Global Step: 112950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:29:50,303-Speed 3346.30 samples/sec   Loss 3.8188   LearningRate 0.0297   Epoch: 9   Global Step: 112960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:29:53,525-Speed 3178.58 samples/sec   Loss 3.9297   LearningRate 0.0297   Epoch: 9   Global Step: 112970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:29:56,618-Speed 3312.13 samples/sec   Loss 3.9085   LearningRate 0.0297   Epoch: 9   Global Step: 112980   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-04-27 11:29:59,663-Speed 3364.58 samples/sec   Loss 3.8366   LearningRate 0.0297   Epoch: 9   Global Step: 112990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:30:02,808-Speed 3256.67 samples/sec   Loss 3.8092   LearningRate 0.0297   Epoch: 9   Global Step: 113000   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:30:05,870-Speed 3345.79 samples/sec   Loss 3.8690   LearningRate 0.0297   Epoch: 9   Global Step: 113010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:30:08,941-Speed 3334.62 samples/sec   Loss 3.9181   LearningRate 0.0297   Epoch: 9   Global Step: 113020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:30:12,006-Speed 3341.70 samples/sec   Loss 3.8884   LearningRate 0.0297   Epoch: 9   Global Step: 113030   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:30:15,089-Speed 3322.57 samples/sec   Loss 3.8316   LearningRate 0.0297   Epoch: 9   Global Step: 113040   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:30:18,200-Speed 3293.46 samples/sec   Loss 3.9288   LearningRate 0.0297   Epoch: 9   Global Step: 113050   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:30:21,260-Speed 3347.08 samples/sec   Loss 3.8806   LearningRate 0.0297   Epoch: 9   Global Step: 113060   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:30:24,330-Speed 3336.03 samples/sec   Loss 3.8828   LearningRate 0.0297   Epoch: 9   Global Step: 113070   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:30:27,386-Speed 3351.63 samples/sec   Loss 3.8500   LearningRate 0.0297   Epoch: 9   Global Step: 113080   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:30:30,524-Speed 3264.24 samples/sec   Loss 3.8493   LearningRate 0.0297   Epoch: 9   Global Step: 113090   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:30:33,606-Speed 3323.80 samples/sec   Loss 3.9139   LearningRate 0.0297   Epoch: 9   Global Step: 113100   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:30:36,675-Speed 3338.31 samples/sec   Loss 3.9044   LearningRate 0.0297   Epoch: 9   Global Step: 113110   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:30:39,775-Speed 3303.29 samples/sec   Loss 3.8866   LearningRate 0.0297   Epoch: 9   Global Step: 113120   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:30:42,841-Speed 3341.12 samples/sec   Loss 3.8405   LearningRate 0.0297   Epoch: 9   Global Step: 113130   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:30:45,876-Speed 3374.92 samples/sec   Loss 3.8983   LearningRate 0.0297   Epoch: 9   Global Step: 113140   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:30:48,983-Speed 3296.51 samples/sec   Loss 3.8650   LearningRate 0.0297   Epoch: 9   Global Step: 113150   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:30:52,187-Speed 3197.04 samples/sec   Loss 3.8462   LearningRate 0.0296   Epoch: 9   Global Step: 113160   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:30:55,295-Speed 3296.19 samples/sec   Loss 3.8440   LearningRate 0.0296   Epoch: 9   Global Step: 113170   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:30:58,401-Speed 3297.61 samples/sec   Loss 3.8564   LearningRate 0.0296   Epoch: 9   Global Step: 113180   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:31:01,492-Speed 3313.81 samples/sec   Loss 3.7704   LearningRate 0.0296   Epoch: 9   Global Step: 113190   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:31:04,571-Speed 3327.34 samples/sec   Loss 3.9235   LearningRate 0.0296   Epoch: 9   Global Step: 113200   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:31:07,697-Speed 3276.78 samples/sec   Loss 3.9881   LearningRate 0.0296   Epoch: 9   Global Step: 113210   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:31:10,764-Speed 3340.04 samples/sec   Loss 3.9455   LearningRate 0.0296   Epoch: 9   Global Step: 113220   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:31:13,877-Speed 3290.03 samples/sec   Loss 3.7813   LearningRate 0.0296   Epoch: 9   Global Step: 113230   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:31:16,991-Speed 3288.93 samples/sec   Loss 3.8930   LearningRate 0.0296   Epoch: 9   Global Step: 113240   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:31:20,108-Speed 3286.36 samples/sec   Loss 3.8944   LearningRate 0.0296   Epoch: 9   Global Step: 113250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:31:23,222-Speed 3289.43 samples/sec   Loss 3.9409   LearningRate 0.0296   Epoch: 9   Global Step: 113260   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:31:26,278-Speed 3352.34 samples/sec   Loss 3.9928   LearningRate 0.0296   Epoch: 9   Global Step: 113270   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:31:29,356-Speed 3328.35 samples/sec   Loss 3.8617   LearningRate 0.0296   Epoch: 9   Global Step: 113280   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:31:32,500-Speed 3257.32 samples/sec   Loss 3.9703   LearningRate 0.0296   Epoch: 9   Global Step: 113290   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:31:35,601-Speed 3304.06 samples/sec   Loss 3.9345   LearningRate 0.0296   Epoch: 9   Global Step: 113300   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:31:38,742-Speed 3260.27 samples/sec   Loss 3.9766   LearningRate 0.0296   Epoch: 9   Global Step: 113310   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:31:41,855-Speed 3291.07 samples/sec   Loss 3.8621   LearningRate 0.0296   Epoch: 9   Global Step: 113320   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:31:44,910-Speed 3352.23 samples/sec   Loss 3.8696   LearningRate 0.0296   Epoch: 9   Global Step: 113330   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:31:48,000-Speed 3316.31 samples/sec   Loss 3.8800   LearningRate 0.0296   Epoch: 9   Global Step: 113340   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:31:51,082-Speed 3323.44 samples/sec   Loss 3.8938   LearningRate 0.0296   Epoch: 9   Global Step: 113350   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:31:54,223-Speed 3260.95 samples/sec   Loss 3.9559   LearningRate 0.0296   Epoch: 9   Global Step: 113360   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:31:57,317-Speed 3309.79 samples/sec   Loss 3.8660   LearningRate 0.0296   Epoch: 9   Global Step: 113370   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:32:00,425-Speed 3296.48 samples/sec   Loss 3.9238   LearningRate 0.0295   Epoch: 9   Global Step: 113380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:32:03,596-Speed 3229.61 samples/sec   Loss 3.7605   LearningRate 0.0295   Epoch: 9   Global Step: 113390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:32:06,661-Speed 3342.42 samples/sec   Loss 3.8405   LearningRate 0.0295   Epoch: 9   Global Step: 113400   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:32:09,762-Speed 3303.60 samples/sec   Loss 3.8419   LearningRate 0.0295   Epoch: 9   Global Step: 113410   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:32:12,850-Speed 3317.33 samples/sec   Loss 3.8581   LearningRate 0.0295   Epoch: 9   Global Step: 113420   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:32:15,937-Speed 3318.40 samples/sec   Loss 3.9257   LearningRate 0.0295   Epoch: 9   Global Step: 113430   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:32:19,053-Speed 3287.10 samples/sec   Loss 3.9270   LearningRate 0.0295   Epoch: 9   Global Step: 113440   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:32:22,131-Speed 3328.46 samples/sec   Loss 3.8921   LearningRate 0.0295   Epoch: 9   Global Step: 113450   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:32:25,252-Speed 3281.90 samples/sec   Loss 3.8545   LearningRate 0.0295   Epoch: 9   Global Step: 113460   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:32:28,309-Speed 3350.80 samples/sec   Loss 3.8389   LearningRate 0.0295   Epoch: 9   Global Step: 113470   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:32:31,395-Speed 3319.45 samples/sec   Loss 4.0339   LearningRate 0.0295   Epoch: 9   Global Step: 113480   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:32:34,466-Speed 3335.28 samples/sec   Loss 3.9416   LearningRate 0.0295   Epoch: 9   Global Step: 113490   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:32:37,641-Speed 3226.10 samples/sec   Loss 4.0512   LearningRate 0.0295   Epoch: 9   Global Step: 113500   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:32:40,760-Speed 3283.77 samples/sec   Loss 3.9086   LearningRate 0.0295   Epoch: 9   Global Step: 113510   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:32:43,919-Speed 3242.64 samples/sec   Loss 3.9838   LearningRate 0.0295   Epoch: 9   Global Step: 113520   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:32:47,026-Speed 3297.81 samples/sec   Loss 3.9209   LearningRate 0.0295   Epoch: 9   Global Step: 113530   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:32:50,086-Speed 3346.35 samples/sec   Loss 3.9170   LearningRate 0.0295   Epoch: 9   Global Step: 113540   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:32:53,146-Speed 3348.34 samples/sec   Loss 3.8735   LearningRate 0.0295   Epoch: 9   Global Step: 113550   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:32:56,241-Speed 3309.60 samples/sec   Loss 3.8640   LearningRate 0.0295   Epoch: 9   Global Step: 113560   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:32:59,349-Speed 3295.58 samples/sec   Loss 3.9839   LearningRate 0.0295   Epoch: 9   Global Step: 113570   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:33:02,434-Speed 3319.97 samples/sec   Loss 3.9637   LearningRate 0.0295   Epoch: 9   Global Step: 113580   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:33:05,493-Speed 3348.31 samples/sec   Loss 3.9469   LearningRate 0.0295   Epoch: 9   Global Step: 113590   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:33:08,557-Speed 3343.22 samples/sec   Loss 3.9493   LearningRate 0.0295   Epoch: 9   Global Step: 113600   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:33:11,608-Speed 3357.79 samples/sec   Loss 3.9040   LearningRate 0.0294   Epoch: 9   Global Step: 113610   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:33:14,671-Speed 3344.85 samples/sec   Loss 3.9920   LearningRate 0.0294   Epoch: 9   Global Step: 113620   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:33:17,803-Speed 3270.71 samples/sec   Loss 3.9555   LearningRate 0.0294   Epoch: 9   Global Step: 113630   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:33:20,879-Speed 3329.56 samples/sec   Loss 3.9185   LearningRate 0.0294   Epoch: 9   Global Step: 113640   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:33:23,985-Speed 3297.30 samples/sec   Loss 3.9254   LearningRate 0.0294   Epoch: 9   Global Step: 113650   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:33:27,048-Speed 3344.35 samples/sec   Loss 4.0611   LearningRate 0.0294   Epoch: 9   Global Step: 113660   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:33:30,142-Speed 3311.26 samples/sec   Loss 3.9068   LearningRate 0.0294   Epoch: 9   Global Step: 113670   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:33:33,325-Speed 3217.89 samples/sec   Loss 4.0222   LearningRate 0.0294   Epoch: 9   Global Step: 113680   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:33:36,508-Speed 3218.39 samples/sec   Loss 3.9687   LearningRate 0.0294   Epoch: 9   Global Step: 113690   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:33:39,652-Speed 3257.28 samples/sec   Loss 4.0257   LearningRate 0.0294   Epoch: 9   Global Step: 113700   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:33:42,803-Speed 3251.16 samples/sec   Loss 4.0060   LearningRate 0.0294   Epoch: 9   Global Step: 113710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:33:45,883-Speed 3325.64 samples/sec   Loss 3.9458   LearningRate 0.0294   Epoch: 9   Global Step: 113720   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:33:49,005-Speed 3281.09 samples/sec   Loss 4.0383   LearningRate 0.0294   Epoch: 9   Global Step: 113730   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:33:52,127-Speed 3281.08 samples/sec   Loss 3.9265   LearningRate 0.0294   Epoch: 9   Global Step: 113740   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:33:55,293-Speed 3235.17 samples/sec   Loss 3.9944   LearningRate 0.0294   Epoch: 9   Global Step: 113750   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:33:58,445-Speed 3250.25 samples/sec   Loss 3.9926   LearningRate 0.0294   Epoch: 9   Global Step: 113760   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:34:01,536-Speed 3314.24 samples/sec   Loss 4.0160   LearningRate 0.0294   Epoch: 9   Global Step: 113770   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:34:04,629-Speed 3312.02 samples/sec   Loss 4.0209   LearningRate 0.0294   Epoch: 9   Global Step: 113780   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:34:07,687-Speed 3349.12 samples/sec   Loss 3.9741   LearningRate 0.0294   Epoch: 9   Global Step: 113790   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:34:10,763-Speed 3329.99 samples/sec   Loss 3.9429   LearningRate 0.0294   Epoch: 9   Global Step: 113800   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:34:13,841-Speed 3328.17 samples/sec   Loss 4.0533   LearningRate 0.0294   Epoch: 9   Global Step: 113810   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:34:16,895-Speed 3354.25 samples/sec   Loss 3.9768   LearningRate 0.0294   Epoch: 9   Global Step: 113820   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:34:19,969-Speed 3332.50 samples/sec   Loss 3.9895   LearningRate 0.0294   Epoch: 9   Global Step: 113830   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:34:23,073-Speed 3299.20 samples/sec   Loss 3.9401   LearningRate 0.0293   Epoch: 9   Global Step: 113840   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:34:26,131-Speed 3350.34 samples/sec   Loss 4.0228   LearningRate 0.0293   Epoch: 9   Global Step: 113850   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:34:29,192-Speed 3346.51 samples/sec   Loss 4.0404   LearningRate 0.0293   Epoch: 9   Global Step: 113860   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:34:32,296-Speed 3299.91 samples/sec   Loss 3.9355   LearningRate 0.0293   Epoch: 9   Global Step: 113870   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:34:35,407-Speed 3292.30 samples/sec   Loss 3.9685   LearningRate 0.0293   Epoch: 9   Global Step: 113880   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:34:38,518-Speed 3292.58 samples/sec   Loss 4.0563   LearningRate 0.0293   Epoch: 9   Global Step: 113890   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:34:41,642-Speed 3279.62 samples/sec   Loss 4.0598   LearningRate 0.0293   Epoch: 9   Global Step: 113900   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:34:44,706-Speed 3342.19 samples/sec   Loss 3.9899   LearningRate 0.0293   Epoch: 9   Global Step: 113910   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:34:47,820-Speed 3290.54 samples/sec   Loss 3.9815   LearningRate 0.0293   Epoch: 9   Global Step: 113920   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:34:50,877-Speed 3350.78 samples/sec   Loss 3.9854   LearningRate 0.0293   Epoch: 9   Global Step: 113930   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:34:53,934-Speed 3350.28 samples/sec   Loss 4.0077   LearningRate 0.0293   Epoch: 9   Global Step: 113940   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:34:57,025-Speed 3313.39 samples/sec   Loss 4.1063   LearningRate 0.0293   Epoch: 9   Global Step: 113950   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:35:00,135-Speed 3294.31 samples/sec   Loss 4.0154   LearningRate 0.0293   Epoch: 9   Global Step: 113960   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:35:03,270-Speed 3267.40 samples/sec   Loss 3.9971   LearningRate 0.0293   Epoch: 9   Global Step: 113970   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:35:06,360-Speed 3314.84 samples/sec   Loss 4.0305   LearningRate 0.0293   Epoch: 9   Global Step: 113980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:35:09,495-Speed 3267.78 samples/sec   Loss 3.9483   LearningRate 0.0293   Epoch: 9   Global Step: 113990   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:35:12,645-Speed 3252.29 samples/sec   Loss 4.0654   LearningRate 0.0293   Epoch: 9   Global Step: 114000   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:35:15,769-Speed 3278.43 samples/sec   Loss 3.9901   LearningRate 0.0293   Epoch: 9   Global Step: 114010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:35:18,895-Speed 3277.78 samples/sec   Loss 4.0155   LearningRate 0.0293   Epoch: 9   Global Step: 114020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:35:21,989-Speed 3310.38 samples/sec   Loss 3.9769   LearningRate 0.0293   Epoch: 9   Global Step: 114030   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:35:25,140-Speed 3250.35 samples/sec   Loss 4.0800   LearningRate 0.0293   Epoch: 9   Global Step: 114040   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:35:28,287-Speed 3255.03 samples/sec   Loss 4.0904   LearningRate 0.0293   Epoch: 9   Global Step: 114050   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:35:31,382-Speed 3309.28 samples/sec   Loss 4.0671   LearningRate 0.0293   Epoch: 9   Global Step: 114060   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:35:34,512-Speed 3273.01 samples/sec   Loss 3.9431   LearningRate 0.0292   Epoch: 9   Global Step: 114070   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:35:37,618-Speed 3297.71 samples/sec   Loss 4.0622   LearningRate 0.0292   Epoch: 9   Global Step: 114080   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:35:40,768-Speed 3251.54 samples/sec   Loss 4.0271   LearningRate 0.0292   Epoch: 9   Global Step: 114090   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:35:43,906-Speed 3264.59 samples/sec   Loss 3.9867   LearningRate 0.0292   Epoch: 9   Global Step: 114100   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:35:47,018-Speed 3292.06 samples/sec   Loss 4.0194   LearningRate 0.0292   Epoch: 9   Global Step: 114110   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:35:50,070-Speed 3356.11 samples/sec   Loss 4.1465   LearningRate 0.0292   Epoch: 9   Global Step: 114120   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:35:53,199-Speed 3272.98 samples/sec   Loss 3.9596   LearningRate 0.0292   Epoch: 9   Global Step: 114130   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:35:56,313-Speed 3289.95 samples/sec   Loss 4.1496   LearningRate 0.0292   Epoch: 9   Global Step: 114140   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:35:59,449-Speed 3265.89 samples/sec   Loss 3.9898   LearningRate 0.0292   Epoch: 9   Global Step: 114150   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:36:02,538-Speed 3316.05 samples/sec   Loss 4.0145   LearningRate 0.0292   Epoch: 9   Global Step: 114160   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:36:05,673-Speed 3267.99 samples/sec   Loss 3.9915   LearningRate 0.0292   Epoch: 9   Global Step: 114170   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:36:08,766-Speed 3311.62 samples/sec   Loss 4.0957   LearningRate 0.0292   Epoch: 9   Global Step: 114180   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:36:11,838-Speed 3334.36 samples/sec   Loss 4.0103   LearningRate 0.0292   Epoch: 9   Global Step: 114190   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:36:14,917-Speed 3326.06 samples/sec   Loss 3.8924   LearningRate 0.0292   Epoch: 9   Global Step: 114200   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:36:18,016-Speed 3305.47 samples/sec   Loss 4.0395   LearningRate 0.0292   Epoch: 9   Global Step: 114210   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:36:21,111-Speed 3310.06 samples/sec   Loss 4.0905   LearningRate 0.0292   Epoch: 9   Global Step: 114220   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:36:24,244-Speed 3269.27 samples/sec   Loss 4.0240   LearningRate 0.0292   Epoch: 9   Global Step: 114230   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:36:27,333-Speed 3316.49 samples/sec   Loss 4.1131   LearningRate 0.0292   Epoch: 9   Global Step: 114240   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:36:30,433-Speed 3303.36 samples/sec   Loss 4.0463   LearningRate 0.0292   Epoch: 9   Global Step: 114250   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:36:33,510-Speed 3329.56 samples/sec   Loss 4.1159   LearningRate 0.0292   Epoch: 9   Global Step: 114260   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:36:36,613-Speed 3300.83 samples/sec   Loss 4.0112   LearningRate 0.0292   Epoch: 9   Global Step: 114270   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:36:39,696-Speed 3322.26 samples/sec   Loss 4.0773   LearningRate 0.0292   Epoch: 9   Global Step: 114280   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:36:42,820-Speed 3278.92 samples/sec   Loss 4.0333   LearningRate 0.0292   Epoch: 9   Global Step: 114290   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:36:45,920-Speed 3304.98 samples/sec   Loss 4.0108   LearningRate 0.0291   Epoch: 9   Global Step: 114300   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:36:49,078-Speed 3243.53 samples/sec   Loss 4.0857   LearningRate 0.0291   Epoch: 9   Global Step: 114310   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:36:52,152-Speed 3331.90 samples/sec   Loss 3.9855   LearningRate 0.0291   Epoch: 9   Global Step: 114320   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:36:55,271-Speed 3283.95 samples/sec   Loss 4.0746   LearningRate 0.0291   Epoch: 9   Global Step: 114330   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:36:58,382-Speed 3293.37 samples/sec   Loss 3.9154   LearningRate 0.0291   Epoch: 9   Global Step: 114340   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:37:01,526-Speed 3257.61 samples/sec   Loss 4.0774   LearningRate 0.0291   Epoch: 9   Global Step: 114350   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:37:04,641-Speed 3288.73 samples/sec   Loss 4.0483   LearningRate 0.0291   Epoch: 9   Global Step: 114360   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:37:07,819-Speed 3223.43 samples/sec   Loss 4.0607   LearningRate 0.0291   Epoch: 9   Global Step: 114370   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:37:10,900-Speed 3324.06 samples/sec   Loss 4.1186   LearningRate 0.0291   Epoch: 9   Global Step: 114380   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:37:14,041-Speed 3261.58 samples/sec   Loss 4.0654   LearningRate 0.0291   Epoch: 9   Global Step: 114390   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:37:17,185-Speed 3257.79 samples/sec   Loss 4.0656   LearningRate 0.0291   Epoch: 9   Global Step: 114400   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:37:20,278-Speed 3311.80 samples/sec   Loss 4.0828   LearningRate 0.0291   Epoch: 9   Global Step: 114410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:37:23,391-Speed 3291.16 samples/sec   Loss 4.0873   LearningRate 0.0291   Epoch: 9   Global Step: 114420   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:37:26,465-Speed 3332.45 samples/sec   Loss 3.9653   LearningRate 0.0291   Epoch: 9   Global Step: 114430   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:37:29,573-Speed 3295.93 samples/sec   Loss 4.1416   LearningRate 0.0291   Epoch: 9   Global Step: 114440   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-04-27 11:37:32,707-Speed 3268.05 samples/sec   Loss 4.1213   LearningRate 0.0291   Epoch: 9   Global Step: 114450   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:37:35,854-Speed 3255.23 samples/sec   Loss 4.0638   LearningRate 0.0291   Epoch: 9   Global Step: 114460   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:37:38,973-Speed 3283.99 samples/sec   Loss 4.1820   LearningRate 0.0291   Epoch: 9   Global Step: 114470   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:37:42,076-Speed 3301.66 samples/sec   Loss 4.0627   LearningRate 0.0291   Epoch: 9   Global Step: 114480   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:37:45,168-Speed 3311.97 samples/sec   Loss 4.0595   LearningRate 0.0291   Epoch: 9   Global Step: 114490   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:37:48,275-Speed 3296.98 samples/sec   Loss 4.0480   LearningRate 0.0291   Epoch: 9   Global Step: 114500   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:37:51,426-Speed 3250.93 samples/sec   Loss 4.0604   LearningRate 0.0291   Epoch: 9   Global Step: 114510   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:37:54,582-Speed 3245.23 samples/sec   Loss 4.1037   LearningRate 0.0291   Epoch: 9   Global Step: 114520   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:37:57,734-Speed 3250.22 samples/sec   Loss 4.0049   LearningRate 0.0290   Epoch: 9   Global Step: 114530   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:38:00,893-Speed 3243.16 samples/sec   Loss 4.1733   LearningRate 0.0290   Epoch: 9   Global Step: 114540   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:38:04,009-Speed 3286.99 samples/sec   Loss 4.1959   LearningRate 0.0290   Epoch: 9   Global Step: 114550   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:38:07,181-Speed 3229.33 samples/sec   Loss 4.0671   LearningRate 0.0290   Epoch: 9   Global Step: 114560   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:38:10,274-Speed 3311.47 samples/sec   Loss 4.1674   LearningRate 0.0290   Epoch: 9   Global Step: 114570   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:38:13,401-Speed 3275.53 samples/sec   Loss 4.1061   LearningRate 0.0290   Epoch: 9   Global Step: 114580   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:38:16,552-Speed 3251.03 samples/sec   Loss 4.1106   LearningRate 0.0290   Epoch: 9   Global Step: 114590   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:38:19,681-Speed 3273.15 samples/sec   Loss 4.1738   LearningRate 0.0290   Epoch: 9   Global Step: 114600   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:38:22,806-Speed 3277.89 samples/sec   Loss 4.0693   LearningRate 0.0290   Epoch: 9   Global Step: 114610   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:38:25,963-Speed 3245.28 samples/sec   Loss 4.0753   LearningRate 0.0290   Epoch: 9   Global Step: 114620   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:38:29,163-Speed 3201.17 samples/sec   Loss 4.1141   LearningRate 0.0290   Epoch: 9   Global Step: 114630   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:38:32,274-Speed 3292.41 samples/sec   Loss 4.0303   LearningRate 0.0290   Epoch: 9   Global Step: 114640   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-04-27 11:38:35,422-Speed 3254.03 samples/sec   Loss 4.0717   LearningRate 0.0290   Epoch: 9   Global Step: 114650   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:38:38,556-Speed 3268.52 samples/sec   Loss 4.0331   LearningRate 0.0290   Epoch: 9   Global Step: 114660   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:38:41,626-Speed 3336.17 samples/sec   Loss 4.0633   LearningRate 0.0290   Epoch: 9   Global Step: 114670   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:38:44,735-Speed 3294.38 samples/sec   Loss 4.0847   LearningRate 0.0290   Epoch: 9   Global Step: 114680   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:38:47,916-Speed 3220.65 samples/sec   Loss 4.1029   LearningRate 0.0290   Epoch: 9   Global Step: 114690   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:38:51,097-Speed 3219.42 samples/sec   Loss 4.0763   LearningRate 0.0290   Epoch: 9   Global Step: 114700   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:38:54,234-Speed 3265.93 samples/sec   Loss 4.0857   LearningRate 0.0290   Epoch: 9   Global Step: 114710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:38:57,396-Speed 3239.20 samples/sec   Loss 4.1542   LearningRate 0.0290   Epoch: 9   Global Step: 114720   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-04-27 11:39:00,483-Speed 3318.23 samples/sec   Loss 4.0154   LearningRate 0.0290   Epoch: 9   Global Step: 114730   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:39:03,635-Speed 3250.41 samples/sec   Loss 4.1553   LearningRate 0.0290   Epoch: 9   Global Step: 114740   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:39:06,770-Speed 3267.37 samples/sec   Loss 4.1497   LearningRate 0.0290   Epoch: 9   Global Step: 114750   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:39:09,823-Speed 3354.18 samples/sec   Loss 4.0142   LearningRate 0.0289   Epoch: 9   Global Step: 114760   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:39:12,972-Speed 3253.57 samples/sec   Loss 4.1288   LearningRate 0.0289   Epoch: 9   Global Step: 114770   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:39:16,054-Speed 3322.91 samples/sec   Loss 4.1517   LearningRate 0.0289   Epoch: 9   Global Step: 114780   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:39:19,158-Speed 3300.61 samples/sec   Loss 4.1979   LearningRate 0.0289   Epoch: 9   Global Step: 114790   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:39:22,279-Speed 3282.18 samples/sec   Loss 4.0498   LearningRate 0.0289   Epoch: 9   Global Step: 114800   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:39:25,368-Speed 3315.37 samples/sec   Loss 4.0577   LearningRate 0.0289   Epoch: 9   Global Step: 114810   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:39:28,497-Speed 3273.87 samples/sec   Loss 4.0605   LearningRate 0.0289   Epoch: 9   Global Step: 114820   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:39:31,651-Speed 3247.95 samples/sec   Loss 4.1317   LearningRate 0.0289   Epoch: 9   Global Step: 114830   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:39:34,748-Speed 3307.27 samples/sec   Loss 4.0031   LearningRate 0.0289   Epoch: 9   Global Step: 114840   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:39:37,868-Speed 3283.06 samples/sec   Loss 4.1479   LearningRate 0.0289   Epoch: 9   Global Step: 114850   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:39:40,996-Speed 3275.05 samples/sec   Loss 4.1102   LearningRate 0.0289   Epoch: 9   Global Step: 114860   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:39:44,089-Speed 3311.62 samples/sec   Loss 4.1462   LearningRate 0.0289   Epoch: 9   Global Step: 114870   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:39:47,195-Speed 3297.50 samples/sec   Loss 4.1397   LearningRate 0.0289   Epoch: 9   Global Step: 114880   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:39:50,269-Speed 3332.55 samples/sec   Loss 4.1162   LearningRate 0.0289   Epoch: 9   Global Step: 114890   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:39:53,361-Speed 3312.74 samples/sec   Loss 4.0956   LearningRate 0.0289   Epoch: 9   Global Step: 114900   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:39:56,444-Speed 3322.32 samples/sec   Loss 4.0997   LearningRate 0.0289   Epoch: 9   Global Step: 114910   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:39:59,567-Speed 3280.15 samples/sec   Loss 4.0967   LearningRate 0.0289   Epoch: 9   Global Step: 114920   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:40:02,710-Speed 3259.67 samples/sec   Loss 4.1908   LearningRate 0.0289   Epoch: 9   Global Step: 114930   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:40:05,864-Speed 3247.06 samples/sec   Loss 4.1049   LearningRate 0.0289   Epoch: 9   Global Step: 114940   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:40:08,959-Speed 3310.19 samples/sec   Loss 4.1062   LearningRate 0.0289   Epoch: 9   Global Step: 114950   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:40:12,023-Speed 3343.12 samples/sec   Loss 4.1074   LearningRate 0.0289   Epoch: 9   Global Step: 114960   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:40:15,118-Speed 3309.22 samples/sec   Loss 4.1477   LearningRate 0.0289   Epoch: 9   Global Step: 114970   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:40:18,254-Speed 3266.06 samples/sec   Loss 4.1293   LearningRate 0.0289   Epoch: 9   Global Step: 114980   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:40:21,358-Speed 3300.11 samples/sec   Loss 4.0796   LearningRate 0.0288   Epoch: 9   Global Step: 114990   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:40:24,454-Speed 3308.98 samples/sec   Loss 4.0733   LearningRate 0.0288   Epoch: 9   Global Step: 115000   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:40:27,548-Speed 3310.73 samples/sec   Loss 4.2012   LearningRate 0.0288   Epoch: 9   Global Step: 115010   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:40:30,685-Speed 3264.58 samples/sec   Loss 4.2116   LearningRate 0.0288   Epoch: 9   Global Step: 115020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:40:33,749-Speed 3342.73 samples/sec   Loss 4.1884   LearningRate 0.0288   Epoch: 9   Global Step: 115030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:40:36,784-Speed 3376.47 samples/sec   Loss 4.1906   LearningRate 0.0288   Epoch: 9   Global Step: 115040   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:40:39,913-Speed 3272.78 samples/sec   Loss 4.1497   LearningRate 0.0288   Epoch: 9   Global Step: 115050   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:40:42,997-Speed 3322.07 samples/sec   Loss 4.2372   LearningRate 0.0288   Epoch: 9   Global Step: 115060   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:40:46,056-Speed 3348.78 samples/sec   Loss 4.2180   LearningRate 0.0288   Epoch: 9   Global Step: 115070   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:40:49,132-Speed 3330.06 samples/sec   Loss 4.2014   LearningRate 0.0288   Epoch: 9   Global Step: 115080   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:40:52,257-Speed 3277.52 samples/sec   Loss 4.1466   LearningRate 0.0288   Epoch: 9   Global Step: 115090   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:40:55,411-Speed 3248.16 samples/sec   Loss 4.1873   LearningRate 0.0288   Epoch: 9   Global Step: 115100   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:40:58,487-Speed 3329.93 samples/sec   Loss 4.0276   LearningRate 0.0288   Epoch: 9   Global Step: 115110   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:41:01,644-Speed 3244.60 samples/sec   Loss 4.1244   LearningRate 0.0288   Epoch: 9   Global Step: 115120   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:41:04,763-Speed 3283.52 samples/sec   Loss 4.1623   LearningRate 0.0288   Epoch: 9   Global Step: 115130   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:41:07,876-Speed 3291.28 samples/sec   Loss 4.1383   LearningRate 0.0288   Epoch: 9   Global Step: 115140   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:41:10,996-Speed 3282.61 samples/sec   Loss 4.0790   LearningRate 0.0288   Epoch: 9   Global Step: 115150   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:41:14,150-Speed 3247.37 samples/sec   Loss 4.1568   LearningRate 0.0288   Epoch: 9   Global Step: 115160   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:41:17,361-Speed 3190.62 samples/sec   Loss 4.1714   LearningRate 0.0288   Epoch: 9   Global Step: 115170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:41:20,492-Speed 3271.25 samples/sec   Loss 4.0597   LearningRate 0.0288   Epoch: 9   Global Step: 115180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:41:23,595-Speed 3301.42 samples/sec   Loss 4.1880   LearningRate 0.0288   Epoch: 9   Global Step: 115190   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:41:26,719-Speed 3279.04 samples/sec   Loss 4.1693   LearningRate 0.0288   Epoch: 9   Global Step: 115200   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:41:29,860-Speed 3260.58 samples/sec   Loss 4.1646   LearningRate 0.0288   Epoch: 9   Global Step: 115210   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:41:32,955-Speed 3309.05 samples/sec   Loss 4.1595   LearningRate 0.0287   Epoch: 9   Global Step: 115220   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:41:36,084-Speed 3275.54 samples/sec   Loss 4.0828   LearningRate 0.0287   Epoch: 9   Global Step: 115230   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:41:39,212-Speed 3274.49 samples/sec   Loss 4.1999   LearningRate 0.0287   Epoch: 9   Global Step: 115240   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:41:42,328-Speed 3286.82 samples/sec   Loss 4.1514   LearningRate 0.0287   Epoch: 9   Global Step: 115250   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:41:45,421-Speed 3312.76 samples/sec   Loss 4.1930   LearningRate 0.0287   Epoch: 9   Global Step: 115260   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:41:48,505-Speed 3320.42 samples/sec   Loss 4.1726   LearningRate 0.0287   Epoch: 9   Global Step: 115270   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:41:51,674-Speed 3233.12 samples/sec   Loss 4.2318   LearningRate 0.0287   Epoch: 9   Global Step: 115280   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:41:54,837-Speed 3238.07 samples/sec   Loss 4.2205   LearningRate 0.0287   Epoch: 9   Global Step: 115290   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:41:57,944-Speed 3297.04 samples/sec   Loss 4.2011   LearningRate 0.0287   Epoch: 9   Global Step: 115300   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:42:01,037-Speed 3311.32 samples/sec   Loss 4.1455   LearningRate 0.0287   Epoch: 9   Global Step: 115310   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:42:04,242-Speed 3196.27 samples/sec   Loss 4.1808   LearningRate 0.0287   Epoch: 9   Global Step: 115320   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:42:07,353-Speed 3292.68 samples/sec   Loss 4.1276   LearningRate 0.0287   Epoch: 9   Global Step: 115330   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:42:10,414-Speed 3347.01 samples/sec   Loss 4.1139   LearningRate 0.0287   Epoch: 9   Global Step: 115340   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:42:13,525-Speed 3291.78 samples/sec   Loss 4.1720   LearningRate 0.0287   Epoch: 9   Global Step: 115350   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:42:16,660-Speed 3267.53 samples/sec   Loss 4.2175   LearningRate 0.0287   Epoch: 9   Global Step: 115360   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:42:19,836-Speed 3225.04 samples/sec   Loss 4.2212   LearningRate 0.0287   Epoch: 9   Global Step: 115370   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:42:22,928-Speed 3313.16 samples/sec   Loss 4.1571   LearningRate 0.0287   Epoch: 9   Global Step: 115380   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:42:26,055-Speed 3276.25 samples/sec   Loss 4.1830   LearningRate 0.0287   Epoch: 9   Global Step: 115390   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:42:29,151-Speed 3308.65 samples/sec   Loss 4.1588   LearningRate 0.0287   Epoch: 9   Global Step: 115400   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:42:32,349-Speed 3202.09 samples/sec   Loss 4.1354   LearningRate 0.0287   Epoch: 9   Global Step: 115410   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:42:35,458-Speed 3295.62 samples/sec   Loss 4.2161   LearningRate 0.0287   Epoch: 9   Global Step: 115420   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:42:38,534-Speed 3329.07 samples/sec   Loss 4.1290   LearningRate 0.0287   Epoch: 9   Global Step: 115430   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:42:41,652-Speed 3286.16 samples/sec   Loss 4.1109   LearningRate 0.0287   Epoch: 9   Global Step: 115440   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:42:44,786-Speed 3267.82 samples/sec   Loss 4.1541   LearningRate 0.0287   Epoch: 9   Global Step: 115450   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:42:47,872-Speed 3319.85 samples/sec   Loss 4.2427   LearningRate 0.0286   Epoch: 9   Global Step: 115460   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:42:51,011-Speed 3262.58 samples/sec   Loss 4.1430   LearningRate 0.0286   Epoch: 9   Global Step: 115470   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:42:54,197-Speed 3215.96 samples/sec   Loss 4.1199   LearningRate 0.0286   Epoch: 9   Global Step: 115480   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:42:57,293-Speed 3308.37 samples/sec   Loss 4.2557   LearningRate 0.0286   Epoch: 9   Global Step: 115490   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:43:00,425-Speed 3270.75 samples/sec   Loss 4.2149   LearningRate 0.0286   Epoch: 9   Global Step: 115500   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:43:03,523-Speed 3306.12 samples/sec   Loss 4.1318   LearningRate 0.0286   Epoch: 9   Global Step: 115510   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:43:06,611-Speed 3316.68 samples/sec   Loss 4.1654   LearningRate 0.0286   Epoch: 9   Global Step: 115520   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:43:09,674-Speed 3344.40 samples/sec   Loss 4.2100   LearningRate 0.0286   Epoch: 9   Global Step: 115530   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:43:12,780-Speed 3298.16 samples/sec   Loss 4.2417   LearningRate 0.0286   Epoch: 9   Global Step: 115540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:43:15,889-Speed 3295.10 samples/sec   Loss 4.2654   LearningRate 0.0286   Epoch: 9   Global Step: 115550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:43:19,042-Speed 3248.72 samples/sec   Loss 4.1196   LearningRate 0.0286   Epoch: 9   Global Step: 115560   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:43:22,108-Speed 3341.49 samples/sec   Loss 4.2711   LearningRate 0.0286   Epoch: 9   Global Step: 115570   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:43:25,216-Speed 3294.92 samples/sec   Loss 4.1847   LearningRate 0.0286   Epoch: 9   Global Step: 115580   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:43:28,297-Speed 3324.53 samples/sec   Loss 4.1279   LearningRate 0.0286   Epoch: 9   Global Step: 115590   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:43:31,400-Speed 3300.75 samples/sec   Loss 4.3126   LearningRate 0.0286   Epoch: 9   Global Step: 115600   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:43:34,487-Speed 3318.46 samples/sec   Loss 4.2098   LearningRate 0.0286   Epoch: 9   Global Step: 115610   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:43:37,628-Speed 3261.38 samples/sec   Loss 4.1292   LearningRate 0.0286   Epoch: 9   Global Step: 115620   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:43:40,755-Speed 3275.21 samples/sec   Loss 4.1868   LearningRate 0.0286   Epoch: 9   Global Step: 115630   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:43:43,894-Speed 3263.83 samples/sec   Loss 4.2999   LearningRate 0.0286   Epoch: 9   Global Step: 115640   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:43:47,095-Speed 3200.17 samples/sec   Loss 4.1406   LearningRate 0.0286   Epoch: 9   Global Step: 115650   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:43:50,161-Speed 3340.42 samples/sec   Loss 4.2489   LearningRate 0.0286   Epoch: 9   Global Step: 115660   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:43:53,247-Speed 3320.08 samples/sec   Loss 4.2655   LearningRate 0.0286   Epoch: 9   Global Step: 115670   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:43:56,307-Speed 3347.49 samples/sec   Loss 4.2162   LearningRate 0.0286   Epoch: 9   Global Step: 115680   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:43:59,392-Speed 3319.15 samples/sec   Loss 4.2076   LearningRate 0.0285   Epoch: 9   Global Step: 115690   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:44:02,552-Speed 3242.03 samples/sec   Loss 4.1149   LearningRate 0.0285   Epoch: 9   Global Step: 115700   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:44:05,634-Speed 3324.17 samples/sec   Loss 4.2768   LearningRate 0.0285   Epoch: 9   Global Step: 115710   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:44:08,686-Speed 3356.08 samples/sec   Loss 4.2923   LearningRate 0.0285   Epoch: 9   Global Step: 115720   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:44:11,809-Speed 3279.77 samples/sec   Loss 4.1646   LearningRate 0.0285   Epoch: 9   Global Step: 115730   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:44:14,957-Speed 3253.32 samples/sec   Loss 4.3696   LearningRate 0.0285   Epoch: 9   Global Step: 115740   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:44:18,120-Speed 3238.75 samples/sec   Loss 4.1991   LearningRate 0.0285   Epoch: 9   Global Step: 115750   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:44:21,197-Speed 3329.00 samples/sec   Loss 4.2507   LearningRate 0.0285   Epoch: 9   Global Step: 115760   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:44:24,335-Speed 3264.12 samples/sec   Loss 4.1366   LearningRate 0.0285   Epoch: 9   Global Step: 115770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:44:27,401-Speed 3341.09 samples/sec   Loss 4.2160   LearningRate 0.0285   Epoch: 9   Global Step: 115780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:44:30,479-Speed 3328.38 samples/sec   Loss 4.2410   LearningRate 0.0285   Epoch: 9   Global Step: 115790   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:44:33,544-Speed 3341.67 samples/sec   Loss 4.0996   LearningRate 0.0285   Epoch: 9   Global Step: 115800   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:44:36,613-Speed 3337.76 samples/sec   Loss 4.2953   LearningRate 0.0285   Epoch: 9   Global Step: 115810   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:44:39,684-Speed 3334.60 samples/sec   Loss 4.1574   LearningRate 0.0285   Epoch: 9   Global Step: 115820   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:44:42,759-Speed 3331.08 samples/sec   Loss 4.3107   LearningRate 0.0285   Epoch: 9   Global Step: 115830   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:44:45,827-Speed 3339.12 samples/sec   Loss 4.1561   LearningRate 0.0285   Epoch: 9   Global Step: 115840   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:44:48,885-Speed 3349.97 samples/sec   Loss 4.1456   LearningRate 0.0285   Epoch: 9   Global Step: 115850   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:44:51,951-Speed 3340.74 samples/sec   Loss 4.1299   LearningRate 0.0285   Epoch: 9   Global Step: 115860   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:44:55,006-Speed 3353.97 samples/sec   Loss 4.1979   LearningRate 0.0285   Epoch: 9   Global Step: 115870   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:44:58,059-Speed 3356.04 samples/sec   Loss 4.3016   LearningRate 0.0285   Epoch: 9   Global Step: 115880   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:45:01,179-Speed 3282.88 samples/sec   Loss 4.2306   LearningRate 0.0285   Epoch: 9   Global Step: 115890   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:45:04,310-Speed 3271.35 samples/sec   Loss 4.1388   LearningRate 0.0285   Epoch: 9   Global Step: 115900   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:45:07,426-Speed 3288.14 samples/sec   Loss 4.1906   LearningRate 0.0285   Epoch: 9   Global Step: 115910   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:45:10,488-Speed 3344.70 samples/sec   Loss 4.2881   LearningRate 0.0284   Epoch: 9   Global Step: 115920   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:45:13,560-Speed 3334.14 samples/sec   Loss 4.2401   LearningRate 0.0284   Epoch: 9   Global Step: 115930   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:45:16,675-Speed 3289.13 samples/sec   Loss 4.1870   LearningRate 0.0284   Epoch: 9   Global Step: 115940   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:45:19,772-Speed 3307.15 samples/sec   Loss 4.2281   LearningRate 0.0284   Epoch: 9   Global Step: 115950   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:45:22,914-Speed 3260.22 samples/sec   Loss 4.0962   LearningRate 0.0284   Epoch: 9   Global Step: 115960   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:45:26,082-Speed 3233.47 samples/sec   Loss 4.3209   LearningRate 0.0284   Epoch: 9   Global Step: 115970   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:45:29,225-Speed 3259.30 samples/sec   Loss 4.2664   LearningRate 0.0284   Epoch: 9   Global Step: 115980   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:45:32,351-Speed 3276.22 samples/sec   Loss 4.2032   LearningRate 0.0284   Epoch: 9   Global Step: 115990   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:45:35,459-Speed 3296.42 samples/sec   Loss 4.2826   LearningRate 0.0284   Epoch: 9   Global Step: 116000   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:45:38,535-Speed 3330.46 samples/sec   Loss 4.1173   LearningRate 0.0284   Epoch: 9   Global Step: 116010   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:45:41,601-Speed 3340.51 samples/sec   Loss 4.2659   LearningRate 0.0284   Epoch: 9   Global Step: 116020   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:45:44,666-Speed 3342.65 samples/sec   Loss 4.2865   LearningRate 0.0284   Epoch: 9   Global Step: 116030   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:45:47,737-Speed 3335.20 samples/sec   Loss 4.2270   LearningRate 0.0284   Epoch: 9   Global Step: 116040   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:45:50,813-Speed 3329.64 samples/sec   Loss 4.1638   LearningRate 0.0284   Epoch: 9   Global Step: 116050   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:45:53,893-Speed 3325.88 samples/sec   Loss 4.2624   LearningRate 0.0284   Epoch: 9   Global Step: 116060   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:45:56,980-Speed 3318.50 samples/sec   Loss 4.1987   LearningRate 0.0284   Epoch: 9   Global Step: 116070   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 11:46:00,089-Speed 3294.48 samples/sec   Loss 4.1934   LearningRate 0.0284   Epoch: 9   Global Step: 116080   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 11:46:03,185-Speed 3308.28 samples/sec   Loss 4.1655   LearningRate 0.0284   Epoch: 9   Global Step: 116090   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 11:46:06,369-Speed 3217.29 samples/sec   Loss 4.2632   LearningRate 0.0284   Epoch: 9   Global Step: 116100   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 11:46:09,443-Speed 3332.37 samples/sec   Loss 4.2998   LearningRate 0.0284   Epoch: 9   Global Step: 116110   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 11:46:12,548-Speed 3298.55 samples/sec   Loss 4.2117   LearningRate 0.0284   Epoch: 9   Global Step: 116120   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 11:46:15,668-Speed 3282.92 samples/sec   Loss 4.2411   LearningRate 0.0284   Epoch: 9   Global Step: 116130   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 11:46:18,742-Speed 3332.81 samples/sec   Loss 4.1826   LearningRate 0.0284   Epoch: 9   Global Step: 116140   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 11:46:21,828-Speed 3319.29 samples/sec   Loss 4.2463   LearningRate 0.0283   Epoch: 9   Global Step: 116150   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 11:46:24,981-Speed 3248.06 samples/sec   Loss 4.2317   LearningRate 0.0283   Epoch: 9   Global Step: 116160   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 11:46:28,117-Speed 3266.44 samples/sec   Loss 4.2784   LearningRate 0.0283   Epoch: 9   Global Step: 116170   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:46:31,237-Speed 3284.09 samples/sec   Loss 4.1198   LearningRate 0.0283   Epoch: 9   Global Step: 116180   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:46:34,289-Speed 3355.46 samples/sec   Loss 4.2292   LearningRate 0.0283   Epoch: 9   Global Step: 116190   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:46:37,421-Speed 3271.32 samples/sec   Loss 4.1294   LearningRate 0.0283   Epoch: 9   Global Step: 116200   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:46:40,488-Speed 3339.97 samples/sec   Loss 4.2793   LearningRate 0.0283   Epoch: 9   Global Step: 116210   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:46:43,640-Speed 3249.33 samples/sec   Loss 4.1675   LearningRate 0.0283   Epoch: 9   Global Step: 116220   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:46:46,744-Speed 3300.59 samples/sec   Loss 4.2922   LearningRate 0.0283   Epoch: 9   Global Step: 116230   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:46:49,824-Speed 3326.01 samples/sec   Loss 4.2368   LearningRate 0.0283   Epoch: 9   Global Step: 116240   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:46:52,962-Speed 3263.26 samples/sec   Loss 4.2820   LearningRate 0.0283   Epoch: 9   Global Step: 116250   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:46:56,020-Speed 3350.17 samples/sec   Loss 4.2447   LearningRate 0.0283   Epoch: 9   Global Step: 116260   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:46:59,072-Speed 3356.74 samples/sec   Loss 4.2037   LearningRate 0.0283   Epoch: 9   Global Step: 116270   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:47:02,219-Speed 3254.86 samples/sec   Loss 4.1553   LearningRate 0.0283   Epoch: 9   Global Step: 116280   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:47:05,321-Speed 3302.57 samples/sec   Loss 4.2645   LearningRate 0.0283   Epoch: 9   Global Step: 116290   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:47:08,414-Speed 3311.33 samples/sec   Loss 4.3301   LearningRate 0.0283   Epoch: 9   Global Step: 116300   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:47:11,530-Speed 3287.08 samples/sec   Loss 4.2018   LearningRate 0.0283   Epoch: 9   Global Step: 116310   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:47:14,691-Speed 3241.03 samples/sec   Loss 4.2405   LearningRate 0.0283   Epoch: 9   Global Step: 116320   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:47:17,817-Speed 3276.88 samples/sec   Loss 4.2575   LearningRate 0.0283   Epoch: 9   Global Step: 116330   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:47:20,952-Speed 3266.57 samples/sec   Loss 4.2479   LearningRate 0.0283   Epoch: 9   Global Step: 116340   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:47:24,124-Speed 3229.19 samples/sec   Loss 4.3123   LearningRate 0.0283   Epoch: 9   Global Step: 116350   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:47:27,245-Speed 3281.88 samples/sec   Loss 4.2012   LearningRate 0.0283   Epoch: 9   Global Step: 116360   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:47:30,407-Speed 3239.77 samples/sec   Loss 4.2288   LearningRate 0.0283   Epoch: 9   Global Step: 116370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:47:33,448-Speed 3368.73 samples/sec   Loss 4.2410   LearningRate 0.0283   Epoch: 9   Global Step: 116380   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:47:36,616-Speed 3233.61 samples/sec   Loss 4.2540   LearningRate 0.0282   Epoch: 9   Global Step: 116390   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:47:39,705-Speed 3315.91 samples/sec   Loss 4.2567   LearningRate 0.0282   Epoch: 9   Global Step: 116400   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:47:42,792-Speed 3317.48 samples/sec   Loss 4.2492   LearningRate 0.0282   Epoch: 9   Global Step: 116410   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:47:45,867-Speed 3332.13 samples/sec   Loss 4.2648   LearningRate 0.0282   Epoch: 9   Global Step: 116420   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:47:48,998-Speed 3271.35 samples/sec   Loss 4.3413   LearningRate 0.0282   Epoch: 9   Global Step: 116430   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:47:52,144-Speed 3254.87 samples/sec   Loss 4.2665   LearningRate 0.0282   Epoch: 9   Global Step: 116440   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:47:55,257-Speed 3291.09 samples/sec   Loss 4.2032   LearningRate 0.0282   Epoch: 9   Global Step: 116450   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:47:58,364-Speed 3296.77 samples/sec   Loss 4.3204   LearningRate 0.0282   Epoch: 9   Global Step: 116460   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:48:01,469-Speed 3298.24 samples/sec   Loss 4.2670   LearningRate 0.0282   Epoch: 9   Global Step: 116470   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:48:04,601-Speed 3271.23 samples/sec   Loss 4.3673   LearningRate 0.0282   Epoch: 9   Global Step: 116480   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:48:07,740-Speed 3263.22 samples/sec   Loss 4.2893   LearningRate 0.0282   Epoch: 9   Global Step: 116490   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:48:10,848-Speed 3295.40 samples/sec   Loss 4.2225   LearningRate 0.0282   Epoch: 9   Global Step: 116500   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:48:13,980-Speed 3271.14 samples/sec   Loss 4.2421   LearningRate 0.0282   Epoch: 9   Global Step: 116510   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:48:17,121-Speed 3260.89 samples/sec   Loss 4.2939   LearningRate 0.0282   Epoch: 9   Global Step: 116520   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:48:20,196-Speed 3330.87 samples/sec   Loss 4.2006   LearningRate 0.0282   Epoch: 9   Global Step: 116530   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:48:23,307-Speed 3292.40 samples/sec   Loss 4.1298   LearningRate 0.0282   Epoch: 9   Global Step: 116540   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:48:26,386-Speed 3327.26 samples/sec   Loss 4.2899   LearningRate 0.0282   Epoch: 9   Global Step: 116550   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:48:29,565-Speed 3222.48 samples/sec   Loss 4.2136   LearningRate 0.0282   Epoch: 9   Global Step: 116560   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:48:32,702-Speed 3264.73 samples/sec   Loss 4.2420   LearningRate 0.0282   Epoch: 9   Global Step: 116570   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:48:35,773-Speed 3335.70 samples/sec   Loss 4.2720   LearningRate 0.0282   Epoch: 9   Global Step: 116580   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:48:38,923-Speed 3252.41 samples/sec   Loss 4.2875   LearningRate 0.0282   Epoch: 9   Global Step: 116590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:48:42,070-Speed 3253.88 samples/sec   Loss 4.2639   LearningRate 0.0282   Epoch: 9   Global Step: 116600   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:48:45,179-Speed 3295.80 samples/sec   Loss 4.2418   LearningRate 0.0282   Epoch: 9   Global Step: 116610   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:48:48,272-Speed 3311.07 samples/sec   Loss 4.2372   LearningRate 0.0281   Epoch: 9   Global Step: 116620   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:48:51,459-Speed 3214.00 samples/sec   Loss 4.3016   LearningRate 0.0281   Epoch: 9   Global Step: 116630   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:48:54,546-Speed 3318.52 samples/sec   Loss 4.1907   LearningRate 0.0281   Epoch: 9   Global Step: 116640   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:48:57,606-Speed 3347.86 samples/sec   Loss 4.3008   LearningRate 0.0281   Epoch: 9   Global Step: 116650   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:49:00,676-Speed 3335.96 samples/sec   Loss 4.2495   LearningRate 0.0281   Epoch: 9   Global Step: 116660   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:49:03,750-Speed 3331.84 samples/sec   Loss 4.3331   LearningRate 0.0281   Epoch: 9   Global Step: 116670   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:49:06,947-Speed 3204.74 samples/sec   Loss 4.3364   LearningRate 0.0281   Epoch: 9   Global Step: 116680   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:49:09,997-Speed 3358.37 samples/sec   Loss 4.2788   LearningRate 0.0281   Epoch: 9   Global Step: 116690   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:49:13,085-Speed 3316.72 samples/sec   Loss 4.2549   LearningRate 0.0281   Epoch: 9   Global Step: 116700   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:49:16,245-Speed 3241.80 samples/sec   Loss 4.3056   LearningRate 0.0281   Epoch: 9   Global Step: 116710   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:49:19,321-Speed 3329.28 samples/sec   Loss 4.3280   LearningRate 0.0281   Epoch: 9   Global Step: 116720   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:49:22,375-Speed 3354.69 samples/sec   Loss 4.2868   LearningRate 0.0281   Epoch: 9   Global Step: 116730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:49:25,473-Speed 3306.23 samples/sec   Loss 4.3367   LearningRate 0.0281   Epoch: 9   Global Step: 116740   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:49:28,526-Speed 3355.02 samples/sec   Loss 4.2633   LearningRate 0.0281   Epoch: 9   Global Step: 116750   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:49:31,622-Speed 3308.78 samples/sec   Loss 4.3178   LearningRate 0.0281   Epoch: 9   Global Step: 116760   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:49:34,685-Speed 3343.68 samples/sec   Loss 4.3607   LearningRate 0.0281   Epoch: 9   Global Step: 116770   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:49:37,777-Speed 3313.75 samples/sec   Loss 4.2223   LearningRate 0.0281   Epoch: 9   Global Step: 116780   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:49:40,855-Speed 3327.55 samples/sec   Loss 4.1807   LearningRate 0.0281   Epoch: 9   Global Step: 116790   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:49:43,955-Speed 3304.72 samples/sec   Loss 4.2937   LearningRate 0.0281   Epoch: 9   Global Step: 116800   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:49:47,059-Speed 3299.87 samples/sec   Loss 4.2549   LearningRate 0.0281   Epoch: 9   Global Step: 116810   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:49:50,206-Speed 3255.09 samples/sec   Loss 4.2672   LearningRate 0.0281   Epoch: 9   Global Step: 116820   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:49:53,276-Speed 3336.50 samples/sec   Loss 4.2991   LearningRate 0.0281   Epoch: 9   Global Step: 116830   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:49:56,428-Speed 3250.16 samples/sec   Loss 4.3060   LearningRate 0.0281   Epoch: 9   Global Step: 116840   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:49:59,502-Speed 3331.52 samples/sec   Loss 4.2470   LearningRate 0.0281   Epoch: 9   Global Step: 116850   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:50:02,582-Speed 3326.17 samples/sec   Loss 4.3767   LearningRate 0.0280   Epoch: 9   Global Step: 116860   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:50:05,663-Speed 3324.38 samples/sec   Loss 4.4089   LearningRate 0.0280   Epoch: 9   Global Step: 116870   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:50:08,753-Speed 3314.92 samples/sec   Loss 4.2883   LearningRate 0.0280   Epoch: 9   Global Step: 116880   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:50:11,812-Speed 3348.89 samples/sec   Loss 4.1990   LearningRate 0.0280   Epoch: 9   Global Step: 116890   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:50:14,886-Speed 3331.32 samples/sec   Loss 4.2976   LearningRate 0.0280   Epoch: 9   Global Step: 116900   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:50:18,018-Speed 3270.82 samples/sec   Loss 4.2308   LearningRate 0.0280   Epoch: 9   Global Step: 116910   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:50:21,072-Speed 3354.48 samples/sec   Loss 4.3292   LearningRate 0.0280   Epoch: 9   Global Step: 116920   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:50:24,158-Speed 3318.34 samples/sec   Loss 4.1927   LearningRate 0.0280   Epoch: 9   Global Step: 116930   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:50:27,285-Speed 3276.63 samples/sec   Loss 4.4082   LearningRate 0.0280   Epoch: 9   Global Step: 116940   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:50:30,427-Speed 3259.90 samples/sec   Loss 4.2369   LearningRate 0.0280   Epoch: 9   Global Step: 116950   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:50:33,487-Speed 3347.16 samples/sec   Loss 4.2312   LearningRate 0.0280   Epoch: 9   Global Step: 116960   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:50:36,618-Speed 3271.61 samples/sec   Loss 4.2709   LearningRate 0.0280   Epoch: 9   Global Step: 116970   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:50:39,733-Speed 3288.54 samples/sec   Loss 4.2392   LearningRate 0.0280   Epoch: 9   Global Step: 116980   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:50:42,877-Speed 3257.75 samples/sec   Loss 4.2175   LearningRate 0.0280   Epoch: 9   Global Step: 116990   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:50:45,934-Speed 3350.85 samples/sec   Loss 4.2594   LearningRate 0.0280   Epoch: 9   Global Step: 117000   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:50:49,065-Speed 3272.11 samples/sec   Loss 4.2223   LearningRate 0.0280   Epoch: 9   Global Step: 117010   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:50:52,147-Speed 3323.37 samples/sec   Loss 4.2285   LearningRate 0.0280   Epoch: 9   Global Step: 117020   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:50:55,261-Speed 3288.97 samples/sec   Loss 4.3336   LearningRate 0.0280   Epoch: 9   Global Step: 117030   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:50:58,324-Speed 3344.41 samples/sec   Loss 4.2914   LearningRate 0.0280   Epoch: 9   Global Step: 117040   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:51:01,405-Speed 3324.29 samples/sec   Loss 4.2789   LearningRate 0.0280   Epoch: 9   Global Step: 117050   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:51:04,493-Speed 3317.14 samples/sec   Loss 4.2362   LearningRate 0.0280   Epoch: 9   Global Step: 117060   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:51:07,564-Speed 3335.85 samples/sec   Loss 4.3160   LearningRate 0.0280   Epoch: 9   Global Step: 117070   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:51:10,641-Speed 3328.46 samples/sec   Loss 4.2398   LearningRate 0.0280   Epoch: 9   Global Step: 117080   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:51:13,738-Speed 3308.27 samples/sec   Loss 4.3099   LearningRate 0.0279   Epoch: 9   Global Step: 117090   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:51:16,830-Speed 3312.45 samples/sec   Loss 4.2666   LearningRate 0.0279   Epoch: 9   Global Step: 117100   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:51:19,908-Speed 3327.56 samples/sec   Loss 4.2456   LearningRate 0.0279   Epoch: 9   Global Step: 117110   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:51:22,981-Speed 3333.52 samples/sec   Loss 4.3041   LearningRate 0.0279   Epoch: 9   Global Step: 117120   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:51:26,047-Speed 3341.27 samples/sec   Loss 4.3030   LearningRate 0.0279   Epoch: 9   Global Step: 117130   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:51:29,138-Speed 3313.52 samples/sec   Loss 4.4190   LearningRate 0.0279   Epoch: 9   Global Step: 117140   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:51:32,187-Speed 3359.90 samples/sec   Loss 4.2440   LearningRate 0.0279   Epoch: 9   Global Step: 117150   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:51:35,288-Speed 3303.37 samples/sec   Loss 4.3663   LearningRate 0.0279   Epoch: 9   Global Step: 117160   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:51:38,346-Speed 3349.04 samples/sec   Loss 4.3427   LearningRate 0.0279   Epoch: 9   Global Step: 117170   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:51:41,399-Speed 3356.27 samples/sec   Loss 4.3071   LearningRate 0.0279   Epoch: 9   Global Step: 117180   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:51:44,468-Speed 3336.66 samples/sec   Loss 4.3645   LearningRate 0.0279   Epoch: 9   Global Step: 117190   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:51:47,516-Speed 3361.87 samples/sec   Loss 4.2655   LearningRate 0.0279   Epoch: 9   Global Step: 117200   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:51:50,602-Speed 3319.11 samples/sec   Loss 4.3074   LearningRate 0.0279   Epoch: 9   Global Step: 117210   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:51:53,676-Speed 3332.18 samples/sec   Loss 4.2209   LearningRate 0.0279   Epoch: 9   Global Step: 117220   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:51:56,727-Speed 3357.47 samples/sec   Loss 4.4450   LearningRate 0.0279   Epoch: 9   Global Step: 117230   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:51:59,784-Speed 3350.90 samples/sec   Loss 4.2986   LearningRate 0.0279   Epoch: 9   Global Step: 117240   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:52:02,905-Speed 3281.27 samples/sec   Loss 4.2475   LearningRate 0.0279   Epoch: 9   Global Step: 117250   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:52:06,027-Speed 3281.01 samples/sec   Loss 4.3254   LearningRate 0.0279   Epoch: 9   Global Step: 117260   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:52:09,101-Speed 3332.80 samples/sec   Loss 4.2495   LearningRate 0.0279   Epoch: 9   Global Step: 117270   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:52:12,240-Speed 3263.43 samples/sec   Loss 4.2173   LearningRate 0.0279   Epoch: 9   Global Step: 117280   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:52:15,389-Speed 3252.33 samples/sec   Loss 4.3321   LearningRate 0.0279   Epoch: 9   Global Step: 117290   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:52:18,484-Speed 3309.41 samples/sec   Loss 4.3333   LearningRate 0.0279   Epoch: 9   Global Step: 117300   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:52:21,593-Speed 3294.52 samples/sec   Loss 4.2768   LearningRate 0.0279   Epoch: 9   Global Step: 117310   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:52:24,684-Speed 3313.98 samples/sec   Loss 4.2403   LearningRate 0.0279   Epoch: 9   Global Step: 117320   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:52:27,817-Speed 3269.41 samples/sec   Loss 4.3540   LearningRate 0.0278   Epoch: 9   Global Step: 117330   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:52:30,929-Speed 3292.23 samples/sec   Loss 4.3956   LearningRate 0.0278   Epoch: 9   Global Step: 117340   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:52:34,015-Speed 3318.80 samples/sec   Loss 4.3419   LearningRate 0.0278   Epoch: 9   Global Step: 117350   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:52:37,149-Speed 3268.76 samples/sec   Loss 4.3619   LearningRate 0.0278   Epoch: 9   Global Step: 117360   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:52:40,283-Speed 3268.54 samples/sec   Loss 4.4001   LearningRate 0.0278   Epoch: 9   Global Step: 117370   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:52:43,375-Speed 3312.32 samples/sec   Loss 4.2561   LearningRate 0.0278   Epoch: 9   Global Step: 117380   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:52:46,452-Speed 3328.67 samples/sec   Loss 4.2952   LearningRate 0.0278   Epoch: 9   Global Step: 117390   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:52:49,578-Speed 3277.04 samples/sec   Loss 4.2021   LearningRate 0.0278   Epoch: 9   Global Step: 117400   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:52:52,708-Speed 3272.22 samples/sec   Loss 4.2819   LearningRate 0.0278   Epoch: 9   Global Step: 117410   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:52:55,795-Speed 3318.97 samples/sec   Loss 4.2612   LearningRate 0.0278   Epoch: 9   Global Step: 117420   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:52:58,915-Speed 3282.70 samples/sec   Loss 4.2546   LearningRate 0.0278   Epoch: 9   Global Step: 117430   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:53:02,002-Speed 3317.90 samples/sec   Loss 4.3132   LearningRate 0.0278   Epoch: 9   Global Step: 117440   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:53:05,183-Speed 3220.06 samples/sec   Loss 4.3151   LearningRate 0.0278   Epoch: 9   Global Step: 117450   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:53:08,303-Speed 3283.17 samples/sec   Loss 4.3785   LearningRate 0.0278   Epoch: 9   Global Step: 117460   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:53:11,501-Speed 3203.69 samples/sec   Loss 4.3353   LearningRate 0.0278   Epoch: 9   Global Step: 117470   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:53:14,589-Speed 3316.72 samples/sec   Loss 4.3088   LearningRate 0.0278   Epoch: 9   Global Step: 117480   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:53:17,753-Speed 3237.67 samples/sec   Loss 4.3387   LearningRate 0.0278   Epoch: 9   Global Step: 117490   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:53:20,806-Speed 3355.36 samples/sec   Loss 4.3812   LearningRate 0.0278   Epoch: 9   Global Step: 117500   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:53:23,928-Speed 3281.08 samples/sec   Loss 4.3353   LearningRate 0.0278   Epoch: 9   Global Step: 117510   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:53:27,085-Speed 3244.12 samples/sec   Loss 4.3780   LearningRate 0.0278   Epoch: 9   Global Step: 117520   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:53:30,217-Speed 3270.39 samples/sec   Loss 4.3347   LearningRate 0.0278   Epoch: 9   Global Step: 117530   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:53:33,300-Speed 3322.83 samples/sec   Loss 4.3257   LearningRate 0.0278   Epoch: 9   Global Step: 117540   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:53:36,420-Speed 3283.42 samples/sec   Loss 4.2913   LearningRate 0.0278   Epoch: 9   Global Step: 117550   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:53:39,534-Speed 3288.88 samples/sec   Loss 4.3136   LearningRate 0.0277   Epoch: 9   Global Step: 117560   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:53:42,678-Speed 3258.52 samples/sec   Loss 4.2128   LearningRate 0.0277   Epoch: 9   Global Step: 117570   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:53:45,737-Speed 3348.86 samples/sec   Loss 4.3514   LearningRate 0.0277   Epoch: 9   Global Step: 117580   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:53:48,888-Speed 3250.03 samples/sec   Loss 4.3477   LearningRate 0.0277   Epoch: 9   Global Step: 117590   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:53:52,008-Speed 3283.47 samples/sec   Loss 4.2563   LearningRate 0.0277   Epoch: 9   Global Step: 117600   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:53:55,093-Speed 3320.13 samples/sec   Loss 4.3089   LearningRate 0.0277   Epoch: 9   Global Step: 117610   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:53:58,149-Speed 3351.44 samples/sec   Loss 4.4025   LearningRate 0.0277   Epoch: 9   Global Step: 117620   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:54:02,035-Speed 2635.61 samples/sec   Loss 4.2950   LearningRate 0.0277   Epoch: 9   Global Step: 117630   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:54:05,175-Speed 3262.18 samples/sec   Loss 4.3142   LearningRate 0.0277   Epoch: 9   Global Step: 117640   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:54:08,260-Speed 3320.99 samples/sec   Loss 4.3266   LearningRate 0.0277   Epoch: 9   Global Step: 117650   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:54:11,415-Speed 3246.29 samples/sec   Loss 4.2295   LearningRate 0.0277   Epoch: 9   Global Step: 117660   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:54:14,484-Speed 3337.84 samples/sec   Loss 4.3190   LearningRate 0.0277   Epoch: 9   Global Step: 117670   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:54:17,571-Speed 3317.44 samples/sec   Loss 4.3164   LearningRate 0.0277   Epoch: 9   Global Step: 117680   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:54:20,627-Speed 3352.56 samples/sec   Loss 4.3465   LearningRate 0.0277   Epoch: 9   Global Step: 117690   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:54:23,736-Speed 3294.94 samples/sec   Loss 4.4158   LearningRate 0.0277   Epoch: 9   Global Step: 117700   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:54:26,905-Speed 3231.87 samples/sec   Loss 4.3857   LearningRate 0.0277   Epoch: 9   Global Step: 117710   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:54:30,087-Speed 3218.51 samples/sec   Loss 4.3428   LearningRate 0.0277   Epoch: 9   Global Step: 117720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:54:33,173-Speed 3319.19 samples/sec   Loss 4.4410   LearningRate 0.0277   Epoch: 9   Global Step: 117730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:54:36,345-Speed 3229.27 samples/sec   Loss 4.2803   LearningRate 0.0277   Epoch: 9   Global Step: 117740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:54:39,510-Speed 3236.87 samples/sec   Loss 4.4129   LearningRate 0.0277   Epoch: 9   Global Step: 117750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:54:42,606-Speed 3308.22 samples/sec   Loss 4.2257   LearningRate 0.0277   Epoch: 9   Global Step: 117760   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:54:45,715-Speed 3294.60 samples/sec   Loss 4.3836   LearningRate 0.0277   Epoch: 9   Global Step: 117770   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:54:48,825-Speed 3293.68 samples/sec   Loss 4.3524   LearningRate 0.0277   Epoch: 9   Global Step: 117780   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:54:51,962-Speed 3266.00 samples/sec   Loss 4.2175   LearningRate 0.0277   Epoch: 9   Global Step: 117790   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:54:55,054-Speed 3312.35 samples/sec   Loss 4.2061   LearningRate 0.0276   Epoch: 9   Global Step: 117800   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:54:58,133-Speed 3327.24 samples/sec   Loss 4.2620   LearningRate 0.0276   Epoch: 9   Global Step: 117810   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:55:01,249-Speed 3287.32 samples/sec   Loss 4.3177   LearningRate 0.0276   Epoch: 9   Global Step: 117820   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:55:04,379-Speed 3272.52 samples/sec   Loss 4.3946   LearningRate 0.0276   Epoch: 9   Global Step: 117830   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:55:07,492-Speed 3290.69 samples/sec   Loss 4.3542   LearningRate 0.0276   Epoch: 9   Global Step: 117840   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:55:10,590-Speed 3305.77 samples/sec   Loss 4.3142   LearningRate 0.0276   Epoch: 9   Global Step: 117850   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:55:13,794-Speed 3197.87 samples/sec   Loss 4.4230   LearningRate 0.0276   Epoch: 9   Global Step: 117860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:55:16,983-Speed 3211.79 samples/sec   Loss 4.3687   LearningRate 0.0276   Epoch: 9   Global Step: 117870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:55:20,117-Speed 3267.99 samples/sec   Loss 4.3528   LearningRate 0.0276   Epoch: 9   Global Step: 117880   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:55:23,182-Speed 3342.08 samples/sec   Loss 4.3578   LearningRate 0.0276   Epoch: 9   Global Step: 117890   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:55:26,272-Speed 3315.18 samples/sec   Loss 4.3952   LearningRate 0.0276   Epoch: 9   Global Step: 117900   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:55:29,382-Speed 3293.43 samples/sec   Loss 4.3785   LearningRate 0.0276   Epoch: 9   Global Step: 117910   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:55:32,487-Speed 3298.98 samples/sec   Loss 4.3118   LearningRate 0.0276   Epoch: 9   Global Step: 117920   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:55:35,608-Speed 3282.28 samples/sec   Loss 4.2752   LearningRate 0.0276   Epoch: 9   Global Step: 117930   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:55:38,722-Speed 3288.92 samples/sec   Loss 4.4087   LearningRate 0.0276   Epoch: 9   Global Step: 117940   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:55:41,803-Speed 3325.06 samples/sec   Loss 4.1865   LearningRate 0.0276   Epoch: 9   Global Step: 117950   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:55:44,874-Speed 3335.76 samples/sec   Loss 4.3694   LearningRate 0.0276   Epoch: 9   Global Step: 117960   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:55:47,977-Speed 3300.82 samples/sec   Loss 4.3500   LearningRate 0.0276   Epoch: 9   Global Step: 117970   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:55:51,070-Speed 3310.65 samples/sec   Loss 4.4120   LearningRate 0.0276   Epoch: 9   Global Step: 117980   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:55:54,166-Speed 3309.05 samples/sec   Loss 4.3680   LearningRate 0.0276   Epoch: 9   Global Step: 117990   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:55:57,286-Speed 3283.31 samples/sec   Loss 4.3178   LearningRate 0.0276   Epoch: 9   Global Step: 118000   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:56:00,357-Speed 3335.45 samples/sec   Loss 4.3886   LearningRate 0.0276   Epoch: 9   Global Step: 118010   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:56:03,422-Speed 3341.78 samples/sec   Loss 4.4462   LearningRate 0.0276   Epoch: 9   Global Step: 118020   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:56:06,496-Speed 3332.00 samples/sec   Loss 4.2906   LearningRate 0.0275   Epoch: 9   Global Step: 118030   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:56:09,587-Speed 3313.67 samples/sec   Loss 4.3551   LearningRate 0.0275   Epoch: 9   Global Step: 118040   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:56:12,655-Speed 3339.22 samples/sec   Loss 4.3236   LearningRate 0.0275   Epoch: 9   Global Step: 118050   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:56:15,784-Speed 3274.05 samples/sec   Loss 4.3813   LearningRate 0.0275   Epoch: 9   Global Step: 118060   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:56:18,861-Speed 3328.51 samples/sec   Loss 4.3402   LearningRate 0.0275   Epoch: 9   Global Step: 118070   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:56:21,934-Speed 3333.34 samples/sec   Loss 4.3607   LearningRate 0.0275   Epoch: 9   Global Step: 118080   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:56:25,134-Speed 3201.38 samples/sec   Loss 4.3073   LearningRate 0.0275   Epoch: 9   Global Step: 118090   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:56:28,269-Speed 3267.29 samples/sec   Loss 4.3897   LearningRate 0.0275   Epoch: 9   Global Step: 118100   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:56:31,360-Speed 3313.84 samples/sec   Loss 4.3324   LearningRate 0.0275   Epoch: 9   Global Step: 118110   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:56:34,463-Speed 3300.83 samples/sec   Loss 4.4760   LearningRate 0.0275   Epoch: 9   Global Step: 118120   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:56:37,628-Speed 3236.88 samples/sec   Loss 4.3636   LearningRate 0.0275   Epoch: 9   Global Step: 118130   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:56:40,816-Speed 3213.04 samples/sec   Loss 4.3827   LearningRate 0.0275   Epoch: 9   Global Step: 118140   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:56:43,950-Speed 3268.18 samples/sec   Loss 4.3081   LearningRate 0.0275   Epoch: 9   Global Step: 118150   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:56:47,051-Speed 3303.28 samples/sec   Loss 4.3392   LearningRate 0.0275   Epoch: 9   Global Step: 118160   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:56:50,132-Speed 3324.42 samples/sec   Loss 4.3055   LearningRate 0.0275   Epoch: 9   Global Step: 118170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:56:53,262-Speed 3272.49 samples/sec   Loss 4.4035   LearningRate 0.0275   Epoch: 9   Global Step: 118180   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:56:56,397-Speed 3267.46 samples/sec   Loss 4.3023   LearningRate 0.0275   Epoch: 9   Global Step: 118190   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:56:59,480-Speed 3322.64 samples/sec   Loss 4.3155   LearningRate 0.0275   Epoch: 9   Global Step: 118200   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:57:02,614-Speed 3268.03 samples/sec   Loss 4.3714   LearningRate 0.0275   Epoch: 9   Global Step: 118210   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:57:05,742-Speed 3274.88 samples/sec   Loss 4.2090   LearningRate 0.0275   Epoch: 9   Global Step: 118220   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:57:08,846-Speed 3299.75 samples/sec   Loss 4.3182   LearningRate 0.0275   Epoch: 9   Global Step: 118230   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:57:12,002-Speed 3245.98 samples/sec   Loss 4.2732   LearningRate 0.0275   Epoch: 9   Global Step: 118240   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:57:15,197-Speed 3206.39 samples/sec   Loss 4.3936   LearningRate 0.0275   Epoch: 9   Global Step: 118250   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:57:18,320-Speed 3279.21 samples/sec   Loss 4.3658   LearningRate 0.0275   Epoch: 9   Global Step: 118260   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:57:21,412-Speed 3313.05 samples/sec   Loss 4.3436   LearningRate 0.0274   Epoch: 9   Global Step: 118270   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:57:24,546-Speed 3268.37 samples/sec   Loss 4.3035   LearningRate 0.0274   Epoch: 9   Global Step: 118280   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:57:27,667-Speed 3282.19 samples/sec   Loss 4.2775   LearningRate 0.0274   Epoch: 9   Global Step: 118290   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:57:30,763-Speed 3308.89 samples/sec   Loss 4.3150   LearningRate 0.0274   Epoch: 9   Global Step: 118300   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:57:33,859-Speed 3308.24 samples/sec   Loss 4.3764   LearningRate 0.0274   Epoch: 9   Global Step: 118310   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:57:37,015-Speed 3246.32 samples/sec   Loss 4.4921   LearningRate 0.0274   Epoch: 9   Global Step: 118320   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:57:40,128-Speed 3289.92 samples/sec   Loss 4.3251   LearningRate 0.0274   Epoch: 9   Global Step: 118330   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:57:43,222-Speed 3310.61 samples/sec   Loss 4.4612   LearningRate 0.0274   Epoch: 9   Global Step: 118340   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:57:46,309-Speed 3318.97 samples/sec   Loss 4.4010   LearningRate 0.0274   Epoch: 9   Global Step: 118350   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:57:49,361-Speed 3356.14 samples/sec   Loss 4.3398   LearningRate 0.0274   Epoch: 9   Global Step: 118360   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:57:52,465-Speed 3299.70 samples/sec   Loss 4.3229   LearningRate 0.0274   Epoch: 9   Global Step: 118370   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:57:55,575-Speed 3293.57 samples/sec   Loss 4.3785   LearningRate 0.0274   Epoch: 9   Global Step: 118380   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:57:58,647-Speed 3334.93 samples/sec   Loss 4.3455   LearningRate 0.0274   Epoch: 9   Global Step: 118390   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:58:01,756-Speed 3295.15 samples/sec   Loss 4.3831   LearningRate 0.0274   Epoch: 9   Global Step: 118400   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:58:04,823-Speed 3339.53 samples/sec   Loss 4.3389   LearningRate 0.0274   Epoch: 9   Global Step: 118410   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:58:07,909-Speed 3318.86 samples/sec   Loss 4.3477   LearningRate 0.0274   Epoch: 9   Global Step: 118420   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:58:11,017-Speed 3295.49 samples/sec   Loss 4.3831   LearningRate 0.0274   Epoch: 9   Global Step: 118430   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:58:14,160-Speed 3259.41 samples/sec   Loss 4.3572   LearningRate 0.0274   Epoch: 9   Global Step: 118440   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:58:17,280-Speed 3283.51 samples/sec   Loss 4.3067   LearningRate 0.0274   Epoch: 9   Global Step: 118450   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 11:58:20,399-Speed 3283.50 samples/sec   Loss 4.3317   LearningRate 0.0274   Epoch: 9   Global Step: 118460   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:58:23,470-Speed 3335.84 samples/sec   Loss 4.2873   LearningRate 0.0274   Epoch: 9   Global Step: 118470   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:58:26,540-Speed 3336.87 samples/sec   Loss 4.3639   LearningRate 0.0274   Epoch: 9   Global Step: 118480   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:58:29,608-Speed 3338.46 samples/sec   Loss 4.2961   LearningRate 0.0274   Epoch: 9   Global Step: 118490   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:58:32,685-Speed 3329.32 samples/sec   Loss 4.3146   LearningRate 0.0274   Epoch: 9   Global Step: 118500   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:58:35,756-Speed 3334.46 samples/sec   Loss 4.3439   LearningRate 0.0273   Epoch: 9   Global Step: 118510   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:58:38,852-Speed 3309.22 samples/sec   Loss 4.2938   LearningRate 0.0273   Epoch: 9   Global Step: 118520   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:58:41,931-Speed 3326.45 samples/sec   Loss 4.4545   LearningRate 0.0273   Epoch: 9   Global Step: 118530   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:58:44,997-Speed 3341.25 samples/sec   Loss 4.4694   LearningRate 0.0273   Epoch: 9   Global Step: 118540   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:58:48,084-Speed 3317.57 samples/sec   Loss 4.4085   LearningRate 0.0273   Epoch: 9   Global Step: 118550   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:58:51,173-Speed 3315.63 samples/sec   Loss 4.3592   LearningRate 0.0273   Epoch: 9   Global Step: 118560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:58:54,286-Speed 3291.13 samples/sec   Loss 4.3692   LearningRate 0.0273   Epoch: 9   Global Step: 118570   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:58:57,401-Speed 3287.99 samples/sec   Loss 4.3708   LearningRate 0.0273   Epoch: 9   Global Step: 118580   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:59:00,506-Speed 3298.35 samples/sec   Loss 4.3638   LearningRate 0.0273   Epoch: 9   Global Step: 118590   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:59:03,632-Speed 3277.43 samples/sec   Loss 4.3726   LearningRate 0.0273   Epoch: 9   Global Step: 118600   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:59:06,708-Speed 3329.32 samples/sec   Loss 4.3296   LearningRate 0.0273   Epoch: 9   Global Step: 118610   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:59:09,788-Speed 3326.25 samples/sec   Loss 4.2780   LearningRate 0.0273   Epoch: 9   Global Step: 118620   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:59:12,903-Speed 3288.23 samples/sec   Loss 4.3551   LearningRate 0.0273   Epoch: 9   Global Step: 118630   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:59:15,986-Speed 3322.56 samples/sec   Loss 4.4346   LearningRate 0.0273   Epoch: 9   Global Step: 118640   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:59:19,074-Speed 3317.06 samples/sec   Loss 4.3797   LearningRate 0.0273   Epoch: 9   Global Step: 118650   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:59:22,145-Speed 3334.77 samples/sec   Loss 4.3815   LearningRate 0.0273   Epoch: 9   Global Step: 118660   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:59:25,260-Speed 3288.54 samples/sec   Loss 4.3708   LearningRate 0.0273   Epoch: 9   Global Step: 118670   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 11:59:28,413-Speed 3249.20 samples/sec   Loss 4.3768   LearningRate 0.0273   Epoch: 9   Global Step: 118680   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:59:31,531-Speed 3285.28 samples/sec   Loss 4.3360   LearningRate 0.0273   Epoch: 9   Global Step: 118690   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:59:34,595-Speed 3343.18 samples/sec   Loss 4.3752   LearningRate 0.0273   Epoch: 9   Global Step: 118700   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:59:37,692-Speed 3307.72 samples/sec   Loss 4.3589   LearningRate 0.0273   Epoch: 9   Global Step: 118710   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:59:40,850-Speed 3243.07 samples/sec   Loss 4.4774   LearningRate 0.0273   Epoch: 9   Global Step: 118720   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:59:43,989-Speed 3263.17 samples/sec   Loss 4.3686   LearningRate 0.0273   Epoch: 9   Global Step: 118730   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:59:47,062-Speed 3333.72 samples/sec   Loss 4.3883   LearningRate 0.0273   Epoch: 9   Global Step: 118740   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 11:59:50,093-Speed 3378.83 samples/sec   Loss 4.3977   LearningRate 0.0272   Epoch: 9   Global Step: 118750   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 11:59:53,263-Speed 3231.88 samples/sec   Loss 4.4338   LearningRate 0.0272   Epoch: 9   Global Step: 118760   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 11:59:56,397-Speed 3268.45 samples/sec   Loss 4.3670   LearningRate 0.0272   Epoch: 9   Global Step: 118770   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 11:59:59,474-Speed 3328.34 samples/sec   Loss 4.3655   LearningRate 0.0272   Epoch: 9   Global Step: 118780   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 12:00:02,629-Speed 3246.32 samples/sec   Loss 4.4592   LearningRate 0.0272   Epoch: 9   Global Step: 118790   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 12:00:05,720-Speed 3314.44 samples/sec   Loss 4.3520   LearningRate 0.0272   Epoch: 9   Global Step: 118800   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 12:00:08,798-Speed 3327.54 samples/sec   Loss 4.4119   LearningRate 0.0272   Epoch: 9   Global Step: 118810   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 12:00:11,863-Speed 3342.51 samples/sec   Loss 4.3879   LearningRate 0.0272   Epoch: 9   Global Step: 118820   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 12:00:14,985-Speed 3280.84 samples/sec   Loss 4.3276   LearningRate 0.0272   Epoch: 9   Global Step: 118830   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 12:00:18,132-Speed 3254.37 samples/sec   Loss 4.4289   LearningRate 0.0272   Epoch: 9   Global Step: 118840   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 12:00:21,212-Speed 3326.54 samples/sec   Loss 4.3742   LearningRate 0.0272   Epoch: 9   Global Step: 118850   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:00:24,296-Speed 3320.68 samples/sec   Loss 4.3705   LearningRate 0.0272   Epoch: 9   Global Step: 118860   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:00:27,391-Speed 3309.35 samples/sec   Loss 4.3285   LearningRate 0.0272   Epoch: 9   Global Step: 118870   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:00:30,470-Speed 3327.29 samples/sec   Loss 4.3303   LearningRate 0.0272   Epoch: 9   Global Step: 118880   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:00:33,565-Speed 3310.18 samples/sec   Loss 4.3709   LearningRate 0.0272   Epoch: 9   Global Step: 118890   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:00:36,679-Speed 3288.62 samples/sec   Loss 4.3671   LearningRate 0.0272   Epoch: 9   Global Step: 118900   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:00:39,831-Speed 3249.79 samples/sec   Loss 4.3808   LearningRate 0.0272   Epoch: 9   Global Step: 118910   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:00:42,924-Speed 3311.83 samples/sec   Loss 4.3184   LearningRate 0.0272   Epoch: 9   Global Step: 118920   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:00:46,000-Speed 3330.34 samples/sec   Loss 4.3119   LearningRate 0.0272   Epoch: 9   Global Step: 118930   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:00:49,156-Speed 3245.66 samples/sec   Loss 4.3107   LearningRate 0.0272   Epoch: 9   Global Step: 118940   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:00:52,246-Speed 3314.25 samples/sec   Loss 4.3852   LearningRate 0.0272   Epoch: 9   Global Step: 118950   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:00:55,393-Speed 3255.22 samples/sec   Loss 4.3350   LearningRate 0.0272   Epoch: 9   Global Step: 118960   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:00:58,455-Speed 3345.32 samples/sec   Loss 4.3996   LearningRate 0.0272   Epoch: 9   Global Step: 118970   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:01:01,571-Speed 3287.21 samples/sec   Loss 4.3710   LearningRate 0.0271   Epoch: 9   Global Step: 118980   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:01:04,668-Speed 3307.93 samples/sec   Loss 4.2645   LearningRate 0.0271   Epoch: 9   Global Step: 118990   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:01:07,748-Speed 3325.65 samples/sec   Loss 4.3422   LearningRate 0.0271   Epoch: 9   Global Step: 119000   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:01:10,836-Speed 3316.89 samples/sec   Loss 4.3782   LearningRate 0.0271   Epoch: 9   Global Step: 119010   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:01:13,922-Speed 3318.54 samples/sec   Loss 4.3662   LearningRate 0.0271   Epoch: 9   Global Step: 119020   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:01:17,052-Speed 3273.29 samples/sec   Loss 4.3803   LearningRate 0.0271   Epoch: 9   Global Step: 119030   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:01:20,131-Speed 3326.68 samples/sec   Loss 4.3613   LearningRate 0.0271   Epoch: 9   Global Step: 119040   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:01:23,293-Speed 3239.93 samples/sec   Loss 4.3226   LearningRate 0.0271   Epoch: 9   Global Step: 119050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:01:26,373-Speed 3324.91 samples/sec   Loss 4.3689   LearningRate 0.0271   Epoch: 9   Global Step: 119060   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:01:29,462-Speed 3316.22 samples/sec   Loss 4.3040   LearningRate 0.0271   Epoch: 9   Global Step: 119070   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:01:32,575-Speed 3291.05 samples/sec   Loss 4.4025   LearningRate 0.0271   Epoch: 9   Global Step: 119080   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:01:35,661-Speed 3318.77 samples/sec   Loss 4.3796   LearningRate 0.0271   Epoch: 9   Global Step: 119090   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:01:38,768-Speed 3297.47 samples/sec   Loss 4.4553   LearningRate 0.0271   Epoch: 9   Global Step: 119100   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:01:41,858-Speed 3314.58 samples/sec   Loss 4.3716   LearningRate 0.0271   Epoch: 9   Global Step: 119110   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:01:44,959-Speed 3303.50 samples/sec   Loss 4.3840   LearningRate 0.0271   Epoch: 9   Global Step: 119120   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:01:48,083-Speed 3278.99 samples/sec   Loss 4.4319   LearningRate 0.0271   Epoch: 9   Global Step: 119130   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:01:51,215-Speed 3270.06 samples/sec   Loss 4.4317   LearningRate 0.0271   Epoch: 9   Global Step: 119140   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:01:54,305-Speed 3314.77 samples/sec   Loss 4.3341   LearningRate 0.0271   Epoch: 9   Global Step: 119150   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:01:57,362-Speed 3351.28 samples/sec   Loss 4.3114   LearningRate 0.0271   Epoch: 9   Global Step: 119160   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:02:00,494-Speed 3270.56 samples/sec   Loss 4.3709   LearningRate 0.0271   Epoch: 9   Global Step: 119170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:02:03,626-Speed 3270.52 samples/sec   Loss 4.3728   LearningRate 0.0271   Epoch: 9   Global Step: 119180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:02:06,719-Speed 3311.56 samples/sec   Loss 4.4833   LearningRate 0.0271   Epoch: 9   Global Step: 119190   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:02:09,808-Speed 3316.42 samples/sec   Loss 4.3127   LearningRate 0.0271   Epoch: 9   Global Step: 119200   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:02:12,882-Speed 3331.94 samples/sec   Loss 4.4318   LearningRate 0.0271   Epoch: 9   Global Step: 119210   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:02:15,964-Speed 3323.28 samples/sec   Loss 4.2861   LearningRate 0.0270   Epoch: 9   Global Step: 119220   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:02:19,059-Speed 3309.73 samples/sec   Loss 4.3693   LearningRate 0.0270   Epoch: 9   Global Step: 119230   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:02:22,126-Speed 3339.74 samples/sec   Loss 4.3258   LearningRate 0.0270   Epoch: 9   Global Step: 119240   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:02:25,295-Speed 3232.49 samples/sec   Loss 4.3038   LearningRate 0.0270   Epoch: 9   Global Step: 119250   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:02:28,467-Speed 3229.68 samples/sec   Loss 4.4249   LearningRate 0.0270   Epoch: 9   Global Step: 119260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:02:31,620-Speed 3248.25 samples/sec   Loss 4.4039   LearningRate 0.0270   Epoch: 9   Global Step: 119270   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:02:34,736-Speed 3287.64 samples/sec   Loss 4.3931   LearningRate 0.0270   Epoch: 9   Global Step: 119280   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:02:37,867-Speed 3271.20 samples/sec   Loss 4.3793   LearningRate 0.0270   Epoch: 9   Global Step: 119290   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:02:41,057-Speed 3211.82 samples/sec   Loss 4.4270   LearningRate 0.0270   Epoch: 9   Global Step: 119300   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:02:44,163-Speed 3297.55 samples/sec   Loss 4.4089   LearningRate 0.0270   Epoch: 9   Global Step: 119310   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:02:47,269-Speed 3297.22 samples/sec   Loss 4.3134   LearningRate 0.0270   Epoch: 9   Global Step: 119320   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:02:50,427-Speed 3244.10 samples/sec   Loss 4.3996   LearningRate 0.0270   Epoch: 9   Global Step: 119330   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:02:53,580-Speed 3248.85 samples/sec   Loss 4.3320   LearningRate 0.0270   Epoch: 9   Global Step: 119340   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:02:56,659-Speed 3327.38 samples/sec   Loss 4.3724   LearningRate 0.0270   Epoch: 9   Global Step: 119350   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:02:59,790-Speed 3271.15 samples/sec   Loss 4.4727   LearningRate 0.0270   Epoch: 9   Global Step: 119360   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:03:02,903-Speed 3291.01 samples/sec   Loss 4.4656   LearningRate 0.0270   Epoch: 9   Global Step: 119370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:03:05,957-Speed 3353.78 samples/sec   Loss 4.3979   LearningRate 0.0270   Epoch: 9   Global Step: 119380   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:03:09,014-Speed 3350.91 samples/sec   Loss 4.4075   LearningRate 0.0270   Epoch: 9   Global Step: 119390   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:03:12,088-Speed 3332.34 samples/sec   Loss 4.4000   LearningRate 0.0270   Epoch: 9   Global Step: 119400   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:03:15,207-Speed 3283.93 samples/sec   Loss 4.4756   LearningRate 0.0270   Epoch: 9   Global Step: 119410   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:03:18,338-Speed 3271.59 samples/sec   Loss 4.3323   LearningRate 0.0270   Epoch: 9   Global Step: 119420   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:03:21,413-Speed 3331.88 samples/sec   Loss 4.3249   LearningRate 0.0270   Epoch: 9   Global Step: 119430   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:03:24,560-Speed 3254.81 samples/sec   Loss 4.2967   LearningRate 0.0270   Epoch: 9   Global Step: 119440   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:03:27,658-Speed 3306.10 samples/sec   Loss 4.3824   LearningRate 0.0270   Epoch: 9   Global Step: 119450   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:03:30,757-Speed 3305.56 samples/sec   Loss 4.3259   LearningRate 0.0269   Epoch: 9   Global Step: 119460   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:03:33,887-Speed 3273.06 samples/sec   Loss 4.3973   LearningRate 0.0269   Epoch: 9   Global Step: 119470   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:03:36,998-Speed 3292.76 samples/sec   Loss 4.4657   LearningRate 0.0269   Epoch: 9   Global Step: 119480   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:03:40,081-Speed 3322.60 samples/sec   Loss 4.4176   LearningRate 0.0269   Epoch: 9   Global Step: 119490   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:03:43,157-Speed 3329.97 samples/sec   Loss 4.3787   LearningRate 0.0269   Epoch: 9   Global Step: 119500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:03:46,252-Speed 3309.25 samples/sec   Loss 4.2959   LearningRate 0.0269   Epoch: 9   Global Step: 119510   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:03:49,431-Speed 3222.53 samples/sec   Loss 4.3571   LearningRate 0.0269   Epoch: 9   Global Step: 119520   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:03:52,598-Speed 3234.17 samples/sec   Loss 4.2995   LearningRate 0.0269   Epoch: 9   Global Step: 119530   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:03:55,754-Speed 3245.50 samples/sec   Loss 4.3501   LearningRate 0.0269   Epoch: 9   Global Step: 119540   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:03:58,806-Speed 3355.98 samples/sec   Loss 4.4159   LearningRate 0.0269   Epoch: 9   Global Step: 119550   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:04:01,960-Speed 3247.94 samples/sec   Loss 4.4370   LearningRate 0.0269   Epoch: 9   Global Step: 119560   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:04:05,104-Speed 3258.32 samples/sec   Loss 4.3307   LearningRate 0.0269   Epoch: 9   Global Step: 119570   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:04:08,257-Speed 3248.71 samples/sec   Loss 4.4418   LearningRate 0.0269   Epoch: 9   Global Step: 119580   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:04:11,370-Speed 3290.23 samples/sec   Loss 4.4782   LearningRate 0.0269   Epoch: 9   Global Step: 119590   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:04:14,427-Speed 3351.00 samples/sec   Loss 4.3796   LearningRate 0.0269   Epoch: 9   Global Step: 119600   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:04:17,500-Speed 3333.53 samples/sec   Loss 4.3742   LearningRate 0.0269   Epoch: 9   Global Step: 119610   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:04:20,621-Speed 3282.04 samples/sec   Loss 4.3971   LearningRate 0.0269   Epoch: 9   Global Step: 119620   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:04:23,725-Speed 3298.94 samples/sec   Loss 4.3921   LearningRate 0.0269   Epoch: 9   Global Step: 119630   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:04:26,815-Speed 3315.75 samples/sec   Loss 4.3269   LearningRate 0.0269   Epoch: 9   Global Step: 119640   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:04:29,875-Speed 3346.78 samples/sec   Loss 4.5368   LearningRate 0.0269   Epoch: 9   Global Step: 119650   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:04:32,930-Speed 3353.88 samples/sec   Loss 4.3851   LearningRate 0.0269   Epoch: 9   Global Step: 119660   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:04:35,995-Speed 3341.51 samples/sec   Loss 4.4353   LearningRate 0.0269   Epoch: 9   Global Step: 119670   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:04:39,066-Speed 3336.10 samples/sec   Loss 4.5071   LearningRate 0.0269   Epoch: 9   Global Step: 119680   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:04:42,227-Speed 3239.48 samples/sec   Loss 4.3888   LearningRate 0.0269   Epoch: 9   Global Step: 119690   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:04:45,306-Speed 3327.75 samples/sec   Loss 4.4363   LearningRate 0.0268   Epoch: 9   Global Step: 119700   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:04:48,385-Speed 3326.50 samples/sec   Loss 4.4225   LearningRate 0.0268   Epoch: 9   Global Step: 119710   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:04:51,437-Speed 3356.32 samples/sec   Loss 4.2982   LearningRate 0.0268   Epoch: 9   Global Step: 119720   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:04:54,524-Speed 3318.36 samples/sec   Loss 4.4509   LearningRate 0.0268   Epoch: 9   Global Step: 119730   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:04:57,573-Speed 3360.16 samples/sec   Loss 4.5080   LearningRate 0.0268   Epoch: 9   Global Step: 119740   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:05:00,652-Speed 3326.15 samples/sec   Loss 4.3704   LearningRate 0.0268   Epoch: 9   Global Step: 119750   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:05:03,802-Speed 3252.82 samples/sec   Loss 4.4492   LearningRate 0.0268   Epoch: 9   Global Step: 119760   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:05:06,957-Speed 3246.62 samples/sec   Loss 4.4769   LearningRate 0.0268   Epoch: 9   Global Step: 119770   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:05:10,088-Speed 3271.47 samples/sec   Loss 4.3597   LearningRate 0.0268   Epoch: 9   Global Step: 119780   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:05:13,242-Speed 3248.12 samples/sec   Loss 4.4306   LearningRate 0.0268   Epoch: 9   Global Step: 119790   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:05:16,336-Speed 3310.96 samples/sec   Loss 4.3990   LearningRate 0.0268   Epoch: 9   Global Step: 119800   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:05:19,397-Speed 3345.96 samples/sec   Loss 4.5161   LearningRate 0.0268   Epoch: 9   Global Step: 119810   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:05:22,508-Speed 3292.87 samples/sec   Loss 4.3783   LearningRate 0.0268   Epoch: 9   Global Step: 119820   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:05:25,720-Speed 3188.71 samples/sec   Loss 4.3304   LearningRate 0.0268   Epoch: 9   Global Step: 119830   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:05:28,800-Speed 3325.49 samples/sec   Loss 4.4739   LearningRate 0.0268   Epoch: 9   Global Step: 119840   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:05:31,906-Speed 3298.45 samples/sec   Loss 4.3194   LearningRate 0.0268   Epoch: 9   Global Step: 119850   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:05:34,991-Speed 3320.77 samples/sec   Loss 4.3875   LearningRate 0.0268   Epoch: 9   Global Step: 119860   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:05:38,048-Speed 3349.92 samples/sec   Loss 4.4259   LearningRate 0.0268   Epoch: 9   Global Step: 119870   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:05:41,104-Speed 3352.01 samples/sec   Loss 4.4204   LearningRate 0.0268   Epoch: 9   Global Step: 119880   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:05:44,165-Speed 3346.21 samples/sec   Loss 4.3753   LearningRate 0.0268   Epoch: 9   Global Step: 119890   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:05:47,233-Speed 3339.56 samples/sec   Loss 4.3215   LearningRate 0.0268   Epoch: 9   Global Step: 119900   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:05:50,373-Speed 3261.86 samples/sec   Loss 4.3723   LearningRate 0.0268   Epoch: 9   Global Step: 119910   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:05:53,532-Speed 3242.84 samples/sec   Loss 4.3701   LearningRate 0.0268   Epoch: 9   Global Step: 119920   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:05:56,649-Speed 3285.93 samples/sec   Loss 4.2759   LearningRate 0.0268   Epoch: 9   Global Step: 119930   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:05:59,811-Speed 3239.57 samples/sec   Loss 4.3373   LearningRate 0.0267   Epoch: 9   Global Step: 119940   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:06:02,917-Speed 3298.34 samples/sec   Loss 4.4442   LearningRate 0.0267   Epoch: 9   Global Step: 119950   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:06:06,056-Speed 3262.53 samples/sec   Loss 4.4057   LearningRate 0.0267   Epoch: 9   Global Step: 119960   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:06:09,155-Speed 3304.83 samples/sec   Loss 4.3688   LearningRate 0.0267   Epoch: 9   Global Step: 119970   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:06:12,302-Speed 3255.52 samples/sec   Loss 4.4042   LearningRate 0.0267   Epoch: 9   Global Step: 119980   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:06:15,441-Speed 3263.24 samples/sec   Loss 4.4053   LearningRate 0.0267   Epoch: 9   Global Step: 119990   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:06:18,606-Speed 3236.86 samples/sec   Loss 4.3431   LearningRate 0.0267   Epoch: 9   Global Step: 120000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:06:21,713-Speed 3296.10 samples/sec   Loss 4.3553   LearningRate 0.0267   Epoch: 9   Global Step: 120010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:06:24,762-Speed 3359.91 samples/sec   Loss 4.4078   LearningRate 0.0267   Epoch: 9   Global Step: 120020   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:06:27,829-Speed 3339.87 samples/sec   Loss 4.3810   LearningRate 0.0267   Epoch: 9   Global Step: 120030   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:06:30,948-Speed 3284.54 samples/sec   Loss 4.3073   LearningRate 0.0267   Epoch: 9   Global Step: 120040   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:06:34,033-Speed 3320.23 samples/sec   Loss 4.4848   LearningRate 0.0267   Epoch: 9   Global Step: 120050   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:06:37,126-Speed 3311.64 samples/sec   Loss 4.3942   LearningRate 0.0267   Epoch: 9   Global Step: 120060   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:06:40,277-Speed 3250.32 samples/sec   Loss 4.3547   LearningRate 0.0267   Epoch: 9   Global Step: 120070   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:06:43,435-Speed 3243.87 samples/sec   Loss 4.3332   LearningRate 0.0267   Epoch: 9   Global Step: 120080   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:06:46,509-Speed 3332.69 samples/sec   Loss 4.3154   LearningRate 0.0267   Epoch: 9   Global Step: 120090   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:06:49,574-Speed 3341.47 samples/sec   Loss 4.4154   LearningRate 0.0267   Epoch: 9   Global Step: 120100   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:06:52,683-Speed 3295.39 samples/sec   Loss 4.4589   LearningRate 0.0267   Epoch: 9   Global Step: 120110   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:06:55,818-Speed 3267.31 samples/sec   Loss 4.4304   LearningRate 0.0267   Epoch: 9   Global Step: 120120   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:06:58,905-Speed 3318.18 samples/sec   Loss 4.3609   LearningRate 0.0267   Epoch: 9   Global Step: 120130   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:07:01,987-Speed 3324.01 samples/sec   Loss 4.4789   LearningRate 0.0267   Epoch: 9   Global Step: 120140   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:07:05,060-Speed 3332.56 samples/sec   Loss 4.3942   LearningRate 0.0267   Epoch: 9   Global Step: 120150   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:07:08,123-Speed 3344.42 samples/sec   Loss 4.4992   LearningRate 0.0267   Epoch: 9   Global Step: 120160   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:07:11,198-Speed 3331.14 samples/sec   Loss 4.4158   LearningRate 0.0267   Epoch: 9   Global Step: 120170   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:07:14,346-Speed 3254.36 samples/sec   Loss 4.3473   LearningRate 0.0266   Epoch: 9   Global Step: 120180   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:07:17,471-Speed 3277.59 samples/sec   Loss 4.3496   LearningRate 0.0266   Epoch: 9   Global Step: 120190   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:07:20,606-Speed 3267.35 samples/sec   Loss 4.3794   LearningRate 0.0266   Epoch: 9   Global Step: 120200   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:07:23,721-Speed 3288.35 samples/sec   Loss 4.3962   LearningRate 0.0266   Epoch: 9   Global Step: 120210   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:07:26,887-Speed 3235.41 samples/sec   Loss 4.3294   LearningRate 0.0266   Epoch: 9   Global Step: 120220   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:07:30,022-Speed 3267.12 samples/sec   Loss 4.4649   LearningRate 0.0266   Epoch: 9   Global Step: 120230   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:07:33,138-Speed 3287.72 samples/sec   Loss 4.5289   LearningRate 0.0266   Epoch: 9   Global Step: 120240   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:07:36,258-Speed 3283.31 samples/sec   Loss 4.3415   LearningRate 0.0266   Epoch: 9   Global Step: 120250   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:07:39,353-Speed 3308.94 samples/sec   Loss 4.3547   LearningRate 0.0266   Epoch: 9   Global Step: 120260   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:07:42,510-Speed 3244.35 samples/sec   Loss 4.3852   LearningRate 0.0266   Epoch: 9   Global Step: 120270   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:07:45,582-Speed 3334.87 samples/sec   Loss 4.4653   LearningRate 0.0266   Epoch: 9   Global Step: 120280   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:07:48,711-Speed 3274.41 samples/sec   Loss 4.3391   LearningRate 0.0266   Epoch: 9   Global Step: 120290   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:07:51,797-Speed 3319.21 samples/sec   Loss 4.4226   LearningRate 0.0266   Epoch: 9   Global Step: 120300   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:07:54,874-Speed 3329.11 samples/sec   Loss 4.4107   LearningRate 0.0266   Epoch: 9   Global Step: 120310   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:07:57,959-Speed 3319.98 samples/sec   Loss 4.3316   LearningRate 0.0266   Epoch: 9   Global Step: 120320   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:08:01,071-Speed 3290.81 samples/sec   Loss 4.3836   LearningRate 0.0266   Epoch: 9   Global Step: 120330   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:08:04,243-Speed 3229.68 samples/sec   Loss 4.3469   LearningRate 0.0266   Epoch: 9   Global Step: 120340   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:08:07,307-Speed 3343.51 samples/sec   Loss 4.3121   LearningRate 0.0266   Epoch: 9   Global Step: 120350   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:08:10,366-Speed 3347.85 samples/sec   Loss 4.4397   LearningRate 0.0266   Epoch: 9   Global Step: 120360   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:08:13,536-Speed 3231.98 samples/sec   Loss 4.3774   LearningRate 0.0266   Epoch: 9   Global Step: 120370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:08:16,685-Speed 3252.49 samples/sec   Loss 4.3274   LearningRate 0.0266   Epoch: 9   Global Step: 120380   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:08:19,777-Speed 3313.60 samples/sec   Loss 4.3997   LearningRate 0.0266   Epoch: 9   Global Step: 120390   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:08:22,864-Speed 3318.35 samples/sec   Loss 4.3873   LearningRate 0.0266   Epoch: 9   Global Step: 120400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:08:25,970-Speed 3297.19 samples/sec   Loss 4.3800   LearningRate 0.0266   Epoch: 9   Global Step: 120410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:08:29,025-Speed 3353.35 samples/sec   Loss 4.4255   LearningRate 0.0265   Epoch: 9   Global Step: 120420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:08:32,064-Speed 3371.01 samples/sec   Loss 4.4254   LearningRate 0.0265   Epoch: 9   Global Step: 120430   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:08:35,129-Speed 3342.08 samples/sec   Loss 4.4889   LearningRate 0.0265   Epoch: 9   Global Step: 120440   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:08:38,246-Speed 3286.28 samples/sec   Loss 4.3893   LearningRate 0.0265   Epoch: 9   Global Step: 120450   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:08:41,408-Speed 3239.04 samples/sec   Loss 4.3974   LearningRate 0.0265   Epoch: 9   Global Step: 120460   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:08:44,498-Speed 3315.38 samples/sec   Loss 4.4131   LearningRate 0.0265   Epoch: 9   Global Step: 120470   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:08:47,579-Speed 3324.32 samples/sec   Loss 4.4246   LearningRate 0.0265   Epoch: 9   Global Step: 120480   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:08:50,631-Speed 3356.68 samples/sec   Loss 4.4260   LearningRate 0.0265   Epoch: 9   Global Step: 120490   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:08:53,758-Speed 3275.39 samples/sec   Loss 4.4513   LearningRate 0.0265   Epoch: 9   Global Step: 120500   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:08:56,813-Speed 3352.33 samples/sec   Loss 4.4631   LearningRate 0.0265   Epoch: 9   Global Step: 120510   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:08:59,871-Speed 3350.63 samples/sec   Loss 4.4541   LearningRate 0.0265   Epoch: 9   Global Step: 120520   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:09:02,938-Speed 3339.87 samples/sec   Loss 4.5345   LearningRate 0.0265   Epoch: 9   Global Step: 120530   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:09:06,068-Speed 3272.88 samples/sec   Loss 4.3634   LearningRate 0.0265   Epoch: 9   Global Step: 120540   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:09:09,132-Speed 3342.99 samples/sec   Loss 4.3859   LearningRate 0.0265   Epoch: 9   Global Step: 120550   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:09:12,242-Speed 3293.52 samples/sec   Loss 4.4061   LearningRate 0.0265   Epoch: 9   Global Step: 120560   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:09:15,379-Speed 3266.10 samples/sec   Loss 4.2752   LearningRate 0.0265   Epoch: 9   Global Step: 120570   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:09:18,574-Speed 3205.88 samples/sec   Loss 4.4117   LearningRate 0.0265   Epoch: 9   Global Step: 120580   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:09:21,634-Speed 3346.86 samples/sec   Loss 4.3452   LearningRate 0.0265   Epoch: 9   Global Step: 120590   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:09:24,732-Speed 3306.18 samples/sec   Loss 4.3453   LearningRate 0.0265   Epoch: 9   Global Step: 120600   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:09:27,825-Speed 3312.14 samples/sec   Loss 4.4563   LearningRate 0.0265   Epoch: 9   Global Step: 120610   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:09:30,982-Speed 3244.35 samples/sec   Loss 4.3646   LearningRate 0.0265   Epoch: 9   Global Step: 120620   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:09:34,060-Speed 3327.57 samples/sec   Loss 4.3579   LearningRate 0.0265   Epoch: 9   Global Step: 120630   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:09:37,120-Speed 3348.11 samples/sec   Loss 4.5252   LearningRate 0.0265   Epoch: 9   Global Step: 120640   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:09:40,197-Speed 3328.34 samples/sec   Loss 4.3865   LearningRate 0.0265   Epoch: 9   Global Step: 120650   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:09:43,321-Speed 3279.24 samples/sec   Loss 4.3776   LearningRate 0.0264   Epoch: 9   Global Step: 120660   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:09:46,410-Speed 3316.59 samples/sec   Loss 4.3809   LearningRate 0.0264   Epoch: 9   Global Step: 120670   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:09:49,522-Speed 3291.42 samples/sec   Loss 4.3053   LearningRate 0.0264   Epoch: 9   Global Step: 120680   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:09:52,706-Speed 3217.34 samples/sec   Loss 4.4005   LearningRate 0.0264   Epoch: 9   Global Step: 120690   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:09:55,821-Speed 3288.33 samples/sec   Loss 4.4633   LearningRate 0.0264   Epoch: 9   Global Step: 120700   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:09:58,921-Speed 3303.92 samples/sec   Loss 4.3545   LearningRate 0.0264   Epoch: 9   Global Step: 120710   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:10:02,008-Speed 3318.87 samples/sec   Loss 4.4791   LearningRate 0.0264   Epoch: 9   Global Step: 120720   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:10:05,097-Speed 3315.36 samples/sec   Loss 4.4331   LearningRate 0.0264   Epoch: 9   Global Step: 120730   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:10:08,236-Speed 3262.98 samples/sec   Loss 4.4210   LearningRate 0.0264   Epoch: 9   Global Step: 120740   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:10:11,310-Speed 3332.11 samples/sec   Loss 4.4257   LearningRate 0.0264   Epoch: 9   Global Step: 120750   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:10:14,375-Speed 3342.51 samples/sec   Loss 4.3699   LearningRate 0.0264   Epoch: 9   Global Step: 120760   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:10:17,437-Speed 3345.21 samples/sec   Loss 4.4056   LearningRate 0.0264   Epoch: 9   Global Step: 120770   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:10:20,520-Speed 3323.13 samples/sec   Loss 4.3791   LearningRate 0.0264   Epoch: 9   Global Step: 120780   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:10:23,637-Speed 3285.39 samples/sec   Loss 4.4144   LearningRate 0.0264   Epoch: 9   Global Step: 120790   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:10:26,756-Speed 3284.54 samples/sec   Loss 4.3760   LearningRate 0.0264   Epoch: 9   Global Step: 120800   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:10:29,834-Speed 3327.62 samples/sec   Loss 4.3746   LearningRate 0.0264   Epoch: 9   Global Step: 120810   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:10:32,910-Speed 3330.39 samples/sec   Loss 4.4294   LearningRate 0.0264   Epoch: 9   Global Step: 120820   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:10:35,975-Speed 3342.97 samples/sec   Loss 4.4598   LearningRate 0.0264   Epoch: 9   Global Step: 120830   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:10:39,048-Speed 3332.37 samples/sec   Loss 4.4450   LearningRate 0.0264   Epoch: 9   Global Step: 120840   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:10:42,098-Speed 3359.20 samples/sec   Loss 4.4389   LearningRate 0.0264   Epoch: 9   Global Step: 120850   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:10:45,170-Speed 3333.62 samples/sec   Loss 4.3705   LearningRate 0.0264   Epoch: 9   Global Step: 120860   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:10:48,248-Speed 3327.94 samples/sec   Loss 4.3620   LearningRate 0.0264   Epoch: 9   Global Step: 120870   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:10:51,314-Speed 3341.50 samples/sec   Loss 4.4798   LearningRate 0.0264   Epoch: 9   Global Step: 120880   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:10:54,368-Speed 3354.14 samples/sec   Loss 4.4204   LearningRate 0.0264   Epoch: 9   Global Step: 120890   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:10:57,452-Speed 3321.49 samples/sec   Loss 4.4332   LearningRate 0.0264   Epoch: 9   Global Step: 120900   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:11:00,505-Speed 3354.94 samples/sec   Loss 4.2844   LearningRate 0.0263   Epoch: 9   Global Step: 120910   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:11:03,564-Speed 3348.93 samples/sec   Loss 4.4118   LearningRate 0.0263   Epoch: 9   Global Step: 120920   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:11:06,673-Speed 3296.32 samples/sec   Loss 4.4750   LearningRate 0.0263   Epoch: 9   Global Step: 120930   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:11:09,732-Speed 3348.30 samples/sec   Loss 4.4227   LearningRate 0.0263   Epoch: 9   Global Step: 120940   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:11:12,824-Speed 3312.95 samples/sec   Loss 4.4192   LearningRate 0.0263   Epoch: 9   Global Step: 120950   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:11:15,917-Speed 3311.37 samples/sec   Loss 4.4060   LearningRate 0.0263   Epoch: 9   Global Step: 120960   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:11:19,052-Speed 3267.44 samples/sec   Loss 4.3740   LearningRate 0.0263   Epoch: 9   Global Step: 120970   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:11:22,106-Speed 3354.60 samples/sec   Loss 4.3845   LearningRate 0.0263   Epoch: 9   Global Step: 120980   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:11:25,282-Speed 3224.82 samples/sec   Loss 4.4285   LearningRate 0.0263   Epoch: 9   Global Step: 120990   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:11:28,478-Speed 3205.06 samples/sec   Loss 4.4246   LearningRate 0.0263   Epoch: 9   Global Step: 121000   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:11:31,558-Speed 3325.68 samples/sec   Loss 4.3244   LearningRate 0.0263   Epoch: 9   Global Step: 121010   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:11:34,651-Speed 3312.47 samples/sec   Loss 4.3902   LearningRate 0.0263   Epoch: 9   Global Step: 121020   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:11:37,809-Speed 3243.71 samples/sec   Loss 4.3678   LearningRate 0.0263   Epoch: 9   Global Step: 121030   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:11:40,931-Speed 3280.41 samples/sec   Loss 4.4785   LearningRate 0.0263   Epoch: 9   Global Step: 121040   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:11:44,089-Speed 3244.38 samples/sec   Loss 4.3700   LearningRate 0.0263   Epoch: 9   Global Step: 121050   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:11:47,198-Speed 3294.07 samples/sec   Loss 4.3708   LearningRate 0.0263   Epoch: 9   Global Step: 121060   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:11:50,326-Speed 3275.10 samples/sec   Loss 4.4480   LearningRate 0.0263   Epoch: 9   Global Step: 121070   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:11:53,457-Speed 3270.92 samples/sec   Loss 4.5173   LearningRate 0.0263   Epoch: 9   Global Step: 121080   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:11:56,519-Speed 3346.05 samples/sec   Loss 4.4340   LearningRate 0.0263   Epoch: 9   Global Step: 121090   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:11:59,662-Speed 3258.36 samples/sec   Loss 4.4525   LearningRate 0.0263   Epoch: 9   Global Step: 121100   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:12:02,793-Speed 3271.20 samples/sec   Loss 4.4788   LearningRate 0.0263   Epoch: 9   Global Step: 121110   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:12:05,923-Speed 3272.78 samples/sec   Loss 4.4540   LearningRate 0.0263   Epoch: 9   Global Step: 121120   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:12:09,011-Speed 3317.26 samples/sec   Loss 4.3507   LearningRate 0.0263   Epoch: 9   Global Step: 121130   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:12:12,131-Speed 3283.01 samples/sec   Loss 4.3721   LearningRate 0.0263   Epoch: 9   Global Step: 121140   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:12:15,222-Speed 3314.31 samples/sec   Loss 4.4170   LearningRate 0.0262   Epoch: 9   Global Step: 121150   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:12:18,388-Speed 3235.27 samples/sec   Loss 4.3611   LearningRate 0.0262   Epoch: 9   Global Step: 121160   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:12:21,504-Speed 3288.00 samples/sec   Loss 4.3967   LearningRate 0.0262   Epoch: 9   Global Step: 121170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:12:24,579-Speed 3331.13 samples/sec   Loss 4.3796   LearningRate 0.0262   Epoch: 9   Global Step: 121180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:12:27,678-Speed 3304.98 samples/sec   Loss 4.4102   LearningRate 0.0262   Epoch: 9   Global Step: 121190   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:12:30,781-Speed 3300.81 samples/sec   Loss 4.4500   LearningRate 0.0262   Epoch: 9   Global Step: 121200   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:12:33,855-Speed 3332.39 samples/sec   Loss 4.3846   LearningRate 0.0262   Epoch: 9   Global Step: 121210   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:12:36,952-Speed 3307.53 samples/sec   Loss 4.4471   LearningRate 0.0262   Epoch: 9   Global Step: 121220   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:12:40,026-Speed 3332.04 samples/sec   Loss 4.3082   LearningRate 0.0262   Epoch: 9   Global Step: 121230   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:12:43,132-Speed 3298.24 samples/sec   Loss 4.3674   LearningRate 0.0262   Epoch: 9   Global Step: 121240   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:12:46,225-Speed 3312.06 samples/sec   Loss 4.5021   LearningRate 0.0262   Epoch: 9   Global Step: 121250   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:12:49,311-Speed 3319.59 samples/sec   Loss 4.5222   LearningRate 0.0262   Epoch: 9   Global Step: 121260   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:12:52,442-Speed 3270.65 samples/sec   Loss 4.4077   LearningRate 0.0262   Epoch: 9   Global Step: 121270   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:12:55,563-Speed 3283.06 samples/sec   Loss 4.4270   LearningRate 0.0262   Epoch: 9   Global Step: 121280   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:12:58,651-Speed 3316.88 samples/sec   Loss 4.3825   LearningRate 0.0262   Epoch: 9   Global Step: 121290   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:13:01,772-Speed 3281.95 samples/sec   Loss 4.4568   LearningRate 0.0262   Epoch: 9   Global Step: 121300   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:13:04,890-Speed 3285.01 samples/sec   Loss 4.4549   LearningRate 0.0262   Epoch: 9   Global Step: 121310   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:13:07,994-Speed 3300.51 samples/sec   Loss 4.4299   LearningRate 0.0262   Epoch: 9   Global Step: 121320   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:13:11,051-Speed 3350.85 samples/sec   Loss 4.4021   LearningRate 0.0262   Epoch: 9   Global Step: 121330   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:13:14,131-Speed 3325.42 samples/sec   Loss 4.5201   LearningRate 0.0262   Epoch: 9   Global Step: 121340   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:13:17,240-Speed 3294.30 samples/sec   Loss 4.3217   LearningRate 0.0262   Epoch: 9   Global Step: 121350   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:13:20,302-Speed 3344.98 samples/sec   Loss 4.4819   LearningRate 0.0262   Epoch: 9   Global Step: 121360   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:13:23,375-Speed 3333.92 samples/sec   Loss 4.4690   LearningRate 0.0262   Epoch: 9   Global Step: 121370   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:13:26,447-Speed 3334.75 samples/sec   Loss 4.4139   LearningRate 0.0262   Epoch: 9   Global Step: 121380   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:13:29,591-Speed 3257.27 samples/sec   Loss 4.4319   LearningRate 0.0261   Epoch: 9   Global Step: 121390   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:13:32,680-Speed 3317.15 samples/sec   Loss 4.4774   LearningRate 0.0261   Epoch: 9   Global Step: 121400   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:13:35,732-Speed 3355.32 samples/sec   Loss 4.4164   LearningRate 0.0261   Epoch: 9   Global Step: 121410   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:13:38,825-Speed 3312.16 samples/sec   Loss 4.4438   LearningRate 0.0261   Epoch: 9   Global Step: 121420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:13:41,946-Speed 3281.86 samples/sec   Loss 4.4327   LearningRate 0.0261   Epoch: 9   Global Step: 121430   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:13:45,016-Speed 3337.37 samples/sec   Loss 4.3535   LearningRate 0.0261   Epoch: 9   Global Step: 121440   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:13:48,138-Speed 3280.29 samples/sec   Loss 4.4567   LearningRate 0.0261   Epoch: 9   Global Step: 121450   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:13:51,213-Speed 3331.42 samples/sec   Loss 4.4192   LearningRate 0.0261   Epoch: 9   Global Step: 121460   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:13:54,343-Speed 3272.38 samples/sec   Loss 4.4051   LearningRate 0.0261   Epoch: 9   Global Step: 121470   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:13:57,429-Speed 3320.27 samples/sec   Loss 4.3749   LearningRate 0.0261   Epoch: 9   Global Step: 121480   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:14:00,522-Speed 3311.66 samples/sec   Loss 4.3817   LearningRate 0.0261   Epoch: 9   Global Step: 121490   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:14:03,599-Speed 3328.72 samples/sec   Loss 4.4314   LearningRate 0.0261   Epoch: 9   Global Step: 121500   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:14:06,698-Speed 3305.52 samples/sec   Loss 4.3198   LearningRate 0.0261   Epoch: 9   Global Step: 121510   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:14:09,779-Speed 3324.27 samples/sec   Loss 4.3490   LearningRate 0.0261   Epoch: 9   Global Step: 121520   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:14:12,915-Speed 3266.78 samples/sec   Loss 4.3760   LearningRate 0.0261   Epoch: 9   Global Step: 121530   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:14:16,047-Speed 3271.03 samples/sec   Loss 4.3718   LearningRate 0.0261   Epoch: 9   Global Step: 121540   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:14:19,142-Speed 3308.74 samples/sec   Loss 4.4432   LearningRate 0.0261   Epoch: 9   Global Step: 121550   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:14:22,194-Speed 3356.23 samples/sec   Loss 4.4777   LearningRate 0.0261   Epoch: 9   Global Step: 121560   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:14:25,367-Speed 3228.47 samples/sec   Loss 4.4481   LearningRate 0.0261   Epoch: 9   Global Step: 121570   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:14:28,520-Speed 3249.03 samples/sec   Loss 4.3765   LearningRate 0.0261   Epoch: 9   Global Step: 121580   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:14:31,682-Speed 3239.17 samples/sec   Loss 4.4516   LearningRate 0.0261   Epoch: 9   Global Step: 121590   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:14:34,802-Speed 3283.13 samples/sec   Loss 4.3351   LearningRate 0.0261   Epoch: 9   Global Step: 121600   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:14:37,940-Speed 3264.37 samples/sec   Loss 4.3725   LearningRate 0.0261   Epoch: 9   Global Step: 121610   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:14:41,036-Speed 3308.40 samples/sec   Loss 4.3883   LearningRate 0.0261   Epoch: 9   Global Step: 121620   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:14:44,125-Speed 3316.13 samples/sec   Loss 4.4381   LearningRate 0.0260   Epoch: 9   Global Step: 121630   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:14:47,278-Speed 3248.29 samples/sec   Loss 4.3535   LearningRate 0.0260   Epoch: 9   Global Step: 121640   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:14:50,467-Speed 3212.06 samples/sec   Loss 4.4793   LearningRate 0.0260   Epoch: 9   Global Step: 121650   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:14:53,571-Speed 3300.76 samples/sec   Loss 4.3949   LearningRate 0.0260   Epoch: 9   Global Step: 121660   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:14:56,644-Speed 3333.59 samples/sec   Loss 4.3847   LearningRate 0.0260   Epoch: 9   Global Step: 121670   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:14:59,733-Speed 3315.36 samples/sec   Loss 4.3375   LearningRate 0.0260   Epoch: 9   Global Step: 121680   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:15:02,875-Speed 3260.41 samples/sec   Loss 4.3925   LearningRate 0.0260   Epoch: 9   Global Step: 121690   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:15:06,006-Speed 3271.30 samples/sec   Loss 4.3658   LearningRate 0.0260   Epoch: 9   Global Step: 121700   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:15:09,130-Speed 3279.13 samples/sec   Loss 4.3809   LearningRate 0.0260   Epoch: 9   Global Step: 121710   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:15:12,241-Speed 3292.68 samples/sec   Loss 4.2813   LearningRate 0.0260   Epoch: 9   Global Step: 121720   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:15:15,422-Speed 3220.07 samples/sec   Loss 4.3065   LearningRate 0.0260   Epoch: 9   Global Step: 121730   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:15:18,576-Speed 3247.12 samples/sec   Loss 4.4936   LearningRate 0.0260   Epoch: 9   Global Step: 121740   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:15:21,672-Speed 3308.96 samples/sec   Loss 4.4343   LearningRate 0.0260   Epoch: 9   Global Step: 121750   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:15:24,765-Speed 3312.32 samples/sec   Loss 4.3904   LearningRate 0.0260   Epoch: 9   Global Step: 121760   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:15:27,852-Speed 3318.61 samples/sec   Loss 4.5077   LearningRate 0.0260   Epoch: 9   Global Step: 121770   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:15:31,009-Speed 3244.01 samples/sec   Loss 4.4226   LearningRate 0.0260   Epoch: 9   Global Step: 121780   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:15:34,120-Speed 3292.61 samples/sec   Loss 4.3900   LearningRate 0.0260   Epoch: 9   Global Step: 121790   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:15:37,239-Speed 3284.41 samples/sec   Loss 4.3557   LearningRate 0.0260   Epoch: 9   Global Step: 121800   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:15:40,393-Speed 3247.89 samples/sec   Loss 4.4298   LearningRate 0.0260   Epoch: 9   Global Step: 121810   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:15:43,586-Speed 3207.11 samples/sec   Loss 4.4930   LearningRate 0.0260   Epoch: 9   Global Step: 121820   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:15:46,732-Speed 3256.08 samples/sec   Loss 4.4635   LearningRate 0.0260   Epoch: 9   Global Step: 121830   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:15:49,893-Speed 3241.01 samples/sec   Loss 4.3878   LearningRate 0.0260   Epoch: 9   Global Step: 121840   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:15:53,024-Speed 3271.40 samples/sec   Loss 4.4646   LearningRate 0.0260   Epoch: 9   Global Step: 121850   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:15:56,134-Speed 3294.05 samples/sec   Loss 4.4139   LearningRate 0.0260   Epoch: 9   Global Step: 121860   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:15:59,247-Speed 3290.25 samples/sec   Loss 4.3910   LearningRate 0.0260   Epoch: 9   Global Step: 121870   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:16:02,370-Speed 3279.90 samples/sec   Loss 4.3413   LearningRate 0.0259   Epoch: 9   Global Step: 121880   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:16:05,455-Speed 3320.50 samples/sec   Loss 4.3375   LearningRate 0.0259   Epoch: 9   Global Step: 121890   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:16:08,534-Speed 3326.13 samples/sec   Loss 4.4213   LearningRate 0.0259   Epoch: 9   Global Step: 121900   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:16:11,632-Speed 3307.17 samples/sec   Loss 4.4588   LearningRate 0.0259   Epoch: 9   Global Step: 121910   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:16:14,742-Speed 3293.26 samples/sec   Loss 4.3489   LearningRate 0.0259   Epoch: 9   Global Step: 121920   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:16:17,830-Speed 3317.41 samples/sec   Loss 4.4283   LearningRate 0.0259   Epoch: 9   Global Step: 121930   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:16:20,948-Speed 3285.29 samples/sec   Loss 4.4076   LearningRate 0.0259   Epoch: 9   Global Step: 121940   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:16:24,027-Speed 3326.62 samples/sec   Loss 4.4647   LearningRate 0.0259   Epoch: 9   Global Step: 121950   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:16:27,166-Speed 3265.10 samples/sec   Loss 4.3869   LearningRate 0.0259   Epoch: 9   Global Step: 121960   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:16:30,361-Speed 3205.99 samples/sec   Loss 4.3217   LearningRate 0.0259   Epoch: 9   Global Step: 121970   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:16:33,415-Speed 3354.24 samples/sec   Loss 4.4421   LearningRate 0.0259   Epoch: 9   Global Step: 121980   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:16:36,523-Speed 3295.35 samples/sec   Loss 4.5264   LearningRate 0.0259   Epoch: 9   Global Step: 121990   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:16:39,669-Speed 3256.46 samples/sec   Loss 4.3953   LearningRate 0.0259   Epoch: 9   Global Step: 122000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:16:42,767-Speed 3305.43 samples/sec   Loss 4.4772   LearningRate 0.0259   Epoch: 9   Global Step: 122010   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:16:45,852-Speed 3321.10 samples/sec   Loss 4.3579   LearningRate 0.0259   Epoch: 9   Global Step: 122020   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:16:49,028-Speed 3224.63 samples/sec   Loss 4.3876   LearningRate 0.0259   Epoch: 9   Global Step: 122030   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:16:52,209-Speed 3221.14 samples/sec   Loss 4.4239   LearningRate 0.0259   Epoch: 9   Global Step: 122040   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:16:55,341-Speed 3269.95 samples/sec   Loss 4.3875   LearningRate 0.0259   Epoch: 9   Global Step: 122050   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:16:58,417-Speed 3330.59 samples/sec   Loss 4.4293   LearningRate 0.0259   Epoch: 9   Global Step: 122060   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:17:01,477-Speed 3346.96 samples/sec   Loss 4.4049   LearningRate 0.0259   Epoch: 9   Global Step: 122070   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:17:04,587-Speed 3293.84 samples/sec   Loss 4.4557   LearningRate 0.0259   Epoch: 9   Global Step: 122080   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:17:07,692-Speed 3299.43 samples/sec   Loss 4.3680   LearningRate 0.0259   Epoch: 9   Global Step: 122090   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:17:10,785-Speed 3311.15 samples/sec   Loss 4.4723   LearningRate 0.0259   Epoch: 9   Global Step: 122100   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:17:13,837-Speed 3356.54 samples/sec   Loss 4.4100   LearningRate 0.0259   Epoch: 9   Global Step: 122110   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:17:16,890-Speed 3354.84 samples/sec   Loss 4.4769   LearningRate 0.0258   Epoch: 9   Global Step: 122120   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:17:19,961-Speed 3336.03 samples/sec   Loss 4.3493   LearningRate 0.0258   Epoch: 9   Global Step: 122130   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:17:23,043-Speed 3323.36 samples/sec   Loss 4.3928   LearningRate 0.0258   Epoch: 9   Global Step: 122140   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:17:26,196-Speed 3248.39 samples/sec   Loss 4.4322   LearningRate 0.0258   Epoch: 9   Global Step: 122150   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:17:29,304-Speed 3296.18 samples/sec   Loss 4.4138   LearningRate 0.0258   Epoch: 9   Global Step: 122160   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:17:32,429-Speed 3277.39 samples/sec   Loss 4.3737   LearningRate 0.0258   Epoch: 9   Global Step: 122170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:17:35,564-Speed 3267.89 samples/sec   Loss 4.3269   LearningRate 0.0258   Epoch: 9   Global Step: 122180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:17:38,682-Speed 3285.33 samples/sec   Loss 4.4700   LearningRate 0.0258   Epoch: 9   Global Step: 122190   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:17:41,834-Speed 3250.14 samples/sec   Loss 4.3699   LearningRate 0.0258   Epoch: 9   Global Step: 122200   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:17:44,903-Speed 3336.64 samples/sec   Loss 4.3987   LearningRate 0.0258   Epoch: 9   Global Step: 122210   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:17:47,995-Speed 3314.18 samples/sec   Loss 4.4306   LearningRate 0.0258   Epoch: 9   Global Step: 122220   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:17:51,095-Speed 3303.30 samples/sec   Loss 4.3757   LearningRate 0.0258   Epoch: 9   Global Step: 122230   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:17:54,153-Speed 3350.03 samples/sec   Loss 4.4375   LearningRate 0.0258   Epoch: 9   Global Step: 122240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:17:57,265-Speed 3291.58 samples/sec   Loss 4.4085   LearningRate 0.0258   Epoch: 9   Global Step: 122250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:18:00,324-Speed 3348.62 samples/sec   Loss 4.3264   LearningRate 0.0258   Epoch: 9   Global Step: 122260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:18:03,413-Speed 3316.35 samples/sec   Loss 4.3523   LearningRate 0.0258   Epoch: 9   Global Step: 122270   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:18:06,617-Speed 3196.38 samples/sec   Loss 4.4552   LearningRate 0.0258   Epoch: 9   Global Step: 122280   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:18:09,727-Speed 3293.84 samples/sec   Loss 4.3689   LearningRate 0.0258   Epoch: 9   Global Step: 122290   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:18:13,014-Speed 3116.35 samples/sec   Loss 4.4049   LearningRate 0.0258   Epoch: 9   Global Step: 122300   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:18:16,120-Speed 3297.63 samples/sec   Loss 4.3761   LearningRate 0.0258   Epoch: 9   Global Step: 122310   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:18:19,236-Speed 3287.24 samples/sec   Loss 4.4929   LearningRate 0.0258   Epoch: 9   Global Step: 122320   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:18:22,296-Speed 3348.41 samples/sec   Loss 4.3248   LearningRate 0.0258   Epoch: 9   Global Step: 122330   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:18:25,468-Speed 3228.53 samples/sec   Loss 4.4021   LearningRate 0.0258   Epoch: 9   Global Step: 122340   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:18:28,661-Speed 3207.80 samples/sec   Loss 4.4762   LearningRate 0.0258   Epoch: 9   Global Step: 122350   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:18:31,751-Speed 3315.84 samples/sec   Loss 4.5235   LearningRate 0.0258   Epoch: 9   Global Step: 122360   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:18:34,812-Speed 3346.10 samples/sec   Loss 4.4550   LearningRate 0.0257   Epoch: 9   Global Step: 122370   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:18:37,930-Speed 3285.65 samples/sec   Loss 4.4358   LearningRate 0.0257   Epoch: 9   Global Step: 122380   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:18:40,989-Speed 3347.90 samples/sec   Loss 4.4172   LearningRate 0.0257   Epoch: 9   Global Step: 122390   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:18:44,107-Speed 3284.69 samples/sec   Loss 4.4089   LearningRate 0.0257   Epoch: 9   Global Step: 122400   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:18:47,183-Speed 3330.22 samples/sec   Loss 4.3966   LearningRate 0.0257   Epoch: 9   Global Step: 122410   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:18:50,316-Speed 3269.87 samples/sec   Loss 4.3759   LearningRate 0.0257   Epoch: 9   Global Step: 122420   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:18:53,399-Speed 3322.20 samples/sec   Loss 4.3163   LearningRate 0.0257   Epoch: 9   Global Step: 122430   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:18:56,511-Speed 3292.05 samples/sec   Loss 4.3655   LearningRate 0.0257   Epoch: 9   Global Step: 122440   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:18:59,621-Speed 3293.91 samples/sec   Loss 4.4194   LearningRate 0.0257   Epoch: 9   Global Step: 122450   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:19:02,757-Speed 3266.23 samples/sec   Loss 4.4169   LearningRate 0.0257   Epoch: 9   Global Step: 122460   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:19:05,920-Speed 3238.91 samples/sec   Loss 4.4469   LearningRate 0.0257   Epoch: 9   Global Step: 122470   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:19:09,009-Speed 3315.73 samples/sec   Loss 4.4762   LearningRate 0.0257   Epoch: 9   Global Step: 122480   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:19:12,141-Speed 3270.17 samples/sec   Loss 4.3690   LearningRate 0.0257   Epoch: 9   Global Step: 122490   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:19:15,325-Speed 3216.72 samples/sec   Loss 4.5022   LearningRate 0.0257   Epoch: 9   Global Step: 122500   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:19:18,430-Speed 3299.26 samples/sec   Loss 4.4961   LearningRate 0.0257   Epoch: 9   Global Step: 122510   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:19:21,511-Speed 3324.37 samples/sec   Loss 4.3980   LearningRate 0.0257   Epoch: 9   Global Step: 122520   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:19:24,691-Speed 3220.86 samples/sec   Loss 4.3826   LearningRate 0.0257   Epoch: 9   Global Step: 122530   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:19:27,882-Speed 3210.72 samples/sec   Loss 4.3921   LearningRate 0.0257   Epoch: 9   Global Step: 122540   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:19:31,005-Speed 3280.35 samples/sec   Loss 4.4047   LearningRate 0.0257   Epoch: 9   Global Step: 122550   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:19:34,096-Speed 3313.03 samples/sec   Loss 4.4526   LearningRate 0.0257   Epoch: 9   Global Step: 122560   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:19:37,231-Speed 3267.93 samples/sec   Loss 4.5122   LearningRate 0.0257   Epoch: 9   Global Step: 122570   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:19:40,417-Speed 3215.61 samples/sec   Loss 4.4397   LearningRate 0.0257   Epoch: 9   Global Step: 122580   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:19:43,623-Speed 3194.53 samples/sec   Loss 4.4436   LearningRate 0.0257   Epoch: 9   Global Step: 122590   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:19:46,760-Speed 3265.29 samples/sec   Loss 4.3162   LearningRate 0.0257   Epoch: 9   Global Step: 122600   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:19:49,911-Speed 3251.24 samples/sec   Loss 4.3988   LearningRate 0.0256   Epoch: 9   Global Step: 122610   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:19:53,090-Speed 3222.08 samples/sec   Loss 4.4070   LearningRate 0.0256   Epoch: 9   Global Step: 122620   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:19:56,199-Speed 3294.92 samples/sec   Loss 4.4103   LearningRate 0.0256   Epoch: 9   Global Step: 122630   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:19:59,271-Speed 3334.05 samples/sec   Loss 4.4725   LearningRate 0.0256   Epoch: 9   Global Step: 122640   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:20:02,390-Speed 3284.16 samples/sec   Loss 4.4055   LearningRate 0.0256   Epoch: 9   Global Step: 122650   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:20:05,469-Speed 3327.66 samples/sec   Loss 4.4599   LearningRate 0.0256   Epoch: 9   Global Step: 122660   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:20:08,615-Speed 3255.87 samples/sec   Loss 4.4865   LearningRate 0.0256   Epoch: 9   Global Step: 122670   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:20:11,764-Speed 3252.07 samples/sec   Loss 4.3440   LearningRate 0.0256   Epoch: 9   Global Step: 122680   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:20:14,992-Speed 3173.19 samples/sec   Loss 4.3005   LearningRate 0.0256   Epoch: 9   Global Step: 122690   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:20:18,159-Speed 3234.45 samples/sec   Loss 4.5222   LearningRate 0.0256   Epoch: 9   Global Step: 122700   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:20:21,243-Speed 3322.04 samples/sec   Loss 4.4077   LearningRate 0.0256   Epoch: 9   Global Step: 122710   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:20:24,335-Speed 3313.03 samples/sec   Loss 4.3620   LearningRate 0.0256   Epoch: 9   Global Step: 122720   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:20:27,453-Speed 3285.00 samples/sec   Loss 4.3898   LearningRate 0.0256   Epoch: 9   Global Step: 122730   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:20:30,581-Speed 3274.71 samples/sec   Loss 4.4112   LearningRate 0.0256   Epoch: 9   Global Step: 122740   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 12:20:33,675-Speed 3311.23 samples/sec   Loss 4.4525   LearningRate 0.0256   Epoch: 9   Global Step: 122750   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 12:20:36,757-Speed 3323.31 samples/sec   Loss 4.4157   LearningRate 0.0256   Epoch: 9   Global Step: 122760   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 12:20:39,940-Speed 3218.52 samples/sec   Loss 4.4764   LearningRate 0.0256   Epoch: 9   Global Step: 122770   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 12:20:43,167-Speed 3174.70 samples/sec   Loss 4.4325   LearningRate 0.0256   Epoch: 9   Global Step: 122780   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 12:20:46,260-Speed 3310.85 samples/sec   Loss 4.5217   LearningRate 0.0256   Epoch: 9   Global Step: 122790   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 12:20:49,451-Speed 3209.93 samples/sec   Loss 4.3526   LearningRate 0.0256   Epoch: 9   Global Step: 122800   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 12:20:52,583-Speed 3270.85 samples/sec   Loss 4.4323   LearningRate 0.0256   Epoch: 9   Global Step: 122810   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 12:20:55,664-Speed 3324.51 samples/sec   Loss 4.3681   LearningRate 0.0256   Epoch: 9   Global Step: 122820   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 12:20:58,770-Speed 3297.62 samples/sec   Loss 4.3800   LearningRate 0.0256   Epoch: 9   Global Step: 122830   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-04-27 12:21:01,923-Speed 3249.15 samples/sec   Loss 4.4290   LearningRate 0.0256   Epoch: 9   Global Step: 122840   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:21:05,048-Speed 3277.69 samples/sec   Loss 4.4800   LearningRate 0.0256   Epoch: 9   Global Step: 122850   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:21:08,161-Speed 3290.22 samples/sec   Loss 4.3579   LearningRate 0.0255   Epoch: 9   Global Step: 122860   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:21:11,270-Speed 3295.06 samples/sec   Loss 4.3580   LearningRate 0.0255   Epoch: 9   Global Step: 122870   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:21:14,432-Speed 3239.48 samples/sec   Loss 4.3526   LearningRate 0.0255   Epoch: 9   Global Step: 122880   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:21:17,652-Speed 3181.79 samples/sec   Loss 4.4679   LearningRate 0.0255   Epoch: 9   Global Step: 122890   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:21:20,746-Speed 3310.42 samples/sec   Loss 4.4945   LearningRate 0.0255   Epoch: 9   Global Step: 122900   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:21:23,840-Speed 3310.07 samples/sec   Loss 4.3540   LearningRate 0.0255   Epoch: 9   Global Step: 122910   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:21:26,960-Speed 3283.66 samples/sec   Loss 4.3636   LearningRate 0.0255   Epoch: 9   Global Step: 122920   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:21:30,053-Speed 3311.29 samples/sec   Loss 4.4189   LearningRate 0.0255   Epoch: 9   Global Step: 122930   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:21:33,160-Speed 3297.41 samples/sec   Loss 4.5517   LearningRate 0.0255   Epoch: 9   Global Step: 122940   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:21:36,266-Speed 3298.32 samples/sec   Loss 4.4533   LearningRate 0.0255   Epoch: 9   Global Step: 122950   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:21:39,399-Speed 3268.56 samples/sec   Loss 4.4261   LearningRate 0.0255   Epoch: 9   Global Step: 122960   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:21:42,532-Speed 3269.81 samples/sec   Loss 4.3051   LearningRate 0.0255   Epoch: 9   Global Step: 122970   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:21:45,629-Speed 3307.53 samples/sec   Loss 4.4256   LearningRate 0.0255   Epoch: 9   Global Step: 122980   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:21:48,792-Speed 3238.48 samples/sec   Loss 4.3804   LearningRate 0.0255   Epoch: 9   Global Step: 122990   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:21:51,973-Speed 3219.72 samples/sec   Loss 4.4125   LearningRate 0.0255   Epoch: 9   Global Step: 123000   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:21:55,093-Speed 3284.17 samples/sec   Loss 4.3557   LearningRate 0.0255   Epoch: 9   Global Step: 123010   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:21:58,203-Speed 3293.24 samples/sec   Loss 4.4411   LearningRate 0.0255   Epoch: 9   Global Step: 123020   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:22:01,356-Speed 3248.25 samples/sec   Loss 4.4118   LearningRate 0.0255   Epoch: 9   Global Step: 123030   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:22:04,434-Speed 3328.56 samples/sec   Loss 4.4672   LearningRate 0.0255   Epoch: 9   Global Step: 123040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:22:07,532-Speed 3306.36 samples/sec   Loss 4.5093   LearningRate 0.0255   Epoch: 9   Global Step: 123050   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:22:10,621-Speed 3315.10 samples/sec   Loss 4.4235   LearningRate 0.0255   Epoch: 9   Global Step: 123060   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:22:13,764-Speed 3259.48 samples/sec   Loss 4.2839   LearningRate 0.0255   Epoch: 9   Global Step: 123070   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:22:16,947-Speed 3218.14 samples/sec   Loss 4.3864   LearningRate 0.0255   Epoch: 9   Global Step: 123080   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:22:20,059-Speed 3293.12 samples/sec   Loss 4.4330   LearningRate 0.0255   Epoch: 9   Global Step: 123090   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:22:23,161-Speed 3302.54 samples/sec   Loss 4.4710   LearningRate 0.0254   Epoch: 9   Global Step: 123100   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:22:26,956-Speed 2699.00 samples/sec   Loss 4.3627   LearningRate 0.0254   Epoch: 9   Global Step: 123110   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:22:30,063-Speed 3297.51 samples/sec   Loss 4.3604   LearningRate 0.0254   Epoch: 9   Global Step: 123120   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:22:33,192-Speed 3272.91 samples/sec   Loss 4.5977   LearningRate 0.0254   Epoch: 9   Global Step: 123130   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:22:36,339-Speed 3255.81 samples/sec   Loss 4.3616   LearningRate 0.0254   Epoch: 9   Global Step: 123140   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:22:39,479-Speed 3261.93 samples/sec   Loss 4.4401   LearningRate 0.0254   Epoch: 9   Global Step: 123150   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:22:42,641-Speed 3239.64 samples/sec   Loss 4.3520   LearningRate 0.0254   Epoch: 9   Global Step: 123160   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:22:45,764-Speed 3279.70 samples/sec   Loss 4.4084   LearningRate 0.0254   Epoch: 9   Global Step: 123170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:22:48,893-Speed 3273.59 samples/sec   Loss 4.2968   LearningRate 0.0254   Epoch: 9   Global Step: 123180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:22:52,007-Speed 3289.51 samples/sec   Loss 4.5339   LearningRate 0.0254   Epoch: 9   Global Step: 123190   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:22:55,163-Speed 3246.23 samples/sec   Loss 4.4001   LearningRate 0.0254   Epoch: 9   Global Step: 123200   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:22:58,238-Speed 3330.98 samples/sec   Loss 4.4690   LearningRate 0.0254   Epoch: 9   Global Step: 123210   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:23:01,313-Speed 3330.51 samples/sec   Loss 4.4362   LearningRate 0.0254   Epoch: 9   Global Step: 123220   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:23:04,430-Speed 3286.39 samples/sec   Loss 4.3868   LearningRate 0.0254   Epoch: 9   Global Step: 123230   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:23:07,553-Speed 3279.93 samples/sec   Loss 4.5050   LearningRate 0.0254   Epoch: 9   Global Step: 123240   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:23:10,691-Speed 3264.81 samples/sec   Loss 4.5163   LearningRate 0.0254   Epoch: 9   Global Step: 123250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:23:13,766-Speed 3330.65 samples/sec   Loss 4.4713   LearningRate 0.0254   Epoch: 9   Global Step: 123260   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:23:16,862-Speed 3308.44 samples/sec   Loss 4.4981   LearningRate 0.0254   Epoch: 9   Global Step: 123270   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:23:19,963-Speed 3303.88 samples/sec   Loss 4.3665   LearningRate 0.0254   Epoch: 9   Global Step: 123280   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:23:23,109-Speed 3255.60 samples/sec   Loss 4.3899   LearningRate 0.0254   Epoch: 9   Global Step: 123290   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:23:26,222-Speed 3290.89 samples/sec   Loss 4.3901   LearningRate 0.0254   Epoch: 9   Global Step: 123300   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:23:29,368-Speed 3255.55 samples/sec   Loss 4.4918   LearningRate 0.0254   Epoch: 9   Global Step: 123310   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:23:32,462-Speed 3310.47 samples/sec   Loss 4.4295   LearningRate 0.0254   Epoch: 9   Global Step: 123320   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:23:35,572-Speed 3293.72 samples/sec   Loss 4.5327   LearningRate 0.0254   Epoch: 9   Global Step: 123330   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:23:38,724-Speed 3249.35 samples/sec   Loss 4.4081   LearningRate 0.0254   Epoch: 9   Global Step: 123340   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:23:41,843-Speed 3284.78 samples/sec   Loss 4.4312   LearningRate 0.0253   Epoch: 9   Global Step: 123350   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:23:44,948-Speed 3298.16 samples/sec   Loss 4.4598   LearningRate 0.0253   Epoch: 9   Global Step: 123360   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:23:48,022-Speed 3332.35 samples/sec   Loss 4.3269   LearningRate 0.0253   Epoch: 9   Global Step: 123370   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:23:51,162-Speed 3262.43 samples/sec   Loss 4.3914   LearningRate 0.0253   Epoch: 9   Global Step: 123380   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:23:54,275-Speed 3290.50 samples/sec   Loss 4.3846   LearningRate 0.0253   Epoch: 9   Global Step: 123390   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:23:57,380-Speed 3299.16 samples/sec   Loss 4.3582   LearningRate 0.0253   Epoch: 9   Global Step: 123400   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:24:00,557-Speed 3224.27 samples/sec   Loss 4.3911   LearningRate 0.0253   Epoch: 9   Global Step: 123410   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:24:05,570-Speed 2043.17 samples/sec   Loss 4.4243   LearningRate 0.0253   Epoch: 9   Global Step: 123420   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:24:08,669-Speed 3305.68 samples/sec   Loss 4.4273   LearningRate 0.0253   Epoch: 9   Global Step: 123430   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:24:13,235-Speed 2243.42 samples/sec   Loss 4.4859   LearningRate 0.0253   Epoch: 9   Global Step: 123440   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:24:16,385-Speed 3251.56 samples/sec   Loss 4.2293   LearningRate 0.0253   Epoch: 9   Global Step: 123450   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:24:19,475-Speed 3315.01 samples/sec   Loss 4.3088   LearningRate 0.0253   Epoch: 9   Global Step: 123460   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:24:22,549-Speed 3332.23 samples/sec   Loss 4.4804   LearningRate 0.0253   Epoch: 9   Global Step: 123470   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:24:25,665-Speed 3287.74 samples/sec   Loss 4.3976   LearningRate 0.0253   Epoch: 9   Global Step: 123480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:24:28,807-Speed 3259.23 samples/sec   Loss 4.4298   LearningRate 0.0253   Epoch: 9   Global Step: 123490   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:24:31,951-Speed 3258.39 samples/sec   Loss 4.4461   LearningRate 0.0253   Epoch: 9   Global Step: 123500   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:24:35,041-Speed 3314.58 samples/sec   Loss 4.3867   LearningRate 0.0253   Epoch: 9   Global Step: 123510   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:24:38,142-Speed 3303.67 samples/sec   Loss 4.3955   LearningRate 0.0253   Epoch: 9   Global Step: 123520   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:24:41,273-Speed 3271.53 samples/sec   Loss 4.4870   LearningRate 0.0253   Epoch: 9   Global Step: 123530   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:24:44,400-Speed 3275.64 samples/sec   Loss 4.4470   LearningRate 0.0253   Epoch: 9   Global Step: 123540   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:24:47,574-Speed 3227.69 samples/sec   Loss 4.3697   LearningRate 0.0253   Epoch: 9   Global Step: 123550   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:24:50,703-Speed 3273.04 samples/sec   Loss 4.3028   LearningRate 0.0253   Epoch: 9   Global Step: 123560   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:24:53,830-Speed 3276.28 samples/sec   Loss 4.4534   LearningRate 0.0253   Epoch: 9   Global Step: 123570   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:24:56,942-Speed 3290.94 samples/sec   Loss 4.3059   LearningRate 0.0253   Epoch: 9   Global Step: 123580   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:25:00,036-Speed 3310.34 samples/sec   Loss 4.4788   LearningRate 0.0253   Epoch: 9   Global Step: 123590   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:25:03,178-Speed 3260.04 samples/sec   Loss 4.4665   LearningRate 0.0252   Epoch: 9   Global Step: 123600   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:25:06,242-Speed 3343.30 samples/sec   Loss 4.5271   LearningRate 0.0252   Epoch: 9   Global Step: 123610   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:25:09,313-Speed 3335.86 samples/sec   Loss 4.3467   LearningRate 0.0252   Epoch: 9   Global Step: 123620   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:25:12,459-Speed 3255.81 samples/sec   Loss 4.3960   LearningRate 0.0252   Epoch: 9   Global Step: 123630   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:25:15,657-Speed 3203.31 samples/sec   Loss 4.3684   LearningRate 0.0252   Epoch: 9   Global Step: 123640   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:25:18,806-Speed 3252.41 samples/sec   Loss 4.5073   LearningRate 0.0252   Epoch: 9   Global Step: 123650   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:25:21,921-Speed 3288.86 samples/sec   Loss 4.3910   LearningRate 0.0252   Epoch: 9   Global Step: 123660   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:25:25,001-Speed 3325.86 samples/sec   Loss 4.4117   LearningRate 0.0252   Epoch: 9   Global Step: 123670   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:25:28,112-Speed 3291.61 samples/sec   Loss 4.3973   LearningRate 0.0252   Epoch: 9   Global Step: 123680   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:25:31,248-Speed 3266.35 samples/sec   Loss 4.3378   LearningRate 0.0252   Epoch: 9   Global Step: 123690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:25:34,335-Speed 3318.32 samples/sec   Loss 4.2961   LearningRate 0.0252   Epoch: 9   Global Step: 123700   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:25:37,461-Speed 3276.35 samples/sec   Loss 4.3859   LearningRate 0.0252   Epoch: 9   Global Step: 123710   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:25:40,627-Speed 3236.11 samples/sec   Loss 4.3694   LearningRate 0.0252   Epoch: 9   Global Step: 123720   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:25:43,704-Speed 3328.94 samples/sec   Loss 4.4444   LearningRate 0.0252   Epoch: 9   Global Step: 123730   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:25:46,789-Speed 3320.06 samples/sec   Loss 4.4151   LearningRate 0.0252   Epoch: 9   Global Step: 123740   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:25:49,917-Speed 3274.20 samples/sec   Loss 4.4461   LearningRate 0.0252   Epoch: 9   Global Step: 123750   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:25:53,071-Speed 3247.58 samples/sec   Loss 4.4735   LearningRate 0.0252   Epoch: 9   Global Step: 123760   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:25:56,169-Speed 3307.36 samples/sec   Loss 4.4082   LearningRate 0.0252   Epoch: 9   Global Step: 123770   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:25:59,261-Speed 3312.82 samples/sec   Loss 4.3897   LearningRate 0.0252   Epoch: 9   Global Step: 123780   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:26:02,345-Speed 3320.54 samples/sec   Loss 4.4471   LearningRate 0.0252   Epoch: 9   Global Step: 123790   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:26:05,458-Speed 3291.21 samples/sec   Loss 4.3669   LearningRate 0.0252   Epoch: 9   Global Step: 123800   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:26:08,590-Speed 3269.90 samples/sec   Loss 4.4775   LearningRate 0.0252   Epoch: 9   Global Step: 123810   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:26:11,691-Speed 3303.03 samples/sec   Loss 4.3880   LearningRate 0.0252   Epoch: 9   Global Step: 123820   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:26:14,837-Speed 3256.33 samples/sec   Loss 4.5150   LearningRate 0.0252   Epoch: 9   Global Step: 123830   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:26:17,968-Speed 3271.60 samples/sec   Loss 4.4699   LearningRate 0.0251   Epoch: 9   Global Step: 123840   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:26:21,063-Speed 3309.72 samples/sec   Loss 4.4354   LearningRate 0.0251   Epoch: 9   Global Step: 123850   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:26:24,141-Speed 3328.03 samples/sec   Loss 4.4123   LearningRate 0.0251   Epoch: 9   Global Step: 123860   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:26:27,270-Speed 3274.00 samples/sec   Loss 4.3710   LearningRate 0.0251   Epoch: 9   Global Step: 123870   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:26:30,375-Speed 3298.38 samples/sec   Loss 4.3811   LearningRate 0.0251   Epoch: 9   Global Step: 123880   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:26:33,483-Speed 3295.83 samples/sec   Loss 4.3800   LearningRate 0.0251   Epoch: 9   Global Step: 123890   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:26:36,587-Speed 3300.00 samples/sec   Loss 4.3779   LearningRate 0.0251   Epoch: 9   Global Step: 123900   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:26:39,744-Speed 3244.18 samples/sec   Loss 4.4674   LearningRate 0.0251   Epoch: 9   Global Step: 123910   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:26:42,897-Speed 3249.29 samples/sec   Loss 4.4142   LearningRate 0.0251   Epoch: 9   Global Step: 123920   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:26:45,973-Speed 3330.57 samples/sec   Loss 4.4583   LearningRate 0.0251   Epoch: 9   Global Step: 123930   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:26:49,042-Speed 3336.98 samples/sec   Loss 4.5229   LearningRate 0.0251   Epoch: 9   Global Step: 123940   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:26:52,132-Speed 3315.25 samples/sec   Loss 4.4304   LearningRate 0.0251   Epoch: 9   Global Step: 123950   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:26:55,219-Speed 3317.80 samples/sec   Loss 4.3808   LearningRate 0.0251   Epoch: 9   Global Step: 123960   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:26:58,308-Speed 3315.90 samples/sec   Loss 4.4098   LearningRate 0.0251   Epoch: 9   Global Step: 123970   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:27:01,408-Speed 3304.25 samples/sec   Loss 4.3904   LearningRate 0.0251   Epoch: 9   Global Step: 123980   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:27:04,554-Speed 3255.80 samples/sec   Loss 4.4163   LearningRate 0.0251   Epoch: 9   Global Step: 123990   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:27:07,697-Speed 3259.31 samples/sec   Loss 4.3611   LearningRate 0.0251   Epoch: 9   Global Step: 124000   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:27:10,770-Speed 3332.73 samples/sec   Loss 4.4298   LearningRate 0.0251   Epoch: 9   Global Step: 124010   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:27:13,898-Speed 3275.05 samples/sec   Loss 4.3987   LearningRate 0.0251   Epoch: 9   Global Step: 124020   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:27:17,069-Speed 3229.98 samples/sec   Loss 4.4060   LearningRate 0.0251   Epoch: 9   Global Step: 124030   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:27:20,140-Speed 3335.76 samples/sec   Loss 4.4094   LearningRate 0.0251   Epoch: 9   Global Step: 124040   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:27:23,229-Speed 3315.65 samples/sec   Loss 4.4697   LearningRate 0.0251   Epoch: 9   Global Step: 124050   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:27:26,305-Speed 3330.26 samples/sec   Loss 4.4291   LearningRate 0.0251   Epoch: 9   Global Step: 124060   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:27:29,376-Speed 3334.81 samples/sec   Loss 4.3710   LearningRate 0.0251   Epoch: 9   Global Step: 124070   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:27:32,459-Speed 3323.13 samples/sec   Loss 4.4293   LearningRate 0.0251   Epoch: 9   Global Step: 124080   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:27:35,560-Speed 3303.15 samples/sec   Loss 4.4672   LearningRate 0.0250   Epoch: 9   Global Step: 124090   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:27:38,652-Speed 3312.13 samples/sec   Loss 4.3366   LearningRate 0.0250   Epoch: 9   Global Step: 124100   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:27:41,756-Speed 3299.91 samples/sec   Loss 4.4760   LearningRate 0.0250   Epoch: 9   Global Step: 124110   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:27:44,871-Speed 3289.14 samples/sec   Loss 4.3086   LearningRate 0.0250   Epoch: 9   Global Step: 124120   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:27:48,003-Speed 3269.57 samples/sec   Loss 4.4152   LearningRate 0.0250   Epoch: 9   Global Step: 124130   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:27:51,099-Speed 3309.02 samples/sec   Loss 4.5197   LearningRate 0.0250   Epoch: 9   Global Step: 124140   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:27:54,294-Speed 3205.26 samples/sec   Loss 4.4386   LearningRate 0.0250   Epoch: 9   Global Step: 124150   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:27:57,377-Speed 3322.70 samples/sec   Loss 4.4194   LearningRate 0.0250   Epoch: 9   Global Step: 124160   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:28:00,516-Speed 3263.43 samples/sec   Loss 4.4124   LearningRate 0.0250   Epoch: 9   Global Step: 124170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:28:03,706-Speed 3211.57 samples/sec   Loss 4.3451   LearningRate 0.0250   Epoch: 9   Global Step: 124180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:28:06,807-Speed 3302.97 samples/sec   Loss 4.3995   LearningRate 0.0250   Epoch: 9   Global Step: 124190   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:28:10,137-Speed 3076.21 samples/sec   Loss 4.3925   LearningRate 0.0250   Epoch: 9   Global Step: 124200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:28:13,280-Speed 3258.62 samples/sec   Loss 4.3578   LearningRate 0.0250   Epoch: 9   Global Step: 124210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:28:45,566-Speed 317.18 samples/sec   Loss 3.2904   LearningRate 0.0250   Epoch: 10   Global Step: 124220   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:28:49,015-Speed 2970.48 samples/sec   Loss 3.1884   LearningRate 0.0250   Epoch: 10   Global Step: 124230   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:28:52,083-Speed 3339.15 samples/sec   Loss 3.1716   LearningRate 0.0250   Epoch: 10   Global Step: 124240   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:28:55,219-Speed 3266.24 samples/sec   Loss 3.3233   LearningRate 0.0250   Epoch: 10   Global Step: 124250   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:28:58,323-Speed 3300.38 samples/sec   Loss 3.3085   LearningRate 0.0250   Epoch: 10   Global Step: 124260   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:29:01,470-Speed 3255.17 samples/sec   Loss 3.1765   LearningRate 0.0250   Epoch: 10   Global Step: 124270   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:29:04,705-Speed 3165.91 samples/sec   Loss 3.2771   LearningRate 0.0250   Epoch: 10   Global Step: 124280   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:29:07,774-Speed 3338.16 samples/sec   Loss 3.2770   LearningRate 0.0250   Epoch: 10   Global Step: 124290   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:29:10,874-Speed 3303.99 samples/sec   Loss 3.2021   LearningRate 0.0250   Epoch: 10   Global Step: 124300   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:29:14,006-Speed 3270.74 samples/sec   Loss 3.2753   LearningRate 0.0250   Epoch: 10   Global Step: 124310   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:29:17,186-Speed 3221.20 samples/sec   Loss 3.3094   LearningRate 0.0250   Epoch: 10   Global Step: 124320   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:29:20,383-Speed 3203.19 samples/sec   Loss 3.3115   LearningRate 0.0250   Epoch: 10   Global Step: 124330   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:29:23,614-Speed 3170.11 samples/sec   Loss 3.2967   LearningRate 0.0249   Epoch: 10   Global Step: 124340   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:29:27,145-Speed 2900.85 samples/sec   Loss 3.2978   LearningRate 0.0249   Epoch: 10   Global Step: 124350   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:29:30,250-Speed 3299.43 samples/sec   Loss 3.2396   LearningRate 0.0249   Epoch: 10   Global Step: 124360   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:29:33,338-Speed 3317.03 samples/sec   Loss 3.2210   LearningRate 0.0249   Epoch: 10   Global Step: 124370   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:29:36,413-Speed 3331.32 samples/sec   Loss 3.2125   LearningRate 0.0249   Epoch: 10   Global Step: 124380   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:29:39,503-Speed 3314.59 samples/sec   Loss 3.2909   LearningRate 0.0249   Epoch: 10   Global Step: 124390   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:29:42,605-Speed 3302.08 samples/sec   Loss 3.2439   LearningRate 0.0249   Epoch: 10   Global Step: 124400   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:29:45,661-Speed 3352.71 samples/sec   Loss 3.2805   LearningRate 0.0249   Epoch: 10   Global Step: 124410   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:29:48,787-Speed 3276.83 samples/sec   Loss 3.2351   LearningRate 0.0249   Epoch: 10   Global Step: 124420   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:29:51,902-Speed 3288.07 samples/sec   Loss 3.3234   LearningRate 0.0249   Epoch: 10   Global Step: 124430   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:29:54,999-Speed 3307.66 samples/sec   Loss 3.2574   LearningRate 0.0249   Epoch: 10   Global Step: 124440   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:29:58,084-Speed 3319.72 samples/sec   Loss 3.2988   LearningRate 0.0249   Epoch: 10   Global Step: 124450   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:30:01,286-Speed 3199.80 samples/sec   Loss 3.3043   LearningRate 0.0249   Epoch: 10   Global Step: 124460   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:30:04,426-Speed 3262.12 samples/sec   Loss 3.2900   LearningRate 0.0249   Epoch: 10   Global Step: 124470   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:30:07,549-Speed 3280.04 samples/sec   Loss 3.2190   LearningRate 0.0249   Epoch: 10   Global Step: 124480   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:30:10,637-Speed 3316.69 samples/sec   Loss 3.2957   LearningRate 0.0249   Epoch: 10   Global Step: 124490   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:30:13,779-Speed 3260.32 samples/sec   Loss 3.3040   LearningRate 0.0249   Epoch: 10   Global Step: 124500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:30:16,851-Speed 3334.41 samples/sec   Loss 3.3193   LearningRate 0.0249   Epoch: 10   Global Step: 124510   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:30:19,948-Speed 3306.88 samples/sec   Loss 3.2948   LearningRate 0.0249   Epoch: 10   Global Step: 124520   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:30:23,073-Speed 3278.09 samples/sec   Loss 3.2661   LearningRate 0.0249   Epoch: 10   Global Step: 124530   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:30:26,183-Speed 3293.96 samples/sec   Loss 3.2742   LearningRate 0.0249   Epoch: 10   Global Step: 124540   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:30:29,321-Speed 3263.58 samples/sec   Loss 3.3546   LearningRate 0.0249   Epoch: 10   Global Step: 124550   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:30:32,425-Speed 3300.51 samples/sec   Loss 3.3065   LearningRate 0.0249   Epoch: 10   Global Step: 124560   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:30:35,556-Speed 3271.72 samples/sec   Loss 3.3085   LearningRate 0.0249   Epoch: 10   Global Step: 124570   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:30:38,682-Speed 3276.87 samples/sec   Loss 3.3426   LearningRate 0.0249   Epoch: 10   Global Step: 124580   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:30:41,773-Speed 3314.21 samples/sec   Loss 3.3420   LearningRate 0.0248   Epoch: 10   Global Step: 124590   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:30:44,899-Speed 3276.44 samples/sec   Loss 3.3046   LearningRate 0.0248   Epoch: 10   Global Step: 124600   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:30:48,143-Speed 3157.96 samples/sec   Loss 3.1862   LearningRate 0.0248   Epoch: 10   Global Step: 124610   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:30:51,247-Speed 3300.16 samples/sec   Loss 3.2005   LearningRate 0.0248   Epoch: 10   Global Step: 124620   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:30:54,323-Speed 3330.05 samples/sec   Loss 3.2237   LearningRate 0.0248   Epoch: 10   Global Step: 124630   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:30:57,439-Speed 3287.39 samples/sec   Loss 3.3320   LearningRate 0.0248   Epoch: 10   Global Step: 124640   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:31:00,606-Speed 3233.77 samples/sec   Loss 3.2814   LearningRate 0.0248   Epoch: 10   Global Step: 124650   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:31:03,683-Speed 3328.96 samples/sec   Loss 3.3703   LearningRate 0.0248   Epoch: 10   Global Step: 124660   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:31:06,790-Speed 3296.91 samples/sec   Loss 3.3123   LearningRate 0.0248   Epoch: 10   Global Step: 124670   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:31:09,900-Speed 3294.29 samples/sec   Loss 3.3358   LearningRate 0.0248   Epoch: 10   Global Step: 124680   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:31:13,099-Speed 3201.38 samples/sec   Loss 3.2656   LearningRate 0.0248   Epoch: 10   Global Step: 124690   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:31:16,254-Speed 3246.66 samples/sec   Loss 3.4016   LearningRate 0.0248   Epoch: 10   Global Step: 124700   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:31:19,313-Speed 3348.33 samples/sec   Loss 3.3347   LearningRate 0.0248   Epoch: 10   Global Step: 124710   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:31:22,371-Speed 3350.31 samples/sec   Loss 3.2963   LearningRate 0.0248   Epoch: 10   Global Step: 124720   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:31:25,493-Speed 3280.66 samples/sec   Loss 3.4010   LearningRate 0.0248   Epoch: 10   Global Step: 124730   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:31:28,615-Speed 3280.80 samples/sec   Loss 3.2851   LearningRate 0.0248   Epoch: 10   Global Step: 124740   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:31:31,724-Speed 3295.33 samples/sec   Loss 3.3592   LearningRate 0.0248   Epoch: 10   Global Step: 124750   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:31:34,853-Speed 3273.69 samples/sec   Loss 3.3796   LearningRate 0.0248   Epoch: 10   Global Step: 124760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:31:37,959-Speed 3298.12 samples/sec   Loss 3.3690   LearningRate 0.0248   Epoch: 10   Global Step: 124770   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:31:41,047-Speed 3316.45 samples/sec   Loss 3.3842   LearningRate 0.0248   Epoch: 10   Global Step: 124780   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:31:44,115-Speed 3338.50 samples/sec   Loss 3.3471   LearningRate 0.0248   Epoch: 10   Global Step: 124790   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:31:47,234-Speed 3284.67 samples/sec   Loss 3.3034   LearningRate 0.0248   Epoch: 10   Global Step: 124800   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:31:50,394-Speed 3241.82 samples/sec   Loss 3.3522   LearningRate 0.0248   Epoch: 10   Global Step: 124810   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:31:53,618-Speed 3176.11 samples/sec   Loss 3.3100   LearningRate 0.0248   Epoch: 10   Global Step: 124820   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:31:56,700-Speed 3324.23 samples/sec   Loss 3.3537   LearningRate 0.0248   Epoch: 10   Global Step: 124830   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:31:59,861-Speed 3240.09 samples/sec   Loss 3.3482   LearningRate 0.0247   Epoch: 10   Global Step: 124840   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:32:03,035-Speed 3228.04 samples/sec   Loss 3.3538   LearningRate 0.0247   Epoch: 10   Global Step: 124850   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:32:06,228-Speed 3207.65 samples/sec   Loss 3.3770   LearningRate 0.0247   Epoch: 10   Global Step: 124860   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:32:09,324-Speed 3308.54 samples/sec   Loss 3.2503   LearningRate 0.0247   Epoch: 10   Global Step: 124870   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:32:12,423-Speed 3306.07 samples/sec   Loss 3.3509   LearningRate 0.0247   Epoch: 10   Global Step: 124880   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:32:15,533-Speed 3293.85 samples/sec   Loss 3.3466   LearningRate 0.0247   Epoch: 10   Global Step: 124890   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:32:18,718-Speed 3215.48 samples/sec   Loss 3.3963   LearningRate 0.0247   Epoch: 10   Global Step: 124900   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:32:21,805-Speed 3318.60 samples/sec   Loss 3.3460   LearningRate 0.0247   Epoch: 10   Global Step: 124910   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:32:24,904-Speed 3304.98 samples/sec   Loss 3.3706   LearningRate 0.0247   Epoch: 10   Global Step: 124920   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:32:28,038-Speed 3268.87 samples/sec   Loss 3.3562   LearningRate 0.0247   Epoch: 10   Global Step: 124930   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:32:31,119-Speed 3324.28 samples/sec   Loss 3.4049   LearningRate 0.0247   Epoch: 10   Global Step: 124940   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:32:34,214-Speed 3309.98 samples/sec   Loss 3.3018   LearningRate 0.0247   Epoch: 10   Global Step: 124950   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:32:37,303-Speed 3315.71 samples/sec   Loss 3.3513   LearningRate 0.0247   Epoch: 10   Global Step: 124960   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:32:40,399-Speed 3308.21 samples/sec   Loss 3.3959   LearningRate 0.0247   Epoch: 10   Global Step: 124970   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:32:43,519-Speed 3283.24 samples/sec   Loss 3.4005   LearningRate 0.0247   Epoch: 10   Global Step: 124980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:32:46,629-Speed 3294.01 samples/sec   Loss 3.3635   LearningRate 0.0247   Epoch: 10   Global Step: 124990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:32:49,704-Speed 3330.63 samples/sec   Loss 3.4289   LearningRate 0.0247   Epoch: 10   Global Step: 125000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:32:52,843-Speed 3263.66 samples/sec   Loss 3.4386   LearningRate 0.0247   Epoch: 10   Global Step: 125010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:32:55,941-Speed 3305.63 samples/sec   Loss 3.3746   LearningRate 0.0247   Epoch: 10   Global Step: 125020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:32:59,021-Speed 3325.97 samples/sec   Loss 3.3545   LearningRate 0.0247   Epoch: 10   Global Step: 125030   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:33:02,122-Speed 3303.96 samples/sec   Loss 3.4241   LearningRate 0.0247   Epoch: 10   Global Step: 125040   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:33:05,239-Speed 3286.17 samples/sec   Loss 3.2875   LearningRate 0.0247   Epoch: 10   Global Step: 125050   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:33:08,329-Speed 3314.61 samples/sec   Loss 3.3971   LearningRate 0.0247   Epoch: 10   Global Step: 125060   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:33:11,441-Speed 3291.86 samples/sec   Loss 3.3924   LearningRate 0.0247   Epoch: 10   Global Step: 125070   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:33:14,547-Speed 3297.63 samples/sec   Loss 3.2876   LearningRate 0.0247   Epoch: 10   Global Step: 125080   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:33:17,660-Speed 3290.53 samples/sec   Loss 3.3561   LearningRate 0.0246   Epoch: 10   Global Step: 125090   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:33:20,779-Speed 3283.98 samples/sec   Loss 3.3917   LearningRate 0.0246   Epoch: 10   Global Step: 125100   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:33:23,874-Speed 3310.09 samples/sec   Loss 3.4769   LearningRate 0.0246   Epoch: 10   Global Step: 125110   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:33:26,983-Speed 3294.85 samples/sec   Loss 3.4027   LearningRate 0.0246   Epoch: 10   Global Step: 125120   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:33:30,158-Speed 3226.09 samples/sec   Loss 3.4287   LearningRate 0.0246   Epoch: 10   Global Step: 125130   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:33:33,211-Speed 3354.84 samples/sec   Loss 3.4128   LearningRate 0.0246   Epoch: 10   Global Step: 125140   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:33:36,304-Speed 3311.58 samples/sec   Loss 3.3821   LearningRate 0.0246   Epoch: 10   Global Step: 125150   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:33:39,357-Speed 3355.94 samples/sec   Loss 3.4261   LearningRate 0.0246   Epoch: 10   Global Step: 125160   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:33:42,466-Speed 3294.27 samples/sec   Loss 3.3901   LearningRate 0.0246   Epoch: 10   Global Step: 125170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:33:45,571-Speed 3299.33 samples/sec   Loss 3.4565   LearningRate 0.0246   Epoch: 10   Global Step: 125180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:33:48,687-Speed 3286.23 samples/sec   Loss 3.4289   LearningRate 0.0246   Epoch: 10   Global Step: 125190   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:33:51,810-Speed 3279.96 samples/sec   Loss 3.4317   LearningRate 0.0246   Epoch: 10   Global Step: 125200   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:33:54,952-Speed 3260.90 samples/sec   Loss 3.4279   LearningRate 0.0246   Epoch: 10   Global Step: 125210   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:33:58,025-Speed 3332.73 samples/sec   Loss 3.3495   LearningRate 0.0246   Epoch: 10   Global Step: 125220   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:34:01,103-Speed 3328.09 samples/sec   Loss 3.4294   LearningRate 0.0246   Epoch: 10   Global Step: 125230   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:34:04,246-Speed 3258.34 samples/sec   Loss 3.3659   LearningRate 0.0246   Epoch: 10   Global Step: 125240   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:34:07,438-Speed 3210.13 samples/sec   Loss 3.4740   LearningRate 0.0246   Epoch: 10   Global Step: 125250   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:34:10,526-Speed 3316.29 samples/sec   Loss 3.4115   LearningRate 0.0246   Epoch: 10   Global Step: 125260   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:34:13,614-Speed 3317.35 samples/sec   Loss 3.4752   LearningRate 0.0246   Epoch: 10   Global Step: 125270   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:34:16,744-Speed 3272.37 samples/sec   Loss 3.3866   LearningRate 0.0246   Epoch: 10   Global Step: 125280   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:34:19,846-Speed 3302.37 samples/sec   Loss 3.5338   LearningRate 0.0246   Epoch: 10   Global Step: 125290   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:34:22,952-Speed 3298.36 samples/sec   Loss 3.4608   LearningRate 0.0246   Epoch: 10   Global Step: 125300   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:34:26,039-Speed 3317.82 samples/sec   Loss 3.4149   LearningRate 0.0246   Epoch: 10   Global Step: 125310   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:34:29,106-Speed 3340.09 samples/sec   Loss 3.4521   LearningRate 0.0246   Epoch: 10   Global Step: 125320   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:34:32,191-Speed 3319.96 samples/sec   Loss 3.3503   LearningRate 0.0246   Epoch: 10   Global Step: 125330   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:34:35,293-Speed 3302.18 samples/sec   Loss 3.3891   LearningRate 0.0245   Epoch: 10   Global Step: 125340   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:34:38,350-Speed 3350.85 samples/sec   Loss 3.4712   LearningRate 0.0245   Epoch: 10   Global Step: 125350   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:34:41,522-Speed 3229.48 samples/sec   Loss 3.3901   LearningRate 0.0245   Epoch: 10   Global Step: 125360   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:34:44,594-Speed 3334.49 samples/sec   Loss 3.3583   LearningRate 0.0245   Epoch: 10   Global Step: 125370   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:34:47,655-Speed 3346.05 samples/sec   Loss 3.5154   LearningRate 0.0245   Epoch: 10   Global Step: 125380   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:34:50,788-Speed 3269.35 samples/sec   Loss 3.4783   LearningRate 0.0245   Epoch: 10   Global Step: 125390   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:34:53,855-Speed 3339.84 samples/sec   Loss 3.4526   LearningRate 0.0245   Epoch: 10   Global Step: 125400   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:34:56,981-Speed 3277.13 samples/sec   Loss 3.4328   LearningRate 0.0245   Epoch: 10   Global Step: 125410   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:35:00,081-Speed 3304.06 samples/sec   Loss 3.4371   LearningRate 0.0245   Epoch: 10   Global Step: 125420   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:35:03,270-Speed 3212.31 samples/sec   Loss 3.4349   LearningRate 0.0245   Epoch: 10   Global Step: 125430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:35:06,397-Speed 3275.69 samples/sec   Loss 3.4910   LearningRate 0.0245   Epoch: 10   Global Step: 125440   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:35:09,450-Speed 3354.89 samples/sec   Loss 3.5255   LearningRate 0.0245   Epoch: 10   Global Step: 125450   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:35:12,542-Speed 3313.03 samples/sec   Loss 3.4889   LearningRate 0.0245   Epoch: 10   Global Step: 125460   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:35:15,631-Speed 3316.42 samples/sec   Loss 3.4737   LearningRate 0.0245   Epoch: 10   Global Step: 125470   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:35:18,798-Speed 3234.60 samples/sec   Loss 3.4599   LearningRate 0.0245   Epoch: 10   Global Step: 125480   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:35:21,848-Speed 3357.69 samples/sec   Loss 3.4882   LearningRate 0.0245   Epoch: 10   Global Step: 125490   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:35:24,939-Speed 3314.73 samples/sec   Loss 3.5419   LearningRate 0.0245   Epoch: 10   Global Step: 125500   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:35:28,014-Speed 3330.70 samples/sec   Loss 3.4653   LearningRate 0.0245   Epoch: 10   Global Step: 125510   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:35:31,126-Speed 3291.47 samples/sec   Loss 3.4969   LearningRate 0.0245   Epoch: 10   Global Step: 125520   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:35:34,195-Speed 3337.65 samples/sec   Loss 3.5006   LearningRate 0.0245   Epoch: 10   Global Step: 125530   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:35:37,290-Speed 3309.65 samples/sec   Loss 3.5094   LearningRate 0.0245   Epoch: 10   Global Step: 125540   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:35:40,356-Speed 3340.89 samples/sec   Loss 3.4451   LearningRate 0.0245   Epoch: 10   Global Step: 125550   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:35:43,465-Speed 3295.10 samples/sec   Loss 3.4547   LearningRate 0.0245   Epoch: 10   Global Step: 125560   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:35:46,578-Speed 3289.76 samples/sec   Loss 3.4016   LearningRate 0.0245   Epoch: 10   Global Step: 125570   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:35:49,656-Speed 3328.02 samples/sec   Loss 3.4447   LearningRate 0.0245   Epoch: 10   Global Step: 125580   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:35:52,735-Speed 3327.41 samples/sec   Loss 3.4657   LearningRate 0.0244   Epoch: 10   Global Step: 125590   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:35:55,813-Speed 3327.67 samples/sec   Loss 3.4816   LearningRate 0.0244   Epoch: 10   Global Step: 125600   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:35:58,925-Speed 3291.12 samples/sec   Loss 3.5428   LearningRate 0.0244   Epoch: 10   Global Step: 125610   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:36:02,070-Speed 3257.72 samples/sec   Loss 3.4517   LearningRate 0.0244   Epoch: 10   Global Step: 125620   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:36:05,270-Speed 3200.87 samples/sec   Loss 3.4850   LearningRate 0.0244   Epoch: 10   Global Step: 125630   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:36:08,326-Speed 3351.93 samples/sec   Loss 3.4776   LearningRate 0.0244   Epoch: 10   Global Step: 125640   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:36:11,391-Speed 3342.18 samples/sec   Loss 3.5324   LearningRate 0.0244   Epoch: 10   Global Step: 125650   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:36:14,503-Speed 3291.50 samples/sec   Loss 3.5258   LearningRate 0.0244   Epoch: 10   Global Step: 125660   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:36:17,563-Speed 3347.57 samples/sec   Loss 3.3979   LearningRate 0.0244   Epoch: 10   Global Step: 125670   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:36:20,647-Speed 3321.61 samples/sec   Loss 3.5063   LearningRate 0.0244   Epoch: 10   Global Step: 125680   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:36:23,816-Speed 3232.37 samples/sec   Loss 3.5165   LearningRate 0.0244   Epoch: 10   Global Step: 125690   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:36:26,920-Speed 3299.25 samples/sec   Loss 3.4796   LearningRate 0.0244   Epoch: 10   Global Step: 125700   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:36:30,048-Speed 3274.49 samples/sec   Loss 3.4772   LearningRate 0.0244   Epoch: 10   Global Step: 125710   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:36:33,120-Speed 3335.50 samples/sec   Loss 3.5675   LearningRate 0.0244   Epoch: 10   Global Step: 125720   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:36:36,200-Speed 3325.15 samples/sec   Loss 3.5121   LearningRate 0.0244   Epoch: 10   Global Step: 125730   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:36:39,345-Speed 3256.97 samples/sec   Loss 3.4790   LearningRate 0.0244   Epoch: 10   Global Step: 125740   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:36:42,552-Speed 3194.57 samples/sec   Loss 3.5215   LearningRate 0.0244   Epoch: 10   Global Step: 125750   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:36:45,634-Speed 3322.67 samples/sec   Loss 3.5787   LearningRate 0.0244   Epoch: 10   Global Step: 125760   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:36:48,715-Speed 3324.67 samples/sec   Loss 3.4964   LearningRate 0.0244   Epoch: 10   Global Step: 125770   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:36:51,913-Speed 3202.90 samples/sec   Loss 3.5899   LearningRate 0.0244   Epoch: 10   Global Step: 125780   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:36:54,987-Speed 3332.62 samples/sec   Loss 3.4916   LearningRate 0.0244   Epoch: 10   Global Step: 125790   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:36:58,072-Speed 3319.91 samples/sec   Loss 3.4835   LearningRate 0.0244   Epoch: 10   Global Step: 125800   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:37:01,161-Speed 3316.53 samples/sec   Loss 3.5699   LearningRate 0.0244   Epoch: 10   Global Step: 125810   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:37:04,245-Speed 3320.82 samples/sec   Loss 3.5924   LearningRate 0.0244   Epoch: 10   Global Step: 125820   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:37:07,310-Speed 3342.36 samples/sec   Loss 3.5025   LearningRate 0.0244   Epoch: 10   Global Step: 125830   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:37:10,389-Speed 3326.67 samples/sec   Loss 3.5701   LearningRate 0.0243   Epoch: 10   Global Step: 125840   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:37:13,556-Speed 3234.59 samples/sec   Loss 3.5629   LearningRate 0.0243   Epoch: 10   Global Step: 125850   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:37:16,667-Speed 3293.29 samples/sec   Loss 3.4900   LearningRate 0.0243   Epoch: 10   Global Step: 125860   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:37:19,774-Speed 3296.07 samples/sec   Loss 3.5316   LearningRate 0.0243   Epoch: 10   Global Step: 125870   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:37:22,874-Speed 3304.29 samples/sec   Loss 3.5436   LearningRate 0.0243   Epoch: 10   Global Step: 125880   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:37:26,013-Speed 3263.67 samples/sec   Loss 3.5274   LearningRate 0.0243   Epoch: 10   Global Step: 125890   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:37:29,158-Speed 3257.04 samples/sec   Loss 3.5941   LearningRate 0.0243   Epoch: 10   Global Step: 125900   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:37:32,272-Speed 3288.68 samples/sec   Loss 3.5713   LearningRate 0.0243   Epoch: 10   Global Step: 125910   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:37:35,343-Speed 3335.98 samples/sec   Loss 3.5106   LearningRate 0.0243   Epoch: 10   Global Step: 125920   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:37:38,450-Speed 3296.17 samples/sec   Loss 3.5775   LearningRate 0.0243   Epoch: 10   Global Step: 125930   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:37:41,599-Speed 3253.26 samples/sec   Loss 3.4912   LearningRate 0.0243   Epoch: 10   Global Step: 125940   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:37:44,687-Speed 3317.44 samples/sec   Loss 3.5759   LearningRate 0.0243   Epoch: 10   Global Step: 125950   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:37:47,798-Speed 3292.99 samples/sec   Loss 3.5012   LearningRate 0.0243   Epoch: 10   Global Step: 125960   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:37:50,902-Speed 3298.98 samples/sec   Loss 3.4995   LearningRate 0.0243   Epoch: 10   Global Step: 125970   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:37:54,012-Speed 3294.14 samples/sec   Loss 3.4608   LearningRate 0.0243   Epoch: 10   Global Step: 125980   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:37:57,078-Speed 3341.51 samples/sec   Loss 3.5542   LearningRate 0.0243   Epoch: 10   Global Step: 125990   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:38:00,226-Speed 3253.15 samples/sec   Loss 3.5402   LearningRate 0.0243   Epoch: 10   Global Step: 126000   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:38:03,396-Speed 3231.59 samples/sec   Loss 3.5411   LearningRate 0.0243   Epoch: 10   Global Step: 126010   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:38:06,530-Speed 3268.02 samples/sec   Loss 3.5546   LearningRate 0.0243   Epoch: 10   Global Step: 126020   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:38:09,623-Speed 3312.18 samples/sec   Loss 3.5184   LearningRate 0.0243   Epoch: 10   Global Step: 126030   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:38:12,803-Speed 3221.49 samples/sec   Loss 3.5461   LearningRate 0.0243   Epoch: 10   Global Step: 126040   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:38:15,982-Speed 3221.65 samples/sec   Loss 3.5399   LearningRate 0.0243   Epoch: 10   Global Step: 126050   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:38:19,088-Speed 3298.22 samples/sec   Loss 3.5708   LearningRate 0.0243   Epoch: 10   Global Step: 126060   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:38:22,220-Speed 3270.14 samples/sec   Loss 3.4952   LearningRate 0.0243   Epoch: 10   Global Step: 126070   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:38:25,294-Speed 3332.48 samples/sec   Loss 3.5331   LearningRate 0.0243   Epoch: 10   Global Step: 126080   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:38:28,456-Speed 3239.68 samples/sec   Loss 3.5913   LearningRate 0.0242   Epoch: 10   Global Step: 126090   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:38:31,587-Speed 3272.18 samples/sec   Loss 3.5764   LearningRate 0.0242   Epoch: 10   Global Step: 126100   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:38:34,691-Speed 3299.61 samples/sec   Loss 3.5418   LearningRate 0.0242   Epoch: 10   Global Step: 126110   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:38:37,765-Speed 3332.57 samples/sec   Loss 3.5318   LearningRate 0.0242   Epoch: 10   Global Step: 126120   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:38:40,821-Speed 3351.88 samples/sec   Loss 3.5630   LearningRate 0.0242   Epoch: 10   Global Step: 126130   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:38:43,958-Speed 3264.99 samples/sec   Loss 3.4977   LearningRate 0.0242   Epoch: 10   Global Step: 126140   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:38:47,037-Speed 3327.43 samples/sec   Loss 3.5783   LearningRate 0.0242   Epoch: 10   Global Step: 126150   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-04-27 12:38:50,075-Speed 3370.86 samples/sec   Loss 3.5099   LearningRate 0.0242   Epoch: 10   Global Step: 126160   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:38:53,183-Speed 3296.28 samples/sec   Loss 3.4800   LearningRate 0.0242   Epoch: 10   Global Step: 126170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:38:56,251-Speed 3338.77 samples/sec   Loss 3.5775   LearningRate 0.0242   Epoch: 10   Global Step: 126180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:38:59,285-Speed 3376.38 samples/sec   Loss 3.5676   LearningRate 0.0242   Epoch: 10   Global Step: 126190   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:39:02,379-Speed 3310.51 samples/sec   Loss 3.6424   LearningRate 0.0242   Epoch: 10   Global Step: 126200   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:39:05,451-Speed 3333.97 samples/sec   Loss 3.5671   LearningRate 0.0242   Epoch: 10   Global Step: 126210   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:39:08,528-Speed 3328.72 samples/sec   Loss 3.4800   LearningRate 0.0242   Epoch: 10   Global Step: 126220   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:39:11,693-Speed 3237.22 samples/sec   Loss 3.5593   LearningRate 0.0242   Epoch: 10   Global Step: 126230   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:39:14,790-Speed 3307.78 samples/sec   Loss 3.6146   LearningRate 0.0242   Epoch: 10   Global Step: 126240   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:39:17,848-Speed 3349.97 samples/sec   Loss 3.5667   LearningRate 0.0242   Epoch: 10   Global Step: 126250   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:39:20,907-Speed 3348.41 samples/sec   Loss 3.6126   LearningRate 0.0242   Epoch: 10   Global Step: 126260   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:39:24,049-Speed 3259.81 samples/sec   Loss 3.4893   LearningRate 0.0242   Epoch: 10   Global Step: 126270   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:39:27,142-Speed 3311.98 samples/sec   Loss 3.5622   LearningRate 0.0242   Epoch: 10   Global Step: 126280   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:39:30,188-Speed 3362.55 samples/sec   Loss 3.6750   LearningRate 0.0242   Epoch: 10   Global Step: 126290   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:39:33,278-Speed 3315.19 samples/sec   Loss 3.5352   LearningRate 0.0242   Epoch: 10   Global Step: 126300   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:39:36,356-Speed 3327.99 samples/sec   Loss 3.5226   LearningRate 0.0242   Epoch: 10   Global Step: 126310   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:39:39,550-Speed 3206.78 samples/sec   Loss 3.5363   LearningRate 0.0242   Epoch: 10   Global Step: 126320   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:39:42,629-Speed 3326.43 samples/sec   Loss 3.6804   LearningRate 0.0242   Epoch: 10   Global Step: 126330   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:39:45,700-Speed 3336.10 samples/sec   Loss 3.5386   LearningRate 0.0241   Epoch: 10   Global Step: 126340   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:39:48,776-Speed 3330.12 samples/sec   Loss 3.5372   LearningRate 0.0241   Epoch: 10   Global Step: 126350   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:39:51,938-Speed 3239.23 samples/sec   Loss 3.6800   LearningRate 0.0241   Epoch: 10   Global Step: 126360   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:39:55,095-Speed 3245.29 samples/sec   Loss 3.5709   LearningRate 0.0241   Epoch: 10   Global Step: 126370   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:39:58,160-Speed 3341.08 samples/sec   Loss 3.5316   LearningRate 0.0241   Epoch: 10   Global Step: 126380   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:40:01,258-Speed 3306.99 samples/sec   Loss 3.6381   LearningRate 0.0241   Epoch: 10   Global Step: 126390   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:40:04,337-Speed 3327.41 samples/sec   Loss 3.6057   LearningRate 0.0241   Epoch: 10   Global Step: 126400   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:40:07,479-Speed 3259.34 samples/sec   Loss 3.5730   LearningRate 0.0241   Epoch: 10   Global Step: 126410   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:40:10,556-Speed 3329.33 samples/sec   Loss 3.5422   LearningRate 0.0241   Epoch: 10   Global Step: 126420   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:40:13,693-Speed 3264.73 samples/sec   Loss 3.5414   LearningRate 0.0241   Epoch: 10   Global Step: 126430   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:40:16,782-Speed 3316.60 samples/sec   Loss 3.6800   LearningRate 0.0241   Epoch: 10   Global Step: 126440   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:40:19,931-Speed 3252.74 samples/sec   Loss 3.5674   LearningRate 0.0241   Epoch: 10   Global Step: 126450   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:40:23,071-Speed 3262.57 samples/sec   Loss 3.5544   LearningRate 0.0241   Epoch: 10   Global Step: 126460   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:40:26,140-Speed 3337.10 samples/sec   Loss 3.5690   LearningRate 0.0241   Epoch: 10   Global Step: 126470   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:40:29,272-Speed 3270.69 samples/sec   Loss 3.6430   LearningRate 0.0241   Epoch: 10   Global Step: 126480   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:40:32,372-Speed 3304.78 samples/sec   Loss 3.6589   LearningRate 0.0241   Epoch: 10   Global Step: 126490   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-04-27 12:40:35,471-Speed 3304.99 samples/sec   Loss 3.5079   LearningRate 0.0241   Epoch: 10   Global Step: 126500   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-04-27 12:40:38,552-Speed 3323.78 samples/sec   Loss 3.6046   LearningRate 0.0241   Epoch: 10   Global Step: 126510   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:40:41,631-Speed 3328.10 samples/sec   Loss 3.6558   LearningRate 0.0241   Epoch: 10   Global Step: 126520   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:40:44,730-Speed 3305.06 samples/sec   Loss 3.6464   LearningRate 0.0241   Epoch: 10   Global Step: 126530   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:40:47,842-Speed 3291.49 samples/sec   Loss 3.6726   LearningRate 0.0241   Epoch: 10   Global Step: 126540   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:40:50,916-Speed 3332.14 samples/sec   Loss 3.5770   LearningRate 0.0241   Epoch: 10   Global Step: 126550   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:40:53,973-Speed 3350.79 samples/sec   Loss 3.6030   LearningRate 0.0241   Epoch: 10   Global Step: 126560   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:40:57,067-Speed 3310.43 samples/sec   Loss 3.6012   LearningRate 0.0241   Epoch: 10   Global Step: 126570   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:41:00,138-Speed 3336.14 samples/sec   Loss 3.6427   LearningRate 0.0241   Epoch: 10   Global Step: 126580   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:41:03,253-Speed 3288.26 samples/sec   Loss 3.6410   LearningRate 0.0241   Epoch: 10   Global Step: 126590   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:41:06,356-Speed 3300.80 samples/sec   Loss 3.6190   LearningRate 0.0240   Epoch: 10   Global Step: 126600   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:41:09,429-Speed 3333.58 samples/sec   Loss 3.6052   LearningRate 0.0240   Epoch: 10   Global Step: 126610   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:41:12,520-Speed 3313.76 samples/sec   Loss 3.5866   LearningRate 0.0240   Epoch: 10   Global Step: 126620   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:41:15,620-Speed 3304.70 samples/sec   Loss 3.6389   LearningRate 0.0240   Epoch: 10   Global Step: 126630   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:41:18,686-Speed 3340.11 samples/sec   Loss 3.5105   LearningRate 0.0240   Epoch: 10   Global Step: 126640   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:41:21,743-Speed 3351.29 samples/sec   Loss 3.6205   LearningRate 0.0240   Epoch: 10   Global Step: 126650   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:41:24,839-Speed 3308.14 samples/sec   Loss 3.6416   LearningRate 0.0240   Epoch: 10   Global Step: 126660   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:41:27,959-Speed 3283.64 samples/sec   Loss 3.6162   LearningRate 0.0240   Epoch: 10   Global Step: 126670   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:41:31,055-Speed 3308.56 samples/sec   Loss 3.6054   LearningRate 0.0240   Epoch: 10   Global Step: 126680   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:41:34,147-Speed 3312.35 samples/sec   Loss 3.6965   LearningRate 0.0240   Epoch: 10   Global Step: 126690   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:41:37,287-Speed 3261.77 samples/sec   Loss 3.6113   LearningRate 0.0240   Epoch: 10   Global Step: 126700   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:41:40,341-Speed 3354.64 samples/sec   Loss 3.6528   LearningRate 0.0240   Epoch: 10   Global Step: 126710   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:41:43,428-Speed 3318.17 samples/sec   Loss 3.5532   LearningRate 0.0240   Epoch: 10   Global Step: 126720   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:41:46,505-Speed 3328.67 samples/sec   Loss 3.6191   LearningRate 0.0240   Epoch: 10   Global Step: 126730   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:41:49,640-Speed 3267.20 samples/sec   Loss 3.6336   LearningRate 0.0240   Epoch: 10   Global Step: 126740   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:41:52,738-Speed 3307.23 samples/sec   Loss 3.5923   LearningRate 0.0240   Epoch: 10   Global Step: 126750   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:41:55,905-Speed 3233.86 samples/sec   Loss 3.6170   LearningRate 0.0240   Epoch: 10   Global Step: 126760   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:41:59,001-Speed 3308.53 samples/sec   Loss 3.6817   LearningRate 0.0240   Epoch: 10   Global Step: 126770   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:42:02,086-Speed 3320.27 samples/sec   Loss 3.6767   LearningRate 0.0240   Epoch: 10   Global Step: 126780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 12:42:05,214-Speed 3275.17 samples/sec   Loss 3.5567   LearningRate 0.0240   Epoch: 10   Global Step: 126790   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 12:42:08,262-Speed 3360.93 samples/sec   Loss 3.6441   LearningRate 0.0240   Epoch: 10   Global Step: 126800   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:42:11,345-Speed 3322.32 samples/sec   Loss 3.7325   LearningRate 0.0240   Epoch: 10   Global Step: 126810   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:42:14,421-Speed 3329.76 samples/sec   Loss 3.5792   LearningRate 0.0240   Epoch: 10   Global Step: 126820   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:42:17,481-Speed 3347.83 samples/sec   Loss 3.6471   LearningRate 0.0240   Epoch: 10   Global Step: 126830   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:42:20,545-Speed 3343.32 samples/sec   Loss 3.6224   LearningRate 0.0240   Epoch: 10   Global Step: 126840   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:42:23,619-Speed 3332.23 samples/sec   Loss 3.5543   LearningRate 0.0239   Epoch: 10   Global Step: 126850   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:42:26,695-Speed 3330.27 samples/sec   Loss 3.6167   LearningRate 0.0239   Epoch: 10   Global Step: 126860   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:42:29,780-Speed 3320.73 samples/sec   Loss 3.6604   LearningRate 0.0239   Epoch: 10   Global Step: 126870   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:42:32,872-Speed 3312.53 samples/sec   Loss 3.6359   LearningRate 0.0239   Epoch: 10   Global Step: 126880   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:42:35,966-Speed 3311.03 samples/sec   Loss 3.6416   LearningRate 0.0239   Epoch: 10   Global Step: 126890   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:42:39,093-Speed 3275.64 samples/sec   Loss 3.5871   LearningRate 0.0239   Epoch: 10   Global Step: 126900   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:42:42,181-Speed 3316.08 samples/sec   Loss 3.6097   LearningRate 0.0239   Epoch: 10   Global Step: 126910   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:42:45,289-Speed 3296.25 samples/sec   Loss 3.6908   LearningRate 0.0239   Epoch: 10   Global Step: 126920   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:42:48,464-Speed 3226.00 samples/sec   Loss 3.6517   LearningRate 0.0239   Epoch: 10   Global Step: 126930   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:42:51,554-Speed 3314.76 samples/sec   Loss 3.6263   LearningRate 0.0239   Epoch: 10   Global Step: 126940   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:42:54,643-Speed 3316.29 samples/sec   Loss 3.6564   LearningRate 0.0239   Epoch: 10   Global Step: 126950   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:42:57,728-Speed 3320.17 samples/sec   Loss 3.6346   LearningRate 0.0239   Epoch: 10   Global Step: 126960   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:43:00,825-Speed 3307.93 samples/sec   Loss 3.6171   LearningRate 0.0239   Epoch: 10   Global Step: 126970   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:43:03,926-Speed 3302.62 samples/sec   Loss 3.6563   LearningRate 0.0239   Epoch: 10   Global Step: 126980   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:43:07,013-Speed 3317.88 samples/sec   Loss 3.6626   LearningRate 0.0239   Epoch: 10   Global Step: 126990   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:43:10,063-Speed 3359.42 samples/sec   Loss 3.6975   LearningRate 0.0239   Epoch: 10   Global Step: 127000   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:43:13,144-Speed 3324.47 samples/sec   Loss 3.7069   LearningRate 0.0239   Epoch: 10   Global Step: 127010   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:43:16,231-Speed 3318.29 samples/sec   Loss 3.6891   LearningRate 0.0239   Epoch: 10   Global Step: 127020   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:43:19,310-Speed 3326.37 samples/sec   Loss 3.7005   LearningRate 0.0239   Epoch: 10   Global Step: 127030   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:43:22,388-Speed 3328.47 samples/sec   Loss 3.7700   LearningRate 0.0239   Epoch: 10   Global Step: 127040   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:43:25,540-Speed 3249.69 samples/sec   Loss 3.6991   LearningRate 0.0239   Epoch: 10   Global Step: 127050   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:43:28,693-Speed 3248.71 samples/sec   Loss 3.7148   LearningRate 0.0239   Epoch: 10   Global Step: 127060   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:43:31,832-Speed 3263.14 samples/sec   Loss 3.6598   LearningRate 0.0239   Epoch: 10   Global Step: 127070   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:43:34,938-Speed 3298.04 samples/sec   Loss 3.6561   LearningRate 0.0239   Epoch: 10   Global Step: 127080   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:43:38,065-Speed 3275.02 samples/sec   Loss 3.7333   LearningRate 0.0239   Epoch: 10   Global Step: 127090   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:43:41,161-Speed 3309.43 samples/sec   Loss 3.7402   LearningRate 0.0239   Epoch: 10   Global Step: 127100   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:43:44,303-Speed 3259.23 samples/sec   Loss 3.7204   LearningRate 0.0238   Epoch: 10   Global Step: 127110   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:43:47,397-Speed 3310.53 samples/sec   Loss 3.6245   LearningRate 0.0238   Epoch: 10   Global Step: 127120   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:43:50,487-Speed 3315.06 samples/sec   Loss 3.6499   LearningRate 0.0238   Epoch: 10   Global Step: 127130   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:43:53,618-Speed 3272.35 samples/sec   Loss 3.6873   LearningRate 0.0238   Epoch: 10   Global Step: 127140   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:43:56,713-Speed 3309.45 samples/sec   Loss 3.6267   LearningRate 0.0238   Epoch: 10   Global Step: 127150   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:43:59,818-Speed 3298.33 samples/sec   Loss 3.7563   LearningRate 0.0238   Epoch: 10   Global Step: 127160   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:44:02,917-Speed 3305.64 samples/sec   Loss 3.6301   LearningRate 0.0238   Epoch: 10   Global Step: 127170   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:44:06,065-Speed 3254.16 samples/sec   Loss 3.6183   LearningRate 0.0238   Epoch: 10   Global Step: 127180   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:44:09,175-Speed 3292.80 samples/sec   Loss 3.6785   LearningRate 0.0238   Epoch: 10   Global Step: 127190   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:44:12,266-Speed 3314.23 samples/sec   Loss 3.7136   LearningRate 0.0238   Epoch: 10   Global Step: 127200   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:44:15,375-Speed 3295.32 samples/sec   Loss 3.6855   LearningRate 0.0238   Epoch: 10   Global Step: 127210   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:44:18,518-Speed 3259.05 samples/sec   Loss 3.6956   LearningRate 0.0238   Epoch: 10   Global Step: 127220   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:44:21,602-Speed 3321.53 samples/sec   Loss 3.6599   LearningRate 0.0238   Epoch: 10   Global Step: 127230   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:44:24,844-Speed 3159.37 samples/sec   Loss 3.8049   LearningRate 0.0238   Epoch: 10   Global Step: 127240   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:44:27,985-Speed 3261.31 samples/sec   Loss 3.5800   LearningRate 0.0238   Epoch: 10   Global Step: 127250   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:44:31,155-Speed 3230.89 samples/sec   Loss 3.7151   LearningRate 0.0238   Epoch: 10   Global Step: 127260   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:44:34,225-Speed 3336.42 samples/sec   Loss 3.6949   LearningRate 0.0238   Epoch: 10   Global Step: 127270   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:44:37,311-Speed 3319.24 samples/sec   Loss 3.6228   LearningRate 0.0238   Epoch: 10   Global Step: 127280   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:44:40,525-Speed 3187.12 samples/sec   Loss 3.6924   LearningRate 0.0238   Epoch: 10   Global Step: 127290   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:44:43,616-Speed 3313.57 samples/sec   Loss 3.6452   LearningRate 0.0238   Epoch: 10   Global Step: 127300   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:44:46,691-Speed 3330.65 samples/sec   Loss 3.6739   LearningRate 0.0238   Epoch: 10   Global Step: 127310   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:44:49,781-Speed 3315.30 samples/sec   Loss 3.6889   LearningRate 0.0238   Epoch: 10   Global Step: 127320   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:44:52,901-Speed 3283.13 samples/sec   Loss 3.7707   LearningRate 0.0238   Epoch: 10   Global Step: 127330   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:44:55,971-Speed 3337.04 samples/sec   Loss 3.7576   LearningRate 0.0238   Epoch: 10   Global Step: 127340   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:44:59,105-Speed 3267.83 samples/sec   Loss 3.7677   LearningRate 0.0238   Epoch: 10   Global Step: 127350   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:45:02,221-Speed 3288.15 samples/sec   Loss 3.6750   LearningRate 0.0237   Epoch: 10   Global Step: 127360   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:45:05,344-Speed 3279.17 samples/sec   Loss 3.6880   LearningRate 0.0237   Epoch: 10   Global Step: 127370   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:45:08,443-Speed 3305.40 samples/sec   Loss 3.6666   LearningRate 0.0237   Epoch: 10   Global Step: 127380   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:45:11,542-Speed 3305.83 samples/sec   Loss 3.7599   LearningRate 0.0237   Epoch: 10   Global Step: 127390   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:45:14,689-Speed 3254.54 samples/sec   Loss 3.7611   LearningRate 0.0237   Epoch: 10   Global Step: 127400   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:45:17,783-Speed 3311.23 samples/sec   Loss 3.6517   LearningRate 0.0237   Epoch: 10   Global Step: 127410   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:45:20,882-Speed 3305.56 samples/sec   Loss 3.6029   LearningRate 0.0237   Epoch: 10   Global Step: 127420   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:45:23,999-Speed 3285.91 samples/sec   Loss 3.7519   LearningRate 0.0237   Epoch: 10   Global Step: 127430   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:45:27,098-Speed 3305.11 samples/sec   Loss 3.6918   LearningRate 0.0237   Epoch: 10   Global Step: 127440   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:45:30,208-Speed 3294.13 samples/sec   Loss 3.7679   LearningRate 0.0237   Epoch: 10   Global Step: 127450   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:45:33,301-Speed 3311.85 samples/sec   Loss 3.7138   LearningRate 0.0237   Epoch: 10   Global Step: 127460   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:45:36,404-Speed 3300.94 samples/sec   Loss 3.7099   LearningRate 0.0237   Epoch: 10   Global Step: 127470   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:45:39,603-Speed 3202.14 samples/sec   Loss 3.7295   LearningRate 0.0237   Epoch: 10   Global Step: 127480   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:45:42,734-Speed 3271.04 samples/sec   Loss 3.6810   LearningRate 0.0237   Epoch: 10   Global Step: 127490   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:45:45,820-Speed 3319.63 samples/sec   Loss 3.5438   LearningRate 0.0237   Epoch: 10   Global Step: 127500   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:45:48,923-Speed 3301.24 samples/sec   Loss 3.7153   LearningRate 0.0237   Epoch: 10   Global Step: 127510   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:45:52,080-Speed 3243.85 samples/sec   Loss 3.6059   LearningRate 0.0237   Epoch: 10   Global Step: 127520   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:45:55,242-Speed 3239.94 samples/sec   Loss 3.6889   LearningRate 0.0237   Epoch: 10   Global Step: 127530   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:45:58,333-Speed 3314.41 samples/sec   Loss 3.8115   LearningRate 0.0237   Epoch: 10   Global Step: 127540   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:46:01,455-Speed 3280.09 samples/sec   Loss 3.7152   LearningRate 0.0237   Epoch: 10   Global Step: 127550   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:46:04,598-Speed 3259.28 samples/sec   Loss 3.7302   LearningRate 0.0237   Epoch: 10   Global Step: 127560   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:46:07,739-Speed 3261.32 samples/sec   Loss 3.7160   LearningRate 0.0237   Epoch: 10   Global Step: 127570   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:46:10,833-Speed 3310.81 samples/sec   Loss 3.6838   LearningRate 0.0237   Epoch: 10   Global Step: 127580   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:46:13,916-Speed 3322.87 samples/sec   Loss 3.7103   LearningRate 0.0237   Epoch: 10   Global Step: 127590   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:46:17,006-Speed 3314.65 samples/sec   Loss 3.7308   LearningRate 0.0237   Epoch: 10   Global Step: 127600   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:46:20,069-Speed 3344.11 samples/sec   Loss 3.6849   LearningRate 0.0237   Epoch: 10   Global Step: 127610   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:46:23,177-Speed 3296.27 samples/sec   Loss 3.7735   LearningRate 0.0236   Epoch: 10   Global Step: 127620   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:46:26,350-Speed 3227.95 samples/sec   Loss 3.7050   LearningRate 0.0236   Epoch: 10   Global Step: 127630   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 12:46:29,540-Speed 3211.42 samples/sec   Loss 3.7872   LearningRate 0.0236   Epoch: 10   Global Step: 127640   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 12:46:32,634-Speed 3310.12 samples/sec   Loss 3.7354   LearningRate 0.0236   Epoch: 10   Global Step: 127650   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:46:35,724-Speed 3315.17 samples/sec   Loss 3.7464   LearningRate 0.0236   Epoch: 10   Global Step: 127660   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:46:38,878-Speed 3247.47 samples/sec   Loss 3.7340   LearningRate 0.0236   Epoch: 10   Global Step: 127670   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:46:41,963-Speed 3321.08 samples/sec   Loss 3.7461   LearningRate 0.0236   Epoch: 10   Global Step: 127680   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:46:45,042-Speed 3326.12 samples/sec   Loss 3.7460   LearningRate 0.0236   Epoch: 10   Global Step: 127690   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:46:48,161-Speed 3284.04 samples/sec   Loss 3.6932   LearningRate 0.0236   Epoch: 10   Global Step: 127700   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:46:51,279-Speed 3285.14 samples/sec   Loss 3.7332   LearningRate 0.0236   Epoch: 10   Global Step: 127710   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:46:54,404-Speed 3277.71 samples/sec   Loss 3.7654   LearningRate 0.0236   Epoch: 10   Global Step: 127720   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:46:57,530-Speed 3276.68 samples/sec   Loss 3.7145   LearningRate 0.0236   Epoch: 10   Global Step: 127730   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:47:00,636-Speed 3297.65 samples/sec   Loss 3.7007   LearningRate 0.0236   Epoch: 10   Global Step: 127740   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:47:03,752-Speed 3287.60 samples/sec   Loss 3.7644   LearningRate 0.0236   Epoch: 10   Global Step: 127750   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:47:06,826-Speed 3332.63 samples/sec   Loss 3.7251   LearningRate 0.0236   Epoch: 10   Global Step: 127760   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:47:09,902-Speed 3329.96 samples/sec   Loss 3.6969   LearningRate 0.0236   Epoch: 10   Global Step: 127770   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:47:13,020-Speed 3285.43 samples/sec   Loss 3.7622   LearningRate 0.0236   Epoch: 10   Global Step: 127780   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:47:16,140-Speed 3282.64 samples/sec   Loss 3.7521   LearningRate 0.0236   Epoch: 10   Global Step: 127790   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:47:19,288-Speed 3254.20 samples/sec   Loss 3.7874   LearningRate 0.0236   Epoch: 10   Global Step: 127800   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:47:22,419-Speed 3271.34 samples/sec   Loss 3.7486   LearningRate 0.0236   Epoch: 10   Global Step: 127810   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:47:25,517-Speed 3306.89 samples/sec   Loss 3.7587   LearningRate 0.0236   Epoch: 10   Global Step: 127820   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:47:28,661-Speed 3257.80 samples/sec   Loss 3.7843   LearningRate 0.0236   Epoch: 10   Global Step: 127830   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:47:31,771-Speed 3293.99 samples/sec   Loss 3.7176   LearningRate 0.0236   Epoch: 10   Global Step: 127840   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:47:34,845-Speed 3332.29 samples/sec   Loss 3.7402   LearningRate 0.0236   Epoch: 10   Global Step: 127850   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:47:37,925-Speed 3325.78 samples/sec   Loss 3.7052   LearningRate 0.0236   Epoch: 10   Global Step: 127860   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:47:41,026-Speed 3302.75 samples/sec   Loss 3.6566   LearningRate 0.0235   Epoch: 10   Global Step: 127870   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:47:44,160-Speed 3268.65 samples/sec   Loss 3.7601   LearningRate 0.0235   Epoch: 10   Global Step: 127880   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:47:47,274-Speed 3289.28 samples/sec   Loss 3.8932   LearningRate 0.0235   Epoch: 10   Global Step: 127890   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:47:50,379-Speed 3298.98 samples/sec   Loss 3.7136   LearningRate 0.0235   Epoch: 10   Global Step: 127900   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:47:53,528-Speed 3252.79 samples/sec   Loss 3.7245   LearningRate 0.0235   Epoch: 10   Global Step: 127910   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 12:47:56,575-Speed 3361.65 samples/sec   Loss 3.7328   LearningRate 0.0235   Epoch: 10   Global Step: 127920   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:47:59,745-Speed 3231.28 samples/sec   Loss 3.7873   LearningRate 0.0235   Epoch: 10   Global Step: 127930   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:48:02,936-Speed 3210.34 samples/sec   Loss 3.6968   LearningRate 0.0235   Epoch: 10   Global Step: 127940   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:48:06,124-Speed 3212.45 samples/sec   Loss 3.8140   LearningRate 0.0235   Epoch: 10   Global Step: 127950   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:48:09,211-Speed 3318.37 samples/sec   Loss 3.7234   LearningRate 0.0235   Epoch: 10   Global Step: 127960   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:48:12,452-Speed 3160.93 samples/sec   Loss 3.7781   LearningRate 0.0235   Epoch: 10   Global Step: 127970   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:48:15,601-Speed 3252.60 samples/sec   Loss 3.6861   LearningRate 0.0235   Epoch: 10   Global Step: 127980   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:48:18,752-Speed 3250.89 samples/sec   Loss 3.7990   LearningRate 0.0235   Epoch: 10   Global Step: 127990   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:48:21,855-Speed 3300.81 samples/sec   Loss 3.7581   LearningRate 0.0235   Epoch: 10   Global Step: 128000   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:48:25,000-Speed 3256.58 samples/sec   Loss 3.8034   LearningRate 0.0235   Epoch: 10   Global Step: 128010   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:48:28,101-Speed 3303.10 samples/sec   Loss 3.8114   LearningRate 0.0235   Epoch: 10   Global Step: 128020   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 12:48:31,220-Speed 3284.23 samples/sec   Loss 3.8173   LearningRate 0.0235   Epoch: 10   Global Step: 128030   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 12:48:34,352-Speed 3271.31 samples/sec   Loss 3.7872   LearningRate 0.0235   Epoch: 10   Global Step: 128040   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:48:37,438-Speed 3319.08 samples/sec   Loss 3.6613   LearningRate 0.0235   Epoch: 10   Global Step: 128050   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:48:40,553-Speed 3287.97 samples/sec   Loss 3.8302   LearningRate 0.0235   Epoch: 10   Global Step: 128060   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:48:43,648-Speed 3309.68 samples/sec   Loss 3.7696   LearningRate 0.0235   Epoch: 10   Global Step: 128070   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:48:46,730-Speed 3323.58 samples/sec   Loss 3.8566   LearningRate 0.0235   Epoch: 10   Global Step: 128080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:48:49,868-Speed 3264.20 samples/sec   Loss 3.8196   LearningRate 0.0235   Epoch: 10   Global Step: 128090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:48:53,122-Speed 3147.93 samples/sec   Loss 3.7215   LearningRate 0.0235   Epoch: 10   Global Step: 128100   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:48:56,280-Speed 3244.19 samples/sec   Loss 3.7147   LearningRate 0.0235   Epoch: 10   Global Step: 128110   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:48:59,345-Speed 3341.72 samples/sec   Loss 3.8471   LearningRate 0.0235   Epoch: 10   Global Step: 128120   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:49:02,437-Speed 3312.62 samples/sec   Loss 3.7650   LearningRate 0.0234   Epoch: 10   Global Step: 128130   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:49:05,589-Speed 3249.61 samples/sec   Loss 3.7659   LearningRate 0.0234   Epoch: 10   Global Step: 128140   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 12:49:08,729-Speed 3262.67 samples/sec   Loss 3.7680   LearningRate 0.0234   Epoch: 10   Global Step: 128150   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 12:49:11,810-Speed 3324.42 samples/sec   Loss 3.7809   LearningRate 0.0234   Epoch: 10   Global Step: 128160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 12:49:14,985-Speed 3226.47 samples/sec   Loss 3.7199   LearningRate 0.0234   Epoch: 10   Global Step: 128170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 12:49:18,125-Speed 3261.91 samples/sec   Loss 3.7406   LearningRate 0.0234   Epoch: 10   Global Step: 128180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 12:49:21,193-Speed 3338.58 samples/sec   Loss 3.7982   LearningRate 0.0234   Epoch: 10   Global Step: 128190   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 12:49:24,272-Speed 3327.12 samples/sec   Loss 3.7502   LearningRate 0.0234   Epoch: 10   Global Step: 128200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 12:49:27,316-Speed 3365.47 samples/sec   Loss 3.7969   LearningRate 0.0234   Epoch: 10   Global Step: 128210   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:49:30,492-Speed 3224.79 samples/sec   Loss 3.8076   LearningRate 0.0234   Epoch: 10   Global Step: 128220   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:49:33,598-Speed 3297.71 samples/sec   Loss 3.8186   LearningRate 0.0234   Epoch: 10   Global Step: 128230   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:49:36,727-Speed 3273.89 samples/sec   Loss 3.7770   LearningRate 0.0234   Epoch: 10   Global Step: 128240   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:49:39,868-Speed 3261.60 samples/sec   Loss 3.8218   LearningRate 0.0234   Epoch: 10   Global Step: 128250   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:49:43,003-Speed 3267.01 samples/sec   Loss 3.8460   LearningRate 0.0234   Epoch: 10   Global Step: 128260   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:49:46,066-Speed 3344.40 samples/sec   Loss 3.7495   LearningRate 0.0234   Epoch: 10   Global Step: 128270   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:49:49,200-Speed 3267.65 samples/sec   Loss 3.6818   LearningRate 0.0234   Epoch: 10   Global Step: 128280   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:49:52,334-Speed 3269.51 samples/sec   Loss 3.8581   LearningRate 0.0234   Epoch: 10   Global Step: 128290   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:49:55,409-Speed 3331.52 samples/sec   Loss 3.7808   LearningRate 0.0234   Epoch: 10   Global Step: 128300   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:49:58,500-Speed 3312.83 samples/sec   Loss 3.8327   LearningRate 0.0234   Epoch: 10   Global Step: 128310   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 12:50:01,643-Speed 3259.35 samples/sec   Loss 3.7786   LearningRate 0.0234   Epoch: 10   Global Step: 128320   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 12:50:04,841-Speed 3203.48 samples/sec   Loss 3.8686   LearningRate 0.0234   Epoch: 10   Global Step: 128330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 12:50:07,889-Speed 3359.94 samples/sec   Loss 3.8362   LearningRate 0.0234   Epoch: 10   Global Step: 128340   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:50:10,926-Speed 3373.43 samples/sec   Loss 3.8726   LearningRate 0.0234   Epoch: 10   Global Step: 128350   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:50:14,073-Speed 3254.50 samples/sec   Loss 3.7942   LearningRate 0.0234   Epoch: 10   Global Step: 128360   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:50:17,286-Speed 3188.35 samples/sec   Loss 3.8330   LearningRate 0.0234   Epoch: 10   Global Step: 128370   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:50:20,383-Speed 3306.54 samples/sec   Loss 3.8647   LearningRate 0.0233   Epoch: 10   Global Step: 128380   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:50:23,497-Speed 3289.45 samples/sec   Loss 3.7671   LearningRate 0.0233   Epoch: 10   Global Step: 128390   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:50:26,613-Speed 3288.02 samples/sec   Loss 3.7557   LearningRate 0.0233   Epoch: 10   Global Step: 128400   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:50:29,814-Speed 3199.20 samples/sec   Loss 3.7371   LearningRate 0.0233   Epoch: 10   Global Step: 128410   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:50:32,964-Speed 3251.89 samples/sec   Loss 3.8818   LearningRate 0.0233   Epoch: 10   Global Step: 128420   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:50:36,042-Speed 3328.39 samples/sec   Loss 3.8631   LearningRate 0.0233   Epoch: 10   Global Step: 128430   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:50:39,177-Speed 3267.36 samples/sec   Loss 3.8444   LearningRate 0.0233   Epoch: 10   Global Step: 128440   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:50:42,309-Speed 3270.75 samples/sec   Loss 3.8520   LearningRate 0.0233   Epoch: 10   Global Step: 128450   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:50:45,374-Speed 3341.81 samples/sec   Loss 3.7969   LearningRate 0.0233   Epoch: 10   Global Step: 128460   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:50:48,487-Speed 3289.96 samples/sec   Loss 3.8759   LearningRate 0.0233   Epoch: 10   Global Step: 128470   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:50:51,595-Speed 3296.16 samples/sec   Loss 3.8114   LearningRate 0.0233   Epoch: 10   Global Step: 128480   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:50:54,740-Speed 3257.14 samples/sec   Loss 3.7409   LearningRate 0.0233   Epoch: 10   Global Step: 128490   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:50:57,810-Speed 3336.97 samples/sec   Loss 3.8628   LearningRate 0.0233   Epoch: 10   Global Step: 128500   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:51:00,881-Speed 3335.04 samples/sec   Loss 3.8047   LearningRate 0.0233   Epoch: 10   Global Step: 128510   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:51:03,979-Speed 3305.99 samples/sec   Loss 3.7693   LearningRate 0.0233   Epoch: 10   Global Step: 128520   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:51:07,102-Speed 3280.31 samples/sec   Loss 3.7891   LearningRate 0.0233   Epoch: 10   Global Step: 128530   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:51:10,165-Speed 3344.33 samples/sec   Loss 3.7904   LearningRate 0.0233   Epoch: 10   Global Step: 128540   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:51:13,265-Speed 3303.97 samples/sec   Loss 3.7719   LearningRate 0.0233   Epoch: 10   Global Step: 128550   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 12:51:16,359-Speed 3310.36 samples/sec   Loss 3.7773   LearningRate 0.0233   Epoch: 10   Global Step: 128560   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:51:19,466-Speed 3297.41 samples/sec   Loss 3.6752   LearningRate 0.0233   Epoch: 10   Global Step: 128570   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:51:22,543-Speed 3329.56 samples/sec   Loss 3.8155   LearningRate 0.0233   Epoch: 10   Global Step: 128580   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:51:25,648-Speed 3298.37 samples/sec   Loss 3.8267   LearningRate 0.0233   Epoch: 10   Global Step: 128590   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:51:28,716-Speed 3339.40 samples/sec   Loss 3.7180   LearningRate 0.0233   Epoch: 10   Global Step: 128600   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:51:31,787-Speed 3335.44 samples/sec   Loss 3.8051   LearningRate 0.0233   Epoch: 10   Global Step: 128610   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:51:34,854-Speed 3339.61 samples/sec   Loss 3.8011   LearningRate 0.0233   Epoch: 10   Global Step: 128620   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:51:37,914-Speed 3347.25 samples/sec   Loss 3.8102   LearningRate 0.0233   Epoch: 10   Global Step: 128630   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:51:40,985-Speed 3335.07 samples/sec   Loss 3.8049   LearningRate 0.0232   Epoch: 10   Global Step: 128640   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:51:44,102-Speed 3286.92 samples/sec   Loss 3.8014   LearningRate 0.0232   Epoch: 10   Global Step: 128650   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:51:47,230-Speed 3274.23 samples/sec   Loss 3.8319   LearningRate 0.0232   Epoch: 10   Global Step: 128660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 12:51:50,374-Speed 3259.04 samples/sec   Loss 3.7001   LearningRate 0.0232   Epoch: 10   Global Step: 128670   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 12:51:53,489-Speed 3287.58 samples/sec   Loss 3.8010   LearningRate 0.0232   Epoch: 10   Global Step: 128680   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 12:51:56,568-Speed 3327.45 samples/sec   Loss 3.7193   LearningRate 0.0232   Epoch: 10   Global Step: 128690   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:51:59,701-Speed 3269.61 samples/sec   Loss 3.8152   LearningRate 0.0232   Epoch: 10   Global Step: 128700   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:52:02,842-Speed 3260.79 samples/sec   Loss 3.7980   LearningRate 0.0232   Epoch: 10   Global Step: 128710   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:52:05,956-Speed 3289.47 samples/sec   Loss 3.7785   LearningRate 0.0232   Epoch: 10   Global Step: 128720   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:52:09,019-Speed 3344.62 samples/sec   Loss 3.8519   LearningRate 0.0232   Epoch: 10   Global Step: 128730   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:52:12,124-Speed 3298.30 samples/sec   Loss 3.7489   LearningRate 0.0232   Epoch: 10   Global Step: 128740   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:52:15,203-Speed 3327.85 samples/sec   Loss 3.7510   LearningRate 0.0232   Epoch: 10   Global Step: 128750   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:52:18,278-Speed 3330.61 samples/sec   Loss 3.8602   LearningRate 0.0232   Epoch: 10   Global Step: 128760   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:52:21,356-Speed 3328.49 samples/sec   Loss 3.8495   LearningRate 0.0232   Epoch: 10   Global Step: 128770   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:52:24,452-Speed 3308.03 samples/sec   Loss 3.8375   LearningRate 0.0232   Epoch: 10   Global Step: 128780   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:52:27,612-Speed 3241.38 samples/sec   Loss 3.8349   LearningRate 0.0232   Epoch: 10   Global Step: 128790   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:52:30,696-Speed 3322.11 samples/sec   Loss 3.7733   LearningRate 0.0232   Epoch: 10   Global Step: 128800   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:52:33,753-Speed 3350.07 samples/sec   Loss 3.7879   LearningRate 0.0232   Epoch: 10   Global Step: 128810   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:52:36,860-Speed 3297.50 samples/sec   Loss 3.8613   LearningRate 0.0232   Epoch: 10   Global Step: 128820   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:52:39,965-Speed 3298.62 samples/sec   Loss 3.7310   LearningRate 0.0232   Epoch: 10   Global Step: 128830   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:52:43,071-Speed 3298.31 samples/sec   Loss 3.8361   LearningRate 0.0232   Epoch: 10   Global Step: 128840   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:52:46,166-Speed 3308.41 samples/sec   Loss 3.7843   LearningRate 0.0232   Epoch: 10   Global Step: 128850   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:52:49,302-Speed 3267.00 samples/sec   Loss 3.8266   LearningRate 0.0232   Epoch: 10   Global Step: 128860   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:52:52,427-Speed 3277.90 samples/sec   Loss 3.8286   LearningRate 0.0232   Epoch: 10   Global Step: 128870   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:52:55,498-Speed 3335.14 samples/sec   Loss 3.8382   LearningRate 0.0232   Epoch: 10   Global Step: 128880   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:52:58,565-Speed 3339.51 samples/sec   Loss 3.8092   LearningRate 0.0232   Epoch: 10   Global Step: 128890   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:53:01,642-Speed 3329.12 samples/sec   Loss 3.8398   LearningRate 0.0231   Epoch: 10   Global Step: 128900   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:53:04,724-Speed 3323.48 samples/sec   Loss 3.7945   LearningRate 0.0231   Epoch: 10   Global Step: 128910   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:53:07,817-Speed 3312.43 samples/sec   Loss 3.7780   LearningRate 0.0231   Epoch: 10   Global Step: 128920   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:53:10,914-Speed 3306.65 samples/sec   Loss 3.8160   LearningRate 0.0231   Epoch: 10   Global Step: 128930   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:53:14,039-Speed 3278.12 samples/sec   Loss 3.8571   LearningRate 0.0231   Epoch: 10   Global Step: 128940   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:53:17,177-Speed 3264.36 samples/sec   Loss 3.8354   LearningRate 0.0231   Epoch: 10   Global Step: 128950   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:53:20,256-Speed 3327.79 samples/sec   Loss 3.8081   LearningRate 0.0231   Epoch: 10   Global Step: 128960   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:53:23,409-Speed 3248.12 samples/sec   Loss 3.7629   LearningRate 0.0231   Epoch: 10   Global Step: 128970   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:53:26,535-Speed 3277.18 samples/sec   Loss 3.7547   LearningRate 0.0231   Epoch: 10   Global Step: 128980   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:53:29,662-Speed 3275.52 samples/sec   Loss 3.7762   LearningRate 0.0231   Epoch: 10   Global Step: 128990   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:53:32,752-Speed 3314.92 samples/sec   Loss 3.7952   LearningRate 0.0231   Epoch: 10   Global Step: 129000   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 12:53:35,920-Speed 3233.85 samples/sec   Loss 3.8773   LearningRate 0.0231   Epoch: 10   Global Step: 129010   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 12:53:39,048-Speed 3273.91 samples/sec   Loss 3.9231   LearningRate 0.0231   Epoch: 10   Global Step: 129020   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 12:53:42,185-Speed 3265.36 samples/sec   Loss 3.7902   LearningRate 0.0231   Epoch: 10   Global Step: 129030   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 12:53:45,288-Speed 3301.49 samples/sec   Loss 3.6934   LearningRate 0.0231   Epoch: 10   Global Step: 129040   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 12:53:48,390-Speed 3301.64 samples/sec   Loss 3.8443   LearningRate 0.0231   Epoch: 10   Global Step: 129050   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 12:53:51,536-Speed 3255.71 samples/sec   Loss 3.7716   LearningRate 0.0231   Epoch: 10   Global Step: 129060   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 12:53:54,710-Speed 3228.05 samples/sec   Loss 3.7850   LearningRate 0.0231   Epoch: 10   Global Step: 129070   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 12:53:57,830-Speed 3282.34 samples/sec   Loss 3.7483   LearningRate 0.0231   Epoch: 10   Global Step: 129080   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 12:54:00,982-Speed 3250.04 samples/sec   Loss 3.8315   LearningRate 0.0231   Epoch: 10   Global Step: 129090   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 12:54:04,119-Speed 3265.18 samples/sec   Loss 3.8524   LearningRate 0.0231   Epoch: 10   Global Step: 129100   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:54:07,280-Speed 3240.56 samples/sec   Loss 3.8754   LearningRate 0.0231   Epoch: 10   Global Step: 129110   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:54:10,391-Speed 3292.80 samples/sec   Loss 3.7767   LearningRate 0.0231   Epoch: 10   Global Step: 129120   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:54:13,487-Speed 3308.42 samples/sec   Loss 3.8545   LearningRate 0.0231   Epoch: 10   Global Step: 129130   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:54:16,593-Speed 3298.10 samples/sec   Loss 3.8097   LearningRate 0.0231   Epoch: 10   Global Step: 129140   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:54:19,722-Speed 3273.77 samples/sec   Loss 3.8151   LearningRate 0.0231   Epoch: 10   Global Step: 129150   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:54:22,838-Speed 3287.42 samples/sec   Loss 3.8559   LearningRate 0.0230   Epoch: 10   Global Step: 129160   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:54:25,986-Speed 3253.94 samples/sec   Loss 3.7173   LearningRate 0.0230   Epoch: 10   Global Step: 129170   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:54:29,205-Speed 3182.26 samples/sec   Loss 3.7182   LearningRate 0.0230   Epoch: 10   Global Step: 129180   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:54:32,356-Speed 3250.92 samples/sec   Loss 3.9110   LearningRate 0.0230   Epoch: 10   Global Step: 129190   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:54:35,491-Speed 3266.89 samples/sec   Loss 3.8658   LearningRate 0.0230   Epoch: 10   Global Step: 129200   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:54:38,614-Speed 3280.52 samples/sec   Loss 3.8962   LearningRate 0.0230   Epoch: 10   Global Step: 129210   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:54:41,718-Speed 3299.64 samples/sec   Loss 3.8123   LearningRate 0.0230   Epoch: 10   Global Step: 129220   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:54:44,810-Speed 3312.67 samples/sec   Loss 3.7922   LearningRate 0.0230   Epoch: 10   Global Step: 129230   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:54:47,912-Speed 3302.09 samples/sec   Loss 3.8323   LearningRate 0.0230   Epoch: 10   Global Step: 129240   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:54:50,980-Speed 3338.70 samples/sec   Loss 3.8768   LearningRate 0.0230   Epoch: 10   Global Step: 129250   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:54:54,107-Speed 3276.05 samples/sec   Loss 3.7983   LearningRate 0.0230   Epoch: 10   Global Step: 129260   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:54:57,190-Speed 3322.24 samples/sec   Loss 3.7994   LearningRate 0.0230   Epoch: 10   Global Step: 129270   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:55:00,299-Speed 3295.48 samples/sec   Loss 3.8623   LearningRate 0.0230   Epoch: 10   Global Step: 129280   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:55:03,472-Speed 3228.33 samples/sec   Loss 3.8962   LearningRate 0.0230   Epoch: 10   Global Step: 129290   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:55:06,593-Speed 3281.58 samples/sec   Loss 3.8844   LearningRate 0.0230   Epoch: 10   Global Step: 129300   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:55:09,661-Speed 3339.15 samples/sec   Loss 3.8056   LearningRate 0.0230   Epoch: 10   Global Step: 129310   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:55:12,869-Speed 3192.67 samples/sec   Loss 3.8119   LearningRate 0.0230   Epoch: 10   Global Step: 129320   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:55:15,949-Speed 3326.73 samples/sec   Loss 3.8341   LearningRate 0.0230   Epoch: 10   Global Step: 129330   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:55:19,026-Speed 3328.24 samples/sec   Loss 3.9031   LearningRate 0.0230   Epoch: 10   Global Step: 129340   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:55:22,111-Speed 3320.50 samples/sec   Loss 3.8570   LearningRate 0.0230   Epoch: 10   Global Step: 129350   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:55:25,240-Speed 3274.41 samples/sec   Loss 3.9293   LearningRate 0.0230   Epoch: 10   Global Step: 129360   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:55:28,406-Speed 3234.77 samples/sec   Loss 3.7500   LearningRate 0.0230   Epoch: 10   Global Step: 129370   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:55:31,527-Speed 3282.21 samples/sec   Loss 3.8633   LearningRate 0.0230   Epoch: 10   Global Step: 129380   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:55:34,597-Speed 3336.39 samples/sec   Loss 3.8398   LearningRate 0.0230   Epoch: 10   Global Step: 129390   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:55:37,773-Speed 3225.04 samples/sec   Loss 3.9591   LearningRate 0.0230   Epoch: 10   Global Step: 129400   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:55:40,979-Speed 3195.23 samples/sec   Loss 3.8722   LearningRate 0.0230   Epoch: 10   Global Step: 129410   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:55:44,022-Speed 3367.20 samples/sec   Loss 3.7885   LearningRate 0.0229   Epoch: 10   Global Step: 129420   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:55:47,129-Speed 3296.49 samples/sec   Loss 3.8884   LearningRate 0.0229   Epoch: 10   Global Step: 129430   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:55:50,235-Speed 3297.73 samples/sec   Loss 3.8326   LearningRate 0.0229   Epoch: 10   Global Step: 129440   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:55:53,349-Speed 3289.86 samples/sec   Loss 3.8384   LearningRate 0.0229   Epoch: 10   Global Step: 129450   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:55:56,478-Speed 3272.96 samples/sec   Loss 3.8236   LearningRate 0.0229   Epoch: 10   Global Step: 129460   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:55:59,549-Speed 3336.12 samples/sec   Loss 3.8177   LearningRate 0.0229   Epoch: 10   Global Step: 129470   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:56:02,668-Speed 3283.71 samples/sec   Loss 3.8881   LearningRate 0.0229   Epoch: 10   Global Step: 129480   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:56:05,783-Speed 3288.60 samples/sec   Loss 3.8318   LearningRate 0.0229   Epoch: 10   Global Step: 129490   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:56:08,840-Speed 3350.24 samples/sec   Loss 3.8534   LearningRate 0.0229   Epoch: 10   Global Step: 129500   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:56:11,910-Speed 3336.67 samples/sec   Loss 3.8558   LearningRate 0.0229   Epoch: 10   Global Step: 129510   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:56:15,044-Speed 3268.94 samples/sec   Loss 3.8019   LearningRate 0.0229   Epoch: 10   Global Step: 129520   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:56:18,153-Speed 3295.52 samples/sec   Loss 3.7792   LearningRate 0.0229   Epoch: 10   Global Step: 129530   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:56:21,254-Speed 3302.10 samples/sec   Loss 3.7840   LearningRate 0.0229   Epoch: 10   Global Step: 129540   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:56:24,351-Speed 3308.23 samples/sec   Loss 3.9000   LearningRate 0.0229   Epoch: 10   Global Step: 129550   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:56:27,472-Speed 3281.92 samples/sec   Loss 3.7857   LearningRate 0.0229   Epoch: 10   Global Step: 129560   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 12:56:30,585-Speed 3290.52 samples/sec   Loss 3.8285   LearningRate 0.0229   Epoch: 10   Global Step: 129570   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 12:56:33,650-Speed 3342.49 samples/sec   Loss 3.9815   LearningRate 0.0229   Epoch: 10   Global Step: 129580   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 12:56:36,770-Speed 3282.61 samples/sec   Loss 3.9096   LearningRate 0.0229   Epoch: 10   Global Step: 129590   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 12:56:39,854-Speed 3322.20 samples/sec   Loss 3.9519   LearningRate 0.0229   Epoch: 10   Global Step: 129600   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 12:56:42,971-Speed 3286.29 samples/sec   Loss 3.8325   LearningRate 0.0229   Epoch: 10   Global Step: 129610   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 12:56:46,031-Speed 3346.42 samples/sec   Loss 3.8142   LearningRate 0.0229   Epoch: 10   Global Step: 129620   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 12:56:49,136-Speed 3299.08 samples/sec   Loss 3.8485   LearningRate 0.0229   Epoch: 10   Global Step: 129630   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 12:56:52,255-Speed 3284.31 samples/sec   Loss 3.9735   LearningRate 0.0229   Epoch: 10   Global Step: 129640   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 12:56:55,343-Speed 3317.66 samples/sec   Loss 3.8533   LearningRate 0.0229   Epoch: 10   Global Step: 129650   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 12:56:58,467-Speed 3278.91 samples/sec   Loss 3.8630   LearningRate 0.0229   Epoch: 10   Global Step: 129660   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:57:01,558-Speed 3313.67 samples/sec   Loss 3.8600   LearningRate 0.0229   Epoch: 10   Global Step: 129670   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:57:04,757-Speed 3202.09 samples/sec   Loss 3.8396   LearningRate 0.0228   Epoch: 10   Global Step: 129680   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:57:07,886-Speed 3272.88 samples/sec   Loss 3.8434   LearningRate 0.0228   Epoch: 10   Global Step: 129690   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:57:10,943-Speed 3351.60 samples/sec   Loss 3.8957   LearningRate 0.0228   Epoch: 10   Global Step: 129700   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:57:14,047-Speed 3299.18 samples/sec   Loss 3.9026   LearningRate 0.0228   Epoch: 10   Global Step: 129710   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:57:17,224-Speed 3224.60 samples/sec   Loss 3.9523   LearningRate 0.0228   Epoch: 10   Global Step: 129720   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:57:20,338-Speed 3288.86 samples/sec   Loss 3.8808   LearningRate 0.0228   Epoch: 10   Global Step: 129730   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:57:23,444-Speed 3298.54 samples/sec   Loss 3.8714   LearningRate 0.0228   Epoch: 10   Global Step: 129740   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:57:26,560-Speed 3287.17 samples/sec   Loss 3.7950   LearningRate 0.0228   Epoch: 10   Global Step: 129750   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:57:29,708-Speed 3254.44 samples/sec   Loss 3.9097   LearningRate 0.0228   Epoch: 10   Global Step: 129760   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:57:32,830-Speed 3280.86 samples/sec   Loss 3.8574   LearningRate 0.0228   Epoch: 10   Global Step: 129770   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:57:35,909-Speed 3326.58 samples/sec   Loss 3.9113   LearningRate 0.0228   Epoch: 10   Global Step: 129780   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:57:39,013-Speed 3299.39 samples/sec   Loss 3.7827   LearningRate 0.0228   Epoch: 10   Global Step: 129790   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:57:42,121-Speed 3296.63 samples/sec   Loss 3.8390   LearningRate 0.0228   Epoch: 10   Global Step: 129800   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:57:45,215-Speed 3311.08 samples/sec   Loss 3.9263   LearningRate 0.0228   Epoch: 10   Global Step: 129810   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:57:48,275-Speed 3347.08 samples/sec   Loss 3.8059   LearningRate 0.0228   Epoch: 10   Global Step: 129820   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:57:51,423-Speed 3253.50 samples/sec   Loss 3.8816   LearningRate 0.0228   Epoch: 10   Global Step: 129830   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:57:54,542-Speed 3284.29 samples/sec   Loss 3.7876   LearningRate 0.0228   Epoch: 10   Global Step: 129840   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:57:57,602-Speed 3347.24 samples/sec   Loss 3.9069   LearningRate 0.0228   Epoch: 10   Global Step: 129850   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:58:00,759-Speed 3245.06 samples/sec   Loss 3.9121   LearningRate 0.0228   Epoch: 10   Global Step: 129860   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:58:03,868-Speed 3294.59 samples/sec   Loss 3.9323   LearningRate 0.0228   Epoch: 10   Global Step: 129870   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:58:06,983-Speed 3288.24 samples/sec   Loss 3.8608   LearningRate 0.0228   Epoch: 10   Global Step: 129880   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:58:10,084-Speed 3303.24 samples/sec   Loss 4.0245   LearningRate 0.0228   Epoch: 10   Global Step: 129890   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:58:13,236-Speed 3249.97 samples/sec   Loss 3.8920   LearningRate 0.0228   Epoch: 10   Global Step: 129900   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:58:16,309-Speed 3332.75 samples/sec   Loss 3.8933   LearningRate 0.0228   Epoch: 10   Global Step: 129910   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:58:19,454-Speed 3257.81 samples/sec   Loss 3.9073   LearningRate 0.0228   Epoch: 10   Global Step: 129920   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:58:22,561-Speed 3296.05 samples/sec   Loss 3.9438   LearningRate 0.0228   Epoch: 10   Global Step: 129930   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:58:25,716-Speed 3247.28 samples/sec   Loss 3.8594   LearningRate 0.0227   Epoch: 10   Global Step: 129940   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:58:28,802-Speed 3318.37 samples/sec   Loss 3.8515   LearningRate 0.0227   Epoch: 10   Global Step: 129950   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:58:31,902-Speed 3306.30 samples/sec   Loss 3.8477   LearningRate 0.0227   Epoch: 10   Global Step: 129960   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:58:35,043-Speed 3261.01 samples/sec   Loss 3.8630   LearningRate 0.0227   Epoch: 10   Global Step: 129970   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:58:38,114-Speed 3335.41 samples/sec   Loss 3.8179   LearningRate 0.0227   Epoch: 10   Global Step: 129980   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:58:41,251-Speed 3265.10 samples/sec   Loss 3.7965   LearningRate 0.0227   Epoch: 10   Global Step: 129990   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:58:44,356-Speed 3298.85 samples/sec   Loss 3.9606   LearningRate 0.0227   Epoch: 10   Global Step: 130000   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 12:58:47,445-Speed 3316.59 samples/sec   Loss 3.7857   LearningRate 0.0227   Epoch: 10   Global Step: 130010   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 12:58:50,553-Speed 3295.75 samples/sec   Loss 3.9231   LearningRate 0.0227   Epoch: 10   Global Step: 130020   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 12:58:53,626-Speed 3332.72 samples/sec   Loss 3.8454   LearningRate 0.0227   Epoch: 10   Global Step: 130030   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 12:58:56,688-Speed 3345.58 samples/sec   Loss 3.9171   LearningRate 0.0227   Epoch: 10   Global Step: 130040   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 12:58:59,804-Speed 3286.97 samples/sec   Loss 3.8520   LearningRate 0.0227   Epoch: 10   Global Step: 130050   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:59:02,940-Speed 3266.50 samples/sec   Loss 3.8614   LearningRate 0.0227   Epoch: 10   Global Step: 130060   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:59:06,086-Speed 3256.01 samples/sec   Loss 3.8832   LearningRate 0.0227   Epoch: 10   Global Step: 130070   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:59:09,147-Speed 3345.89 samples/sec   Loss 3.9446   LearningRate 0.0227   Epoch: 10   Global Step: 130080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:59:12,286-Speed 3264.01 samples/sec   Loss 3.8534   LearningRate 0.0227   Epoch: 10   Global Step: 130090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 12:59:15,381-Speed 3309.10 samples/sec   Loss 3.8424   LearningRate 0.0227   Epoch: 10   Global Step: 130100   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:59:18,464-Speed 3322.32 samples/sec   Loss 3.9689   LearningRate 0.0227   Epoch: 10   Global Step: 130110   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:59:21,519-Speed 3353.40 samples/sec   Loss 3.7965   LearningRate 0.0227   Epoch: 10   Global Step: 130120   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:59:24,611-Speed 3313.13 samples/sec   Loss 3.8788   LearningRate 0.0227   Epoch: 10   Global Step: 130130   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:59:27,667-Speed 3351.71 samples/sec   Loss 3.8447   LearningRate 0.0227   Epoch: 10   Global Step: 130140   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:59:30,722-Speed 3353.51 samples/sec   Loss 3.8727   LearningRate 0.0227   Epoch: 10   Global Step: 130150   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:59:33,818-Speed 3308.05 samples/sec   Loss 3.8449   LearningRate 0.0227   Epoch: 10   Global Step: 130160   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:59:36,916-Speed 3305.86 samples/sec   Loss 3.9414   LearningRate 0.0227   Epoch: 10   Global Step: 130170   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:59:39,978-Speed 3346.25 samples/sec   Loss 3.8345   LearningRate 0.0227   Epoch: 10   Global Step: 130180   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:59:43,028-Speed 3358.14 samples/sec   Loss 3.7623   LearningRate 0.0227   Epoch: 10   Global Step: 130190   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:59:46,080-Speed 3356.24 samples/sec   Loss 3.9460   LearningRate 0.0226   Epoch: 10   Global Step: 130200   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:59:49,202-Speed 3281.02 samples/sec   Loss 3.9261   LearningRate 0.0226   Epoch: 10   Global Step: 130210   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:59:52,274-Speed 3333.96 samples/sec   Loss 3.8547   LearningRate 0.0226   Epoch: 10   Global Step: 130220   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:59:55,326-Speed 3356.58 samples/sec   Loss 3.9252   LearningRate 0.0226   Epoch: 10   Global Step: 130230   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 12:59:58,388-Speed 3345.34 samples/sec   Loss 3.8817   LearningRate 0.0226   Epoch: 10   Global Step: 130240   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:00:01,483-Speed 3309.39 samples/sec   Loss 3.9916   LearningRate 0.0226   Epoch: 10   Global Step: 130250   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:00:04,604-Speed 3281.82 samples/sec   Loss 3.9208   LearningRate 0.0226   Epoch: 10   Global Step: 130260   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:00:07,732-Speed 3275.20 samples/sec   Loss 3.9424   LearningRate 0.0226   Epoch: 10   Global Step: 130270   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:00:10,777-Speed 3363.47 samples/sec   Loss 3.8771   LearningRate 0.0226   Epoch: 10   Global Step: 130280   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:00:13,825-Speed 3361.18 samples/sec   Loss 3.8452   LearningRate 0.0226   Epoch: 10   Global Step: 130290   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:00:16,883-Speed 3350.26 samples/sec   Loss 3.9542   LearningRate 0.0226   Epoch: 10   Global Step: 130300   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:00:19,952-Speed 3336.67 samples/sec   Loss 3.8612   LearningRate 0.0226   Epoch: 10   Global Step: 130310   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:00:23,008-Speed 3352.35 samples/sec   Loss 3.7919   LearningRate 0.0226   Epoch: 10   Global Step: 130320   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:00:26,073-Speed 3342.12 samples/sec   Loss 3.9889   LearningRate 0.0226   Epoch: 10   Global Step: 130330   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:00:29,225-Speed 3249.63 samples/sec   Loss 3.8470   LearningRate 0.0226   Epoch: 10   Global Step: 130340   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:00:32,280-Speed 3353.66 samples/sec   Loss 3.8948   LearningRate 0.0226   Epoch: 10   Global Step: 130350   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:00:35,362-Speed 3323.24 samples/sec   Loss 3.9275   LearningRate 0.0226   Epoch: 10   Global Step: 130360   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:00:38,494-Speed 3270.32 samples/sec   Loss 3.9009   LearningRate 0.0226   Epoch: 10   Global Step: 130370   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:00:41,576-Speed 3323.05 samples/sec   Loss 3.9323   LearningRate 0.0226   Epoch: 10   Global Step: 130380   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:00:44,659-Speed 3322.86 samples/sec   Loss 3.8382   LearningRate 0.0226   Epoch: 10   Global Step: 130390   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:00:47,779-Speed 3283.25 samples/sec   Loss 3.8327   LearningRate 0.0226   Epoch: 10   Global Step: 130400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:00:50,831-Speed 3355.55 samples/sec   Loss 3.9490   LearningRate 0.0226   Epoch: 10   Global Step: 130410   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:00:54,012-Speed 3220.57 samples/sec   Loss 3.8233   LearningRate 0.0226   Epoch: 10   Global Step: 130420   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:00:57,079-Speed 3340.02 samples/sec   Loss 3.8600   LearningRate 0.0226   Epoch: 10   Global Step: 130430   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:01:00,192-Speed 3290.85 samples/sec   Loss 3.9429   LearningRate 0.0226   Epoch: 10   Global Step: 130440   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:01:03,255-Speed 3343.59 samples/sec   Loss 3.8860   LearningRate 0.0226   Epoch: 10   Global Step: 130450   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:01:06,328-Speed 3334.05 samples/sec   Loss 3.9096   LearningRate 0.0225   Epoch: 10   Global Step: 130460   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:01:09,414-Speed 3319.18 samples/sec   Loss 3.8503   LearningRate 0.0225   Epoch: 10   Global Step: 130470   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:01:12,508-Speed 3310.54 samples/sec   Loss 3.7978   LearningRate 0.0225   Epoch: 10   Global Step: 130480   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:01:15,578-Speed 3336.00 samples/sec   Loss 3.8989   LearningRate 0.0225   Epoch: 10   Global Step: 130490   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:01:18,631-Speed 3355.96 samples/sec   Loss 3.9028   LearningRate 0.0225   Epoch: 10   Global Step: 130500   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:01:21,688-Speed 3350.57 samples/sec   Loss 3.8008   LearningRate 0.0225   Epoch: 10   Global Step: 130510   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:01:24,791-Speed 3300.54 samples/sec   Loss 3.8471   LearningRate 0.0225   Epoch: 10   Global Step: 130520   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:01:27,991-Speed 3201.02 samples/sec   Loss 3.8659   LearningRate 0.0225   Epoch: 10   Global Step: 130530   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:01:31,134-Speed 3259.91 samples/sec   Loss 3.9121   LearningRate 0.0225   Epoch: 10   Global Step: 130540   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:01:34,185-Speed 3356.93 samples/sec   Loss 3.8427   LearningRate 0.0225   Epoch: 10   Global Step: 130550   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:01:37,287-Speed 3302.34 samples/sec   Loss 3.9076   LearningRate 0.0225   Epoch: 10   Global Step: 130560   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:01:40,387-Speed 3303.45 samples/sec   Loss 3.9277   LearningRate 0.0225   Epoch: 10   Global Step: 130570   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:01:43,460-Speed 3333.93 samples/sec   Loss 3.8179   LearningRate 0.0225   Epoch: 10   Global Step: 130580   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:01:46,561-Speed 3303.21 samples/sec   Loss 3.9246   LearningRate 0.0225   Epoch: 10   Global Step: 130590   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:01:49,695-Speed 3268.40 samples/sec   Loss 3.8896   LearningRate 0.0225   Epoch: 10   Global Step: 130600   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:01:52,815-Speed 3282.89 samples/sec   Loss 3.9425   LearningRate 0.0225   Epoch: 10   Global Step: 130610   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:01:55,895-Speed 3325.86 samples/sec   Loss 3.9774   LearningRate 0.0225   Epoch: 10   Global Step: 130620   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:01:58,998-Speed 3300.52 samples/sec   Loss 3.8434   LearningRate 0.0225   Epoch: 10   Global Step: 130630   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:02:02,079-Speed 3325.68 samples/sec   Loss 3.8583   LearningRate 0.0225   Epoch: 10   Global Step: 130640   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:02:05,238-Speed 3241.68 samples/sec   Loss 3.9429   LearningRate 0.0225   Epoch: 10   Global Step: 130650   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:02:08,347-Speed 3295.00 samples/sec   Loss 3.8543   LearningRate 0.0225   Epoch: 10   Global Step: 130660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:02:11,398-Speed 3357.66 samples/sec   Loss 3.8565   LearningRate 0.0225   Epoch: 10   Global Step: 130670   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:02:14,515-Speed 3285.58 samples/sec   Loss 3.8654   LearningRate 0.0225   Epoch: 10   Global Step: 130680   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:02:17,700-Speed 3216.65 samples/sec   Loss 3.7809   LearningRate 0.0225   Epoch: 10   Global Step: 130690   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:02:20,821-Speed 3282.27 samples/sec   Loss 3.8581   LearningRate 0.0225   Epoch: 10   Global Step: 130700   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:02:23,903-Speed 3323.34 samples/sec   Loss 3.8421   LearningRate 0.0225   Epoch: 10   Global Step: 130710   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:02:27,079-Speed 3225.36 samples/sec   Loss 3.8587   LearningRate 0.0224   Epoch: 10   Global Step: 130720   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:02:30,136-Speed 3351.07 samples/sec   Loss 3.9629   LearningRate 0.0224   Epoch: 10   Global Step: 130730   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:02:33,172-Speed 3373.65 samples/sec   Loss 3.9536   LearningRate 0.0224   Epoch: 10   Global Step: 130740   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:02:36,273-Speed 3303.43 samples/sec   Loss 3.8671   LearningRate 0.0224   Epoch: 10   Global Step: 130750   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:02:39,344-Speed 3335.19 samples/sec   Loss 3.8950   LearningRate 0.0224   Epoch: 10   Global Step: 130760   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:02:42,430-Speed 3319.23 samples/sec   Loss 3.8367   LearningRate 0.0224   Epoch: 10   Global Step: 130770   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:02:45,483-Speed 3354.73 samples/sec   Loss 3.9089   LearningRate 0.0224   Epoch: 10   Global Step: 130780   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:02:48,588-Speed 3299.15 samples/sec   Loss 3.9356   LearningRate 0.0224   Epoch: 10   Global Step: 130790   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:02:51,684-Speed 3308.11 samples/sec   Loss 3.8305   LearningRate 0.0224   Epoch: 10   Global Step: 130800   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:02:54,787-Speed 3301.19 samples/sec   Loss 3.8506   LearningRate 0.0224   Epoch: 10   Global Step: 130810   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:02:57,896-Speed 3294.32 samples/sec   Loss 3.9349   LearningRate 0.0224   Epoch: 10   Global Step: 130820   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:03:01,051-Speed 3247.32 samples/sec   Loss 3.8770   LearningRate 0.0224   Epoch: 10   Global Step: 130830   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:03:04,147-Speed 3309.25 samples/sec   Loss 3.8703   LearningRate 0.0224   Epoch: 10   Global Step: 130840   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:03:07,276-Speed 3273.09 samples/sec   Loss 3.8439   LearningRate 0.0224   Epoch: 10   Global Step: 130850   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:03:10,340-Speed 3343.31 samples/sec   Loss 3.9543   LearningRate 0.0224   Epoch: 10   Global Step: 130860   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:03:13,490-Speed 3252.09 samples/sec   Loss 3.9192   LearningRate 0.0224   Epoch: 10   Global Step: 130870   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:03:16,637-Speed 3254.70 samples/sec   Loss 3.8286   LearningRate 0.0224   Epoch: 10   Global Step: 130880   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:03:19,717-Speed 3325.09 samples/sec   Loss 3.9466   LearningRate 0.0224   Epoch: 10   Global Step: 130890   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:03:22,791-Speed 3332.16 samples/sec   Loss 4.0185   LearningRate 0.0224   Epoch: 10   Global Step: 130900   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:03:25,965-Speed 3227.96 samples/sec   Loss 3.9388   LearningRate 0.0224   Epoch: 10   Global Step: 130910   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:03:29,138-Speed 3227.94 samples/sec   Loss 3.9412   LearningRate 0.0224   Epoch: 10   Global Step: 130920   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:03:32,287-Speed 3253.59 samples/sec   Loss 3.9272   LearningRate 0.0224   Epoch: 10   Global Step: 130930   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:03:35,365-Speed 3327.07 samples/sec   Loss 3.9177   LearningRate 0.0224   Epoch: 10   Global Step: 130940   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:03:38,515-Speed 3251.24 samples/sec   Loss 3.8872   LearningRate 0.0224   Epoch: 10   Global Step: 130950   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:03:41,712-Speed 3204.96 samples/sec   Loss 3.9465   LearningRate 0.0224   Epoch: 10   Global Step: 130960   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:03:44,839-Speed 3275.62 samples/sec   Loss 3.9229   LearningRate 0.0224   Epoch: 10   Global Step: 130970   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:03:47,898-Speed 3348.10 samples/sec   Loss 3.8905   LearningRate 0.0223   Epoch: 10   Global Step: 130980   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:03:50,963-Speed 3342.20 samples/sec   Loss 3.9166   LearningRate 0.0223   Epoch: 10   Global Step: 130990   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:03:54,075-Speed 3291.26 samples/sec   Loss 3.8962   LearningRate 0.0223   Epoch: 10   Global Step: 131000   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:03:57,165-Speed 3315.72 samples/sec   Loss 3.9039   LearningRate 0.0223   Epoch: 10   Global Step: 131010   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:04:00,282-Speed 3285.98 samples/sec   Loss 3.8451   LearningRate 0.0223   Epoch: 10   Global Step: 131020   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:04:03,354-Speed 3334.30 samples/sec   Loss 3.9710   LearningRate 0.0223   Epoch: 10   Global Step: 131030   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:04:06,454-Speed 3303.96 samples/sec   Loss 3.9180   LearningRate 0.0223   Epoch: 10   Global Step: 131040   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:04:09,578-Speed 3278.69 samples/sec   Loss 3.8662   LearningRate 0.0223   Epoch: 10   Global Step: 131050   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:04:12,681-Speed 3302.33 samples/sec   Loss 3.9038   LearningRate 0.0223   Epoch: 10   Global Step: 131060   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:04:15,778-Speed 3306.65 samples/sec   Loss 3.9201   LearningRate 0.0223   Epoch: 10   Global Step: 131070   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:04:18,943-Speed 3236.40 samples/sec   Loss 3.9032   LearningRate 0.0223   Epoch: 10   Global Step: 131080   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:04:22,009-Speed 3341.84 samples/sec   Loss 3.9790   LearningRate 0.0223   Epoch: 10   Global Step: 131090   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:04:25,136-Speed 3275.11 samples/sec   Loss 3.8169   LearningRate 0.0223   Epoch: 10   Global Step: 131100   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:04:28,320-Speed 3217.35 samples/sec   Loss 3.9174   LearningRate 0.0223   Epoch: 10   Global Step: 131110   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:04:31,415-Speed 3309.87 samples/sec   Loss 3.9084   LearningRate 0.0223   Epoch: 10   Global Step: 131120   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:04:34,551-Speed 3265.33 samples/sec   Loss 3.8579   LearningRate 0.0223   Epoch: 10   Global Step: 131130   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:04:37,687-Speed 3267.08 samples/sec   Loss 4.0366   LearningRate 0.0223   Epoch: 10   Global Step: 131140   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:04:40,800-Speed 3290.59 samples/sec   Loss 3.9194   LearningRate 0.0223   Epoch: 10   Global Step: 131150   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:04:43,918-Speed 3284.46 samples/sec   Loss 3.8955   LearningRate 0.0223   Epoch: 10   Global Step: 131160   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:04:47,006-Speed 3317.37 samples/sec   Loss 3.8790   LearningRate 0.0223   Epoch: 10   Global Step: 131170   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:04:50,141-Speed 3267.74 samples/sec   Loss 3.9627   LearningRate 0.0223   Epoch: 10   Global Step: 131180   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:04:53,354-Speed 3187.72 samples/sec   Loss 3.9092   LearningRate 0.0223   Epoch: 10   Global Step: 131190   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:04:56,416-Speed 3345.05 samples/sec   Loss 3.9021   LearningRate 0.0223   Epoch: 10   Global Step: 131200   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:04:59,512-Speed 3309.29 samples/sec   Loss 3.9781   LearningRate 0.0223   Epoch: 10   Global Step: 131210   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:05:02,651-Speed 3262.63 samples/sec   Loss 3.9205   LearningRate 0.0223   Epoch: 10   Global Step: 131220   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:05:05,803-Speed 3250.03 samples/sec   Loss 3.8917   LearningRate 0.0223   Epoch: 10   Global Step: 131230   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:05:08,902-Speed 3305.51 samples/sec   Loss 3.9564   LearningRate 0.0223   Epoch: 10   Global Step: 131240   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:05:12,026-Speed 3278.40 samples/sec   Loss 3.9245   LearningRate 0.0222   Epoch: 10   Global Step: 131250   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:05:15,157-Speed 3272.15 samples/sec   Loss 3.9141   LearningRate 0.0222   Epoch: 10   Global Step: 131260   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:05:18,316-Speed 3242.01 samples/sec   Loss 3.8677   LearningRate 0.0222   Epoch: 10   Global Step: 131270   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:05:21,395-Speed 3326.25 samples/sec   Loss 3.9224   LearningRate 0.0222   Epoch: 10   Global Step: 131280   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:05:24,520-Speed 3278.43 samples/sec   Loss 3.9477   LearningRate 0.0222   Epoch: 10   Global Step: 131290   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:05:27,673-Speed 3248.82 samples/sec   Loss 3.8971   LearningRate 0.0222   Epoch: 10   Global Step: 131300   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:05:30,788-Speed 3288.14 samples/sec   Loss 3.9382   LearningRate 0.0222   Epoch: 10   Global Step: 131310   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:05:33,848-Speed 3346.66 samples/sec   Loss 3.9162   LearningRate 0.0222   Epoch: 10   Global Step: 131320   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:05:36,955-Speed 3297.35 samples/sec   Loss 3.9225   LearningRate 0.0222   Epoch: 10   Global Step: 131330   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:05:40,135-Speed 3221.44 samples/sec   Loss 3.9012   LearningRate 0.0222   Epoch: 10   Global Step: 131340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:05:43,285-Speed 3251.68 samples/sec   Loss 3.9100   LearningRate 0.0222   Epoch: 10   Global Step: 131350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:05:46,400-Speed 3288.91 samples/sec   Loss 3.8985   LearningRate 0.0222   Epoch: 10   Global Step: 131360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:05:49,473-Speed 3332.63 samples/sec   Loss 3.8635   LearningRate 0.0222   Epoch: 10   Global Step: 131370   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:05:52,604-Speed 3272.09 samples/sec   Loss 3.8692   LearningRate 0.0222   Epoch: 10   Global Step: 131380   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:05:55,755-Speed 3250.81 samples/sec   Loss 4.0099   LearningRate 0.0222   Epoch: 10   Global Step: 131390   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:05:58,833-Speed 3327.42 samples/sec   Loss 3.9206   LearningRate 0.0222   Epoch: 10   Global Step: 131400   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:06:01,994-Speed 3240.21 samples/sec   Loss 3.9370   LearningRate 0.0222   Epoch: 10   Global Step: 131410   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:06:05,178-Speed 3217.19 samples/sec   Loss 3.9601   LearningRate 0.0222   Epoch: 10   Global Step: 131420   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:06:08,313-Speed 3267.47 samples/sec   Loss 3.9021   LearningRate 0.0222   Epoch: 10   Global Step: 131430   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:06:11,452-Speed 3263.81 samples/sec   Loss 3.9557   LearningRate 0.0222   Epoch: 10   Global Step: 131440   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:06:14,587-Speed 3267.05 samples/sec   Loss 3.9359   LearningRate 0.0222   Epoch: 10   Global Step: 131450   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:06:17,686-Speed 3305.26 samples/sec   Loss 3.8919   LearningRate 0.0222   Epoch: 10   Global Step: 131460   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:06:20,817-Speed 3270.94 samples/sec   Loss 3.8188   LearningRate 0.0222   Epoch: 10   Global Step: 131470   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:06:23,959-Speed 3260.59 samples/sec   Loss 3.9520   LearningRate 0.0222   Epoch: 10   Global Step: 131480   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:06:27,066-Speed 3296.78 samples/sec   Loss 3.9298   LearningRate 0.0222   Epoch: 10   Global Step: 131490   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:06:30,214-Speed 3253.91 samples/sec   Loss 3.9694   LearningRate 0.0222   Epoch: 10   Global Step: 131500   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:06:33,307-Speed 3311.45 samples/sec   Loss 3.8164   LearningRate 0.0221   Epoch: 10   Global Step: 131510   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:06:36,522-Speed 3186.28 samples/sec   Loss 3.9718   LearningRate 0.0221   Epoch: 10   Global Step: 131520   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:06:39,599-Speed 3329.57 samples/sec   Loss 3.8892   LearningRate 0.0221   Epoch: 10   Global Step: 131530   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:06:42,731-Speed 3270.40 samples/sec   Loss 3.8836   LearningRate 0.0221   Epoch: 10   Global Step: 131540   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:06:45,783-Speed 3355.67 samples/sec   Loss 3.9373   LearningRate 0.0221   Epoch: 10   Global Step: 131550   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:06:48,881-Speed 3306.95 samples/sec   Loss 3.9218   LearningRate 0.0221   Epoch: 10   Global Step: 131560   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:06:52,027-Speed 3255.78 samples/sec   Loss 3.9019   LearningRate 0.0221   Epoch: 10   Global Step: 131570   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:06:55,217-Speed 3210.84 samples/sec   Loss 3.8199   LearningRate 0.0221   Epoch: 10   Global Step: 131580   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:06:58,304-Speed 3318.33 samples/sec   Loss 3.9088   LearningRate 0.0221   Epoch: 10   Global Step: 131590   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:07:01,451-Speed 3254.67 samples/sec   Loss 3.9665   LearningRate 0.0221   Epoch: 10   Global Step: 131600   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:07:04,586-Speed 3266.97 samples/sec   Loss 3.8785   LearningRate 0.0221   Epoch: 10   Global Step: 131610   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:07:07,688-Speed 3302.86 samples/sec   Loss 3.9358   LearningRate 0.0221   Epoch: 10   Global Step: 131620   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:07:10,741-Speed 3354.68 samples/sec   Loss 3.8629   LearningRate 0.0221   Epoch: 10   Global Step: 131630   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:07:13,904-Speed 3239.01 samples/sec   Loss 3.8943   LearningRate 0.0221   Epoch: 10   Global Step: 131640   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:07:17,047-Speed 3258.83 samples/sec   Loss 4.0163   LearningRate 0.0221   Epoch: 10   Global Step: 131650   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:07:20,165-Speed 3285.55 samples/sec   Loss 3.9947   LearningRate 0.0221   Epoch: 10   Global Step: 131660   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:07:23,260-Speed 3309.27 samples/sec   Loss 3.9330   LearningRate 0.0221   Epoch: 10   Global Step: 131670   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:07:26,379-Speed 3284.33 samples/sec   Loss 3.8673   LearningRate 0.0221   Epoch: 10   Global Step: 131680   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:07:29,490-Speed 3292.53 samples/sec   Loss 3.8917   LearningRate 0.0221   Epoch: 10   Global Step: 131690   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:07:32,549-Speed 3349.21 samples/sec   Loss 3.9245   LearningRate 0.0221   Epoch: 10   Global Step: 131700   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:07:35,714-Speed 3236.36 samples/sec   Loss 3.9239   LearningRate 0.0221   Epoch: 10   Global Step: 131710   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:07:38,802-Speed 3316.93 samples/sec   Loss 3.9167   LearningRate 0.0221   Epoch: 10   Global Step: 131720   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:07:41,939-Speed 3265.07 samples/sec   Loss 3.8791   LearningRate 0.0221   Epoch: 10   Global Step: 131730   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:07:45,025-Speed 3319.23 samples/sec   Loss 3.9315   LearningRate 0.0221   Epoch: 10   Global Step: 131740   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:07:48,093-Speed 3338.95 samples/sec   Loss 3.9835   LearningRate 0.0221   Epoch: 10   Global Step: 131750   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:07:51,205-Speed 3291.10 samples/sec   Loss 3.9729   LearningRate 0.0221   Epoch: 10   Global Step: 131760   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:07:54,354-Speed 3252.84 samples/sec   Loss 3.8210   LearningRate 0.0220   Epoch: 10   Global Step: 131770   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:07:57,427-Speed 3333.56 samples/sec   Loss 3.8237   LearningRate 0.0220   Epoch: 10   Global Step: 131780   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:08:00,550-Speed 3279.50 samples/sec   Loss 3.8984   LearningRate 0.0220   Epoch: 10   Global Step: 131790   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:08:03,641-Speed 3313.71 samples/sec   Loss 3.8747   LearningRate 0.0220   Epoch: 10   Global Step: 131800   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:08:06,786-Speed 3256.97 samples/sec   Loss 3.9058   LearningRate 0.0220   Epoch: 10   Global Step: 131810   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:08:09,905-Speed 3285.20 samples/sec   Loss 3.8713   LearningRate 0.0220   Epoch: 10   Global Step: 131820   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:08:12,998-Speed 3311.16 samples/sec   Loss 3.9283   LearningRate 0.0220   Epoch: 10   Global Step: 131830   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:08:16,125-Speed 3275.50 samples/sec   Loss 3.9170   LearningRate 0.0220   Epoch: 10   Global Step: 131840   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:08:19,213-Speed 3317.18 samples/sec   Loss 3.9329   LearningRate 0.0220   Epoch: 10   Global Step: 131850   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:08:22,302-Speed 3316.52 samples/sec   Loss 3.9539   LearningRate 0.0220   Epoch: 10   Global Step: 131860   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:08:25,482-Speed 3221.14 samples/sec   Loss 3.9266   LearningRate 0.0220   Epoch: 10   Global Step: 131870   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:08:28,564-Speed 3323.33 samples/sec   Loss 4.0532   LearningRate 0.0220   Epoch: 10   Global Step: 131880   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:08:31,735-Speed 3229.97 samples/sec   Loss 3.9314   LearningRate 0.0220   Epoch: 10   Global Step: 131890   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:08:34,838-Speed 3301.23 samples/sec   Loss 3.9445   LearningRate 0.0220   Epoch: 10   Global Step: 131900   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:08:37,908-Speed 3336.36 samples/sec   Loss 3.9907   LearningRate 0.0220   Epoch: 10   Global Step: 131910   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:08:41,049-Speed 3261.42 samples/sec   Loss 3.9588   LearningRate 0.0220   Epoch: 10   Global Step: 131920   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:08:44,135-Speed 3319.14 samples/sec   Loss 4.0303   LearningRate 0.0220   Epoch: 10   Global Step: 131930   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:08:47,221-Speed 3319.34 samples/sec   Loss 3.9259   LearningRate 0.0220   Epoch: 10   Global Step: 131940   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:08:50,350-Speed 3272.87 samples/sec   Loss 3.9588   LearningRate 0.0220   Epoch: 10   Global Step: 131950   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:08:53,506-Speed 3245.95 samples/sec   Loss 3.8982   LearningRate 0.0220   Epoch: 10   Global Step: 131960   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:08:56,609-Speed 3301.33 samples/sec   Loss 3.9314   LearningRate 0.0220   Epoch: 10   Global Step: 131970   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:08:59,684-Speed 3330.38 samples/sec   Loss 3.9328   LearningRate 0.0220   Epoch: 10   Global Step: 131980   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:09:02,778-Speed 3311.61 samples/sec   Loss 3.9128   LearningRate 0.0220   Epoch: 10   Global Step: 131990   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:09:05,957-Speed 3222.10 samples/sec   Loss 3.9284   LearningRate 0.0220   Epoch: 10   Global Step: 132000   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:09:09,059-Speed 3301.94 samples/sec   Loss 3.9924   LearningRate 0.0220   Epoch: 10   Global Step: 132010   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:09:12,164-Speed 3298.81 samples/sec   Loss 4.0486   LearningRate 0.0220   Epoch: 10   Global Step: 132020   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:09:15,293-Speed 3273.52 samples/sec   Loss 3.9510   LearningRate 0.0220   Epoch: 10   Global Step: 132030   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:09:18,409-Speed 3287.76 samples/sec   Loss 3.9481   LearningRate 0.0219   Epoch: 10   Global Step: 132040   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:09:21,500-Speed 3313.96 samples/sec   Loss 3.9534   LearningRate 0.0219   Epoch: 10   Global Step: 132050   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:09:24,606-Speed 3297.28 samples/sec   Loss 3.9387   LearningRate 0.0219   Epoch: 10   Global Step: 132060   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:09:27,756-Speed 3252.19 samples/sec   Loss 4.0776   LearningRate 0.0219   Epoch: 10   Global Step: 132070   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:09:30,902-Speed 3255.79 samples/sec   Loss 3.9158   LearningRate 0.0219   Epoch: 10   Global Step: 132080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:09:33,962-Speed 3347.40 samples/sec   Loss 3.8361   LearningRate 0.0219   Epoch: 10   Global Step: 132090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:09:37,075-Speed 3290.21 samples/sec   Loss 3.9421   LearningRate 0.0219   Epoch: 10   Global Step: 132100   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:09:40,740-Speed 2794.80 samples/sec   Loss 4.1094   LearningRate 0.0219   Epoch: 10   Global Step: 132110   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:09:43,845-Speed 3298.45 samples/sec   Loss 3.9214   LearningRate 0.0219   Epoch: 10   Global Step: 132120   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:09:46,921-Speed 3330.81 samples/sec   Loss 4.0333   LearningRate 0.0219   Epoch: 10   Global Step: 132130   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:09:50,025-Speed 3300.17 samples/sec   Loss 4.0111   LearningRate 0.0219   Epoch: 10   Global Step: 132140   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:09:53,159-Speed 3267.79 samples/sec   Loss 3.9720   LearningRate 0.0219   Epoch: 10   Global Step: 132150   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:09:56,272-Speed 3290.19 samples/sec   Loss 3.9460   LearningRate 0.0219   Epoch: 10   Global Step: 132160   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:09:59,328-Speed 3352.44 samples/sec   Loss 4.0041   LearningRate 0.0219   Epoch: 10   Global Step: 132170   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:10:02,470-Speed 3260.12 samples/sec   Loss 3.9470   LearningRate 0.0219   Epoch: 10   Global Step: 132180   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:10:05,606-Speed 3266.72 samples/sec   Loss 3.9984   LearningRate 0.0219   Epoch: 10   Global Step: 132190   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:10:08,648-Speed 3367.13 samples/sec   Loss 3.9784   LearningRate 0.0219   Epoch: 10   Global Step: 132200   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:10:11,758-Speed 3293.78 samples/sec   Loss 4.0192   LearningRate 0.0219   Epoch: 10   Global Step: 132210   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:10:14,912-Speed 3247.84 samples/sec   Loss 4.0240   LearningRate 0.0219   Epoch: 10   Global Step: 132220   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:10:18,055-Speed 3259.34 samples/sec   Loss 3.9741   LearningRate 0.0219   Epoch: 10   Global Step: 132230   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:10:21,154-Speed 3305.10 samples/sec   Loss 3.8956   LearningRate 0.0219   Epoch: 10   Global Step: 132240   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:10:24,244-Speed 3314.76 samples/sec   Loss 4.0062   LearningRate 0.0219   Epoch: 10   Global Step: 132250   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:10:27,384-Speed 3261.90 samples/sec   Loss 3.8855   LearningRate 0.0219   Epoch: 10   Global Step: 132260   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:10:30,630-Speed 3156.71 samples/sec   Loss 3.8897   LearningRate 0.0219   Epoch: 10   Global Step: 132270   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:10:33,689-Speed 3347.31 samples/sec   Loss 3.8394   LearningRate 0.0219   Epoch: 10   Global Step: 132280   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:10:36,779-Speed 3315.32 samples/sec   Loss 3.9911   LearningRate 0.0219   Epoch: 10   Global Step: 132290   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:10:39,872-Speed 3312.76 samples/sec   Loss 4.0215   LearningRate 0.0218   Epoch: 10   Global Step: 132300   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:10:43,006-Speed 3267.98 samples/sec   Loss 3.9467   LearningRate 0.0218   Epoch: 10   Global Step: 132310   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:10:46,083-Speed 3328.47 samples/sec   Loss 3.8645   LearningRate 0.0218   Epoch: 10   Global Step: 132320   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:10:49,208-Speed 3278.44 samples/sec   Loss 3.9511   LearningRate 0.0218   Epoch: 10   Global Step: 132330   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:10:52,319-Speed 3291.97 samples/sec   Loss 4.0104   LearningRate 0.0218   Epoch: 10   Global Step: 132340   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:10:55,421-Speed 3302.32 samples/sec   Loss 3.8903   LearningRate 0.0218   Epoch: 10   Global Step: 132350   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:10:58,535-Speed 3290.12 samples/sec   Loss 4.0693   LearningRate 0.0218   Epoch: 10   Global Step: 132360   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:11:01,644-Speed 3294.72 samples/sec   Loss 3.8864   LearningRate 0.0218   Epoch: 10   Global Step: 132370   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:11:04,729-Speed 3320.37 samples/sec   Loss 3.9689   LearningRate 0.0218   Epoch: 10   Global Step: 132380   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:11:07,783-Speed 3353.29 samples/sec   Loss 3.9313   LearningRate 0.0218   Epoch: 10   Global Step: 132390   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:11:10,856-Speed 3333.48 samples/sec   Loss 3.9195   LearningRate 0.0218   Epoch: 10   Global Step: 132400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:11:13,970-Speed 3289.41 samples/sec   Loss 3.9141   LearningRate 0.0218   Epoch: 10   Global Step: 132410   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:11:17,166-Speed 3204.80 samples/sec   Loss 3.9543   LearningRate 0.0218   Epoch: 10   Global Step: 132420   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:11:20,310-Speed 3257.99 samples/sec   Loss 3.9896   LearningRate 0.0218   Epoch: 10   Global Step: 132430   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:11:23,414-Speed 3300.57 samples/sec   Loss 3.9024   LearningRate 0.0218   Epoch: 10   Global Step: 132440   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:11:26,530-Speed 3287.09 samples/sec   Loss 3.9361   LearningRate 0.0218   Epoch: 10   Global Step: 132450   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:11:29,681-Speed 3250.39 samples/sec   Loss 3.9298   LearningRate 0.0218   Epoch: 10   Global Step: 132460   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:11:32,745-Speed 3343.33 samples/sec   Loss 4.0049   LearningRate 0.0218   Epoch: 10   Global Step: 132470   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:11:35,869-Speed 3278.76 samples/sec   Loss 3.9699   LearningRate 0.0218   Epoch: 10   Global Step: 132480   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:11:38,939-Speed 3337.29 samples/sec   Loss 3.9145   LearningRate 0.0218   Epoch: 10   Global Step: 132490   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:11:42,032-Speed 3311.91 samples/sec   Loss 4.0061   LearningRate 0.0218   Epoch: 10   Global Step: 132500   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:11:45,112-Speed 3325.58 samples/sec   Loss 3.9130   LearningRate 0.0218   Epoch: 10   Global Step: 132510   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:11:48,216-Speed 3300.59 samples/sec   Loss 4.0498   LearningRate 0.0218   Epoch: 10   Global Step: 132520   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:11:51,284-Speed 3337.96 samples/sec   Loss 3.7764   LearningRate 0.0218   Epoch: 10   Global Step: 132530   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:11:54,384-Speed 3303.94 samples/sec   Loss 3.8534   LearningRate 0.0218   Epoch: 10   Global Step: 132540   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:11:57,459-Speed 3331.75 samples/sec   Loss 3.8954   LearningRate 0.0218   Epoch: 10   Global Step: 132550   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:12:00,538-Speed 3326.71 samples/sec   Loss 4.0054   LearningRate 0.0218   Epoch: 10   Global Step: 132560   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:12:03,592-Speed 3354.24 samples/sec   Loss 3.8980   LearningRate 0.0217   Epoch: 10   Global Step: 132570   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:12:06,730-Speed 3263.51 samples/sec   Loss 3.9937   LearningRate 0.0217   Epoch: 10   Global Step: 132580   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:12:09,789-Speed 3349.94 samples/sec   Loss 3.9339   LearningRate 0.0217   Epoch: 10   Global Step: 132590   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:12:12,840-Speed 3356.84 samples/sec   Loss 3.9608   LearningRate 0.0217   Epoch: 10   Global Step: 132600   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:12:15,929-Speed 3316.56 samples/sec   Loss 3.9791   LearningRate 0.0217   Epoch: 10   Global Step: 132610   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:12:19,019-Speed 3314.32 samples/sec   Loss 4.0300   LearningRate 0.0217   Epoch: 10   Global Step: 132620   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:12:22,091-Speed 3335.14 samples/sec   Loss 4.0523   LearningRate 0.0217   Epoch: 10   Global Step: 132630   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:12:25,154-Speed 3344.09 samples/sec   Loss 3.9488   LearningRate 0.0217   Epoch: 10   Global Step: 132640   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:12:28,351-Speed 3204.64 samples/sec   Loss 3.9708   LearningRate 0.0217   Epoch: 10   Global Step: 132650   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:12:31,448-Speed 3307.14 samples/sec   Loss 3.9306   LearningRate 0.0217   Epoch: 10   Global Step: 132660   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:12:34,528-Speed 3326.29 samples/sec   Loss 4.0244   LearningRate 0.0217   Epoch: 10   Global Step: 132670   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:12:37,658-Speed 3272.79 samples/sec   Loss 3.9568   LearningRate 0.0217   Epoch: 10   Global Step: 132680   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:12:40,735-Speed 3329.30 samples/sec   Loss 3.9304   LearningRate 0.0217   Epoch: 10   Global Step: 132690   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:12:43,917-Speed 3218.27 samples/sec   Loss 3.8680   LearningRate 0.0217   Epoch: 10   Global Step: 132700   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:12:47,013-Speed 3309.26 samples/sec   Loss 3.9638   LearningRate 0.0217   Epoch: 10   Global Step: 132710   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:12:50,091-Speed 3328.07 samples/sec   Loss 3.9044   LearningRate 0.0217   Epoch: 10   Global Step: 132720   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:12:53,270-Speed 3221.51 samples/sec   Loss 4.0229   LearningRate 0.0217   Epoch: 10   Global Step: 132730   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:12:56,373-Speed 3300.67 samples/sec   Loss 3.9435   LearningRate 0.0217   Epoch: 10   Global Step: 132740   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:12:59,495-Speed 3281.57 samples/sec   Loss 3.8142   LearningRate 0.0217   Epoch: 10   Global Step: 132750   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:13:02,688-Speed 3208.59 samples/sec   Loss 3.9713   LearningRate 0.0217   Epoch: 10   Global Step: 132760   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:13:05,839-Speed 3250.65 samples/sec   Loss 3.9598   LearningRate 0.0217   Epoch: 10   Global Step: 132770   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:13:08,917-Speed 3327.83 samples/sec   Loss 3.9601   LearningRate 0.0217   Epoch: 10   Global Step: 132780   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:13:11,979-Speed 3345.07 samples/sec   Loss 3.9911   LearningRate 0.0217   Epoch: 10   Global Step: 132790   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:13:15,159-Speed 3221.05 samples/sec   Loss 3.8462   LearningRate 0.0217   Epoch: 10   Global Step: 132800   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:13:18,292-Speed 3269.45 samples/sec   Loss 3.9966   LearningRate 0.0217   Epoch: 10   Global Step: 132810   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:13:21,362-Speed 3337.18 samples/sec   Loss 3.9283   LearningRate 0.0217   Epoch: 10   Global Step: 132820   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:13:24,471-Speed 3294.46 samples/sec   Loss 3.9526   LearningRate 0.0217   Epoch: 10   Global Step: 132830   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:13:27,573-Speed 3302.52 samples/sec   Loss 3.9732   LearningRate 0.0216   Epoch: 10   Global Step: 132840   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:13:30,650-Speed 3328.95 samples/sec   Loss 3.8485   LearningRate 0.0216   Epoch: 10   Global Step: 132850   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:13:33,706-Speed 3352.00 samples/sec   Loss 3.8922   LearningRate 0.0216   Epoch: 10   Global Step: 132860   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:13:36,797-Speed 3313.24 samples/sec   Loss 3.9679   LearningRate 0.0216   Epoch: 10   Global Step: 132870   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:13:39,860-Speed 3344.17 samples/sec   Loss 4.0400   LearningRate 0.0216   Epoch: 10   Global Step: 132880   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:13:42,961-Speed 3304.00 samples/sec   Loss 3.9652   LearningRate 0.0216   Epoch: 10   Global Step: 132890   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:13:46,030-Speed 3337.37 samples/sec   Loss 3.9879   LearningRate 0.0216   Epoch: 10   Global Step: 132900   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:13:49,133-Speed 3301.71 samples/sec   Loss 3.9515   LearningRate 0.0216   Epoch: 10   Global Step: 132910   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:13:52,278-Speed 3256.28 samples/sec   Loss 3.9437   LearningRate 0.0216   Epoch: 10   Global Step: 132920   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:13:55,377-Speed 3306.30 samples/sec   Loss 3.9211   LearningRate 0.0216   Epoch: 10   Global Step: 132930   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:13:58,487-Speed 3292.89 samples/sec   Loss 3.9381   LearningRate 0.0216   Epoch: 10   Global Step: 132940   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:14:01,621-Speed 3268.84 samples/sec   Loss 3.8976   LearningRate 0.0216   Epoch: 10   Global Step: 132950   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:14:04,767-Speed 3255.54 samples/sec   Loss 4.0478   LearningRate 0.0216   Epoch: 10   Global Step: 132960   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:14:07,886-Speed 3284.90 samples/sec   Loss 3.9369   LearningRate 0.0216   Epoch: 10   Global Step: 132970   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:14:10,958-Speed 3333.72 samples/sec   Loss 3.9602   LearningRate 0.0216   Epoch: 10   Global Step: 132980   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:14:14,035-Speed 3329.77 samples/sec   Loss 3.9557   LearningRate 0.0216   Epoch: 10   Global Step: 132990   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:14:17,107-Speed 3334.30 samples/sec   Loss 3.9374   LearningRate 0.0216   Epoch: 10   Global Step: 133000   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:14:20,182-Speed 3331.60 samples/sec   Loss 3.9740   LearningRate 0.0216   Epoch: 10   Global Step: 133010   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:14:23,241-Speed 3347.94 samples/sec   Loss 3.9086   LearningRate 0.0216   Epoch: 10   Global Step: 133020   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:14:26,411-Speed 3231.97 samples/sec   Loss 4.0360   LearningRate 0.0216   Epoch: 10   Global Step: 133030   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:14:29,545-Speed 3267.58 samples/sec   Loss 3.9396   LearningRate 0.0216   Epoch: 10   Global Step: 133040   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:14:32,688-Speed 3258.73 samples/sec   Loss 4.0255   LearningRate 0.0216   Epoch: 10   Global Step: 133050   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:14:35,775-Speed 3318.63 samples/sec   Loss 3.8993   LearningRate 0.0216   Epoch: 10   Global Step: 133060   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:14:38,874-Speed 3305.40 samples/sec   Loss 3.9860   LearningRate 0.0216   Epoch: 10   Global Step: 133070   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:14:42,060-Speed 3214.81 samples/sec   Loss 3.9297   LearningRate 0.0216   Epoch: 10   Global Step: 133080   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:14:45,136-Speed 3330.82 samples/sec   Loss 3.8860   LearningRate 0.0216   Epoch: 10   Global Step: 133090   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:14:48,234-Speed 3306.53 samples/sec   Loss 3.9923   LearningRate 0.0215   Epoch: 10   Global Step: 133100   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:14:51,324-Speed 3314.21 samples/sec   Loss 3.9671   LearningRate 0.0215   Epoch: 10   Global Step: 133110   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:14:54,421-Speed 3307.91 samples/sec   Loss 3.9223   LearningRate 0.0215   Epoch: 10   Global Step: 133120   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:14:57,495-Speed 3332.07 samples/sec   Loss 3.9459   LearningRate 0.0215   Epoch: 10   Global Step: 133130   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:15:00,680-Speed 3216.84 samples/sec   Loss 4.0038   LearningRate 0.0215   Epoch: 10   Global Step: 133140   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:15:03,851-Speed 3229.54 samples/sec   Loss 3.9614   LearningRate 0.0215   Epoch: 10   Global Step: 133150   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:15:07,026-Speed 3226.14 samples/sec   Loss 3.9362   LearningRate 0.0215   Epoch: 10   Global Step: 133160   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:15:10,089-Speed 3344.74 samples/sec   Loss 3.8813   LearningRate 0.0215   Epoch: 10   Global Step: 133170   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:15:13,258-Speed 3232.76 samples/sec   Loss 3.8742   LearningRate 0.0215   Epoch: 10   Global Step: 133180   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:15:16,385-Speed 3275.69 samples/sec   Loss 3.9699   LearningRate 0.0215   Epoch: 10   Global Step: 133190   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:15:19,537-Speed 3249.58 samples/sec   Loss 4.0010   LearningRate 0.0215   Epoch: 10   Global Step: 133200   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:15:22,627-Speed 3315.03 samples/sec   Loss 4.0766   LearningRate 0.0215   Epoch: 10   Global Step: 133210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:15:25,743-Speed 3287.02 samples/sec   Loss 4.0021   LearningRate 0.0215   Epoch: 10   Global Step: 133220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:15:28,857-Speed 3289.11 samples/sec   Loss 3.8942   LearningRate 0.0215   Epoch: 10   Global Step: 133230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:15:31,909-Speed 3356.54 samples/sec   Loss 3.9526   LearningRate 0.0215   Epoch: 10   Global Step: 133240   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:15:35,026-Speed 3286.69 samples/sec   Loss 3.9529   LearningRate 0.0215   Epoch: 10   Global Step: 133250   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:15:38,157-Speed 3271.83 samples/sec   Loss 3.9256   LearningRate 0.0215   Epoch: 10   Global Step: 133260   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:15:41,347-Speed 3209.84 samples/sec   Loss 3.9118   LearningRate 0.0215   Epoch: 10   Global Step: 133270   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:15:44,463-Speed 3287.59 samples/sec   Loss 3.9095   LearningRate 0.0215   Epoch: 10   Global Step: 133280   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:15:47,573-Speed 3293.28 samples/sec   Loss 3.8632   LearningRate 0.0215   Epoch: 10   Global Step: 133290   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:15:50,711-Speed 3265.13 samples/sec   Loss 4.0037   LearningRate 0.0215   Epoch: 10   Global Step: 133300   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:15:53,863-Speed 3248.81 samples/sec   Loss 3.9731   LearningRate 0.0215   Epoch: 10   Global Step: 133310   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:15:56,942-Speed 3327.51 samples/sec   Loss 3.9083   LearningRate 0.0215   Epoch: 10   Global Step: 133320   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:15:59,999-Speed 3350.25 samples/sec   Loss 3.9935   LearningRate 0.0215   Epoch: 10   Global Step: 133330   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:16:03,114-Speed 3288.14 samples/sec   Loss 3.9463   LearningRate 0.0215   Epoch: 10   Global Step: 133340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:16:06,215-Speed 3303.28 samples/sec   Loss 3.9622   LearningRate 0.0215   Epoch: 10   Global Step: 133350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:16:09,308-Speed 3312.50 samples/sec   Loss 3.9767   LearningRate 0.0215   Epoch: 10   Global Step: 133360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:16:12,390-Speed 3323.48 samples/sec   Loss 3.9433   LearningRate 0.0214   Epoch: 10   Global Step: 133370   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:16:15,610-Speed 3181.62 samples/sec   Loss 4.0103   LearningRate 0.0214   Epoch: 10   Global Step: 133380   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:16:18,696-Speed 3318.46 samples/sec   Loss 3.9849   LearningRate 0.0214   Epoch: 10   Global Step: 133390   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:16:21,784-Speed 3318.07 samples/sec   Loss 3.9160   LearningRate 0.0214   Epoch: 10   Global Step: 133400   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:16:24,957-Speed 3227.88 samples/sec   Loss 3.9226   LearningRate 0.0214   Epoch: 10   Global Step: 133410   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:16:28,024-Speed 3340.33 samples/sec   Loss 4.0708   LearningRate 0.0214   Epoch: 10   Global Step: 133420   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:16:31,131-Speed 3296.25 samples/sec   Loss 3.9544   LearningRate 0.0214   Epoch: 10   Global Step: 133430   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:16:34,203-Speed 3334.19 samples/sec   Loss 3.9048   LearningRate 0.0214   Epoch: 10   Global Step: 133440   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:16:37,266-Speed 3344.33 samples/sec   Loss 3.9133   LearningRate 0.0214   Epoch: 10   Global Step: 133450   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:16:40,404-Speed 3264.47 samples/sec   Loss 3.9101   LearningRate 0.0214   Epoch: 10   Global Step: 133460   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:16:43,530-Speed 3276.88 samples/sec   Loss 3.9982   LearningRate 0.0214   Epoch: 10   Global Step: 133470   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:16:46,597-Speed 3340.16 samples/sec   Loss 3.8768   LearningRate 0.0214   Epoch: 10   Global Step: 133480   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:16:49,676-Speed 3326.66 samples/sec   Loss 3.9924   LearningRate 0.0214   Epoch: 10   Global Step: 133490   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:16:52,765-Speed 3316.16 samples/sec   Loss 3.9413   LearningRate 0.0214   Epoch: 10   Global Step: 133500   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:16:55,836-Speed 3336.05 samples/sec   Loss 3.9591   LearningRate 0.0214   Epoch: 10   Global Step: 133510   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:16:58,948-Speed 3291.50 samples/sec   Loss 3.9592   LearningRate 0.0214   Epoch: 10   Global Step: 133520   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:17:02,069-Speed 3282.59 samples/sec   Loss 3.9998   LearningRate 0.0214   Epoch: 10   Global Step: 133530   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:17:05,178-Speed 3294.35 samples/sec   Loss 3.9210   LearningRate 0.0214   Epoch: 10   Global Step: 133540   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:17:08,255-Speed 3329.27 samples/sec   Loss 3.9377   LearningRate 0.0214   Epoch: 10   Global Step: 133550   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:17:11,390-Speed 3267.41 samples/sec   Loss 3.9449   LearningRate 0.0214   Epoch: 10   Global Step: 133560   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:17:14,486-Speed 3307.95 samples/sec   Loss 4.0773   LearningRate 0.0214   Epoch: 10   Global Step: 133570   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:17:17,638-Speed 3249.26 samples/sec   Loss 3.9691   LearningRate 0.0214   Epoch: 10   Global Step: 133580   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:17:20,719-Speed 3325.86 samples/sec   Loss 3.9494   LearningRate 0.0214   Epoch: 10   Global Step: 133590   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:17:23,818-Speed 3304.11 samples/sec   Loss 3.8680   LearningRate 0.0214   Epoch: 10   Global Step: 133600   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:17:26,967-Speed 3253.23 samples/sec   Loss 3.9684   LearningRate 0.0214   Epoch: 10   Global Step: 133610   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:17:30,056-Speed 3316.35 samples/sec   Loss 3.9727   LearningRate 0.0214   Epoch: 10   Global Step: 133620   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:17:33,186-Speed 3272.00 samples/sec   Loss 3.9423   LearningRate 0.0214   Epoch: 10   Global Step: 133630   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:17:36,345-Speed 3243.01 samples/sec   Loss 3.8937   LearningRate 0.0213   Epoch: 10   Global Step: 133640   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:17:39,470-Speed 3278.62 samples/sec   Loss 3.9673   LearningRate 0.0213   Epoch: 10   Global Step: 133650   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:17:42,607-Speed 3264.75 samples/sec   Loss 3.9556   LearningRate 0.0213   Epoch: 10   Global Step: 133660   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:17:45,681-Speed 3332.58 samples/sec   Loss 4.0136   LearningRate 0.0213   Epoch: 10   Global Step: 133670   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:17:48,822-Speed 3261.05 samples/sec   Loss 3.9534   LearningRate 0.0213   Epoch: 10   Global Step: 133680   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:17:52,010-Speed 3212.88 samples/sec   Loss 3.9823   LearningRate 0.0213   Epoch: 10   Global Step: 133690   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:17:55,078-Speed 3338.74 samples/sec   Loss 3.9280   LearningRate 0.0213   Epoch: 10   Global Step: 133700   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:17:58,167-Speed 3315.94 samples/sec   Loss 3.9651   LearningRate 0.0213   Epoch: 10   Global Step: 133710   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:18:01,256-Speed 3316.04 samples/sec   Loss 3.9620   LearningRate 0.0213   Epoch: 10   Global Step: 133720   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:18:04,397-Speed 3261.20 samples/sec   Loss 4.0191   LearningRate 0.0213   Epoch: 10   Global Step: 133730   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:18:07,506-Speed 3294.98 samples/sec   Loss 4.0584   LearningRate 0.0213   Epoch: 10   Global Step: 133740   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:18:10,598-Speed 3312.77 samples/sec   Loss 3.9009   LearningRate 0.0213   Epoch: 10   Global Step: 133750   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:18:13,723-Speed 3277.58 samples/sec   Loss 3.9151   LearningRate 0.0213   Epoch: 10   Global Step: 133760   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:18:16,873-Speed 3252.24 samples/sec   Loss 3.8977   LearningRate 0.0213   Epoch: 10   Global Step: 133770   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:18:19,993-Speed 3282.82 samples/sec   Loss 3.9971   LearningRate 0.0213   Epoch: 10   Global Step: 133780   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:18:23,105-Speed 3291.18 samples/sec   Loss 3.9402   LearningRate 0.0213   Epoch: 10   Global Step: 133790   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:18:26,248-Speed 3260.09 samples/sec   Loss 3.9577   LearningRate 0.0213   Epoch: 10   Global Step: 133800   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:18:29,338-Speed 3314.10 samples/sec   Loss 3.9763   LearningRate 0.0213   Epoch: 10   Global Step: 133810   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:18:32,452-Speed 3289.84 samples/sec   Loss 3.9263   LearningRate 0.0213   Epoch: 10   Global Step: 133820   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:18:35,581-Speed 3273.33 samples/sec   Loss 3.9295   LearningRate 0.0213   Epoch: 10   Global Step: 133830   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:18:38,686-Speed 3299.97 samples/sec   Loss 4.0056   LearningRate 0.0213   Epoch: 10   Global Step: 133840   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:18:41,789-Speed 3300.94 samples/sec   Loss 3.9471   LearningRate 0.0213   Epoch: 10   Global Step: 133850   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:18:44,878-Speed 3315.59 samples/sec   Loss 3.9510   LearningRate 0.0213   Epoch: 10   Global Step: 133860   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:18:48,021-Speed 3258.82 samples/sec   Loss 3.9233   LearningRate 0.0213   Epoch: 10   Global Step: 133870   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:18:51,082-Speed 3346.90 samples/sec   Loss 3.8985   LearningRate 0.0213   Epoch: 10   Global Step: 133880   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:18:54,193-Speed 3292.47 samples/sec   Loss 3.9902   LearningRate 0.0213   Epoch: 10   Global Step: 133890   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:18:57,283-Speed 3315.03 samples/sec   Loss 3.9437   LearningRate 0.0213   Epoch: 10   Global Step: 133900   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:19:00,366-Speed 3322.13 samples/sec   Loss 3.9844   LearningRate 0.0212   Epoch: 10   Global Step: 133910   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:19:03,426-Speed 3347.72 samples/sec   Loss 3.8624   LearningRate 0.0212   Epoch: 10   Global Step: 133920   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:19:06,544-Speed 3284.90 samples/sec   Loss 3.9850   LearningRate 0.0212   Epoch: 10   Global Step: 133930   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:19:09,615-Speed 3335.25 samples/sec   Loss 4.0063   LearningRate 0.0212   Epoch: 10   Global Step: 133940   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:19:12,741-Speed 3276.84 samples/sec   Loss 3.9886   LearningRate 0.0212   Epoch: 10   Global Step: 133950   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:19:15,914-Speed 3227.89 samples/sec   Loss 4.0137   LearningRate 0.0212   Epoch: 10   Global Step: 133960   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:19:19,012-Speed 3307.11 samples/sec   Loss 3.9108   LearningRate 0.0212   Epoch: 10   Global Step: 133970   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:19:22,090-Speed 3327.71 samples/sec   Loss 4.0072   LearningRate 0.0212   Epoch: 10   Global Step: 133980   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:19:25,274-Speed 3216.98 samples/sec   Loss 3.9927   LearningRate 0.0212   Epoch: 10   Global Step: 133990   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:19:28,394-Speed 3282.70 samples/sec   Loss 3.9779   LearningRate 0.0212   Epoch: 10   Global Step: 134000   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:19:31,511-Speed 3286.51 samples/sec   Loss 3.7833   LearningRate 0.0212   Epoch: 10   Global Step: 134010   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:19:34,607-Speed 3307.73 samples/sec   Loss 4.0132   LearningRate 0.0212   Epoch: 10   Global Step: 134020   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:19:37,706-Speed 3306.14 samples/sec   Loss 4.0233   LearningRate 0.0212   Epoch: 10   Global Step: 134030   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:19:40,790-Speed 3320.92 samples/sec   Loss 3.9276   LearningRate 0.0212   Epoch: 10   Global Step: 134040   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:19:43,910-Speed 3283.05 samples/sec   Loss 3.9821   LearningRate 0.0212   Epoch: 10   Global Step: 134050   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:19:46,983-Speed 3333.85 samples/sec   Loss 4.0354   LearningRate 0.0212   Epoch: 10   Global Step: 134060   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:19:50,143-Speed 3240.89 samples/sec   Loss 4.0067   LearningRate 0.0212   Epoch: 10   Global Step: 134070   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:19:53,263-Speed 3283.76 samples/sec   Loss 3.9062   LearningRate 0.0212   Epoch: 10   Global Step: 134080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:19:56,328-Speed 3342.42 samples/sec   Loss 3.9312   LearningRate 0.0212   Epoch: 10   Global Step: 134090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:19:59,428-Speed 3303.46 samples/sec   Loss 3.9461   LearningRate 0.0212   Epoch: 10   Global Step: 134100   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:20:02,607-Speed 3222.72 samples/sec   Loss 3.9333   LearningRate 0.0212   Epoch: 10   Global Step: 134110   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:20:05,730-Speed 3279.68 samples/sec   Loss 4.0031   LearningRate 0.0212   Epoch: 10   Global Step: 134120   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:20:08,831-Speed 3302.79 samples/sec   Loss 3.9112   LearningRate 0.0212   Epoch: 10   Global Step: 134130   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:20:11,945-Speed 3289.77 samples/sec   Loss 3.9343   LearningRate 0.0212   Epoch: 10   Global Step: 134140   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:20:15,038-Speed 3311.46 samples/sec   Loss 3.9983   LearningRate 0.0212   Epoch: 10   Global Step: 134150   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:20:18,119-Speed 3324.93 samples/sec   Loss 3.9305   LearningRate 0.0212   Epoch: 10   Global Step: 134160   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:20:21,198-Speed 3326.61 samples/sec   Loss 3.9262   LearningRate 0.0212   Epoch: 10   Global Step: 134170   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:20:24,313-Speed 3289.14 samples/sec   Loss 3.9731   LearningRate 0.0211   Epoch: 10   Global Step: 134180   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:20:27,371-Speed 3349.18 samples/sec   Loss 3.9739   LearningRate 0.0211   Epoch: 10   Global Step: 134190   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:20:30,463-Speed 3312.82 samples/sec   Loss 3.9712   LearningRate 0.0211   Epoch: 10   Global Step: 134200   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:20:33,574-Speed 3293.13 samples/sec   Loss 3.9490   LearningRate 0.0211   Epoch: 10   Global Step: 134210   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:20:36,722-Speed 3253.53 samples/sec   Loss 4.0251   LearningRate 0.0211   Epoch: 10   Global Step: 134220   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:20:39,823-Speed 3303.80 samples/sec   Loss 3.9591   LearningRate 0.0211   Epoch: 10   Global Step: 134230   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:20:42,938-Speed 3287.83 samples/sec   Loss 4.0375   LearningRate 0.0211   Epoch: 10   Global Step: 134240   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:20:46,003-Speed 3342.21 samples/sec   Loss 3.9310   LearningRate 0.0211   Epoch: 10   Global Step: 134250   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:20:49,125-Speed 3281.05 samples/sec   Loss 3.9520   LearningRate 0.0211   Epoch: 10   Global Step: 134260   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:20:52,262-Speed 3265.76 samples/sec   Loss 3.9203   LearningRate 0.0211   Epoch: 10   Global Step: 134270   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:20:55,381-Speed 3283.54 samples/sec   Loss 4.0094   LearningRate 0.0211   Epoch: 10   Global Step: 134280   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:20:58,550-Speed 3232.41 samples/sec   Loss 3.9599   LearningRate 0.0211   Epoch: 10   Global Step: 134290   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:21:01,617-Speed 3340.02 samples/sec   Loss 3.9180   LearningRate 0.0211   Epoch: 10   Global Step: 134300   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:21:04,698-Speed 3324.26 samples/sec   Loss 4.0095   LearningRate 0.0211   Epoch: 10   Global Step: 134310   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:21:07,792-Speed 3310.68 samples/sec   Loss 4.0007   LearningRate 0.0211   Epoch: 10   Global Step: 134320   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:21:10,851-Speed 3347.96 samples/sec   Loss 3.9911   LearningRate 0.0211   Epoch: 10   Global Step: 134330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:21:13,969-Speed 3285.25 samples/sec   Loss 3.9306   LearningRate 0.0211   Epoch: 10   Global Step: 134340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:21:17,091-Speed 3281.62 samples/sec   Loss 4.0027   LearningRate 0.0211   Epoch: 10   Global Step: 134350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:21:20,148-Speed 3350.91 samples/sec   Loss 4.0257   LearningRate 0.0211   Epoch: 10   Global Step: 134360   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:21:23,199-Speed 3356.40 samples/sec   Loss 3.9548   LearningRate 0.0211   Epoch: 10   Global Step: 134370   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:21:26,296-Speed 3307.87 samples/sec   Loss 3.9576   LearningRate 0.0211   Epoch: 10   Global Step: 134380   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:21:29,352-Speed 3352.09 samples/sec   Loss 4.0388   LearningRate 0.0211   Epoch: 10   Global Step: 134390   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:21:32,480-Speed 3274.73 samples/sec   Loss 4.0127   LearningRate 0.0211   Epoch: 10   Global Step: 134400   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:21:35,563-Speed 3322.50 samples/sec   Loss 3.8915   LearningRate 0.0211   Epoch: 10   Global Step: 134410   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:21:38,661-Speed 3306.77 samples/sec   Loss 4.0138   LearningRate 0.0211   Epoch: 10   Global Step: 134420   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:21:41,724-Speed 3344.20 samples/sec   Loss 3.9751   LearningRate 0.0211   Epoch: 10   Global Step: 134430   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:21:44,785-Speed 3346.26 samples/sec   Loss 3.9392   LearningRate 0.0211   Epoch: 10   Global Step: 134440   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:21:47,865-Speed 3326.17 samples/sec   Loss 3.9534   LearningRate 0.0210   Epoch: 10   Global Step: 134450   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:21:51,002-Speed 3264.74 samples/sec   Loss 3.9382   LearningRate 0.0210   Epoch: 10   Global Step: 134460   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:21:54,079-Speed 3329.01 samples/sec   Loss 4.0542   LearningRate 0.0210   Epoch: 10   Global Step: 134470   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:21:57,162-Speed 3322.45 samples/sec   Loss 4.0418   LearningRate 0.0210   Epoch: 10   Global Step: 134480   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:22:00,272-Speed 3293.35 samples/sec   Loss 3.9565   LearningRate 0.0210   Epoch: 10   Global Step: 134490   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:22:03,350-Speed 3328.57 samples/sec   Loss 3.9544   LearningRate 0.0210   Epoch: 10   Global Step: 134500   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:22:06,444-Speed 3310.25 samples/sec   Loss 4.0159   LearningRate 0.0210   Epoch: 10   Global Step: 134510   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:22:09,548-Speed 3299.77 samples/sec   Loss 3.9490   LearningRate 0.0210   Epoch: 10   Global Step: 134520   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:22:12,629-Speed 3324.53 samples/sec   Loss 3.8298   LearningRate 0.0210   Epoch: 10   Global Step: 134530   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:22:15,713-Speed 3321.88 samples/sec   Loss 4.0321   LearningRate 0.0210   Epoch: 10   Global Step: 134540   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:22:18,813-Speed 3304.79 samples/sec   Loss 3.9113   LearningRate 0.0210   Epoch: 10   Global Step: 134550   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:22:21,881-Speed 3337.68 samples/sec   Loss 4.0059   LearningRate 0.0210   Epoch: 10   Global Step: 134560   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:22:25,049-Speed 3234.29 samples/sec   Loss 3.8779   LearningRate 0.0210   Epoch: 10   Global Step: 134570   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:22:28,181-Speed 3270.45 samples/sec   Loss 3.9127   LearningRate 0.0210   Epoch: 10   Global Step: 134580   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:22:31,278-Speed 3307.14 samples/sec   Loss 3.9095   LearningRate 0.0210   Epoch: 10   Global Step: 134590   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:22:34,348-Speed 3337.17 samples/sec   Loss 3.9766   LearningRate 0.0210   Epoch: 10   Global Step: 134600   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:22:37,465-Speed 3285.97 samples/sec   Loss 3.9349   LearningRate 0.0210   Epoch: 10   Global Step: 134610   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:22:40,538-Speed 3333.57 samples/sec   Loss 3.8612   LearningRate 0.0210   Epoch: 10   Global Step: 134620   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:22:43,613-Speed 3331.00 samples/sec   Loss 4.0168   LearningRate 0.0210   Epoch: 10   Global Step: 134630   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:22:46,661-Speed 3360.39 samples/sec   Loss 4.0111   LearningRate 0.0210   Epoch: 10   Global Step: 134640   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:22:49,749-Speed 3317.74 samples/sec   Loss 3.9699   LearningRate 0.0210   Epoch: 10   Global Step: 134650   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:22:52,819-Speed 3336.64 samples/sec   Loss 3.8739   LearningRate 0.0210   Epoch: 10   Global Step: 134660   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:22:55,945-Speed 3276.84 samples/sec   Loss 3.9688   LearningRate 0.0210   Epoch: 10   Global Step: 134670   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:22:59,086-Speed 3260.44 samples/sec   Loss 3.9354   LearningRate 0.0210   Epoch: 10   Global Step: 134680   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:23:02,262-Speed 3225.49 samples/sec   Loss 3.8810   LearningRate 0.0210   Epoch: 10   Global Step: 134690   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:23:05,330-Speed 3339.59 samples/sec   Loss 3.9620   LearningRate 0.0210   Epoch: 10   Global Step: 134700   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:23:08,403-Speed 3333.50 samples/sec   Loss 4.0166   LearningRate 0.0210   Epoch: 10   Global Step: 134710   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:23:11,468-Speed 3341.35 samples/sec   Loss 3.8964   LearningRate 0.0209   Epoch: 10   Global Step: 134720   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:23:14,563-Speed 3310.34 samples/sec   Loss 3.9736   LearningRate 0.0209   Epoch: 10   Global Step: 134730   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:23:17,639-Speed 3329.88 samples/sec   Loss 4.0166   LearningRate 0.0209   Epoch: 10   Global Step: 134740   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:23:20,734-Speed 3309.59 samples/sec   Loss 4.0165   LearningRate 0.0209   Epoch: 10   Global Step: 134750   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:23:23,904-Speed 3231.34 samples/sec   Loss 3.9068   LearningRate 0.0209   Epoch: 10   Global Step: 134760   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:23:26,992-Speed 3317.54 samples/sec   Loss 3.8689   LearningRate 0.0209   Epoch: 10   Global Step: 134770   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:23:30,141-Speed 3252.07 samples/sec   Loss 3.9189   LearningRate 0.0209   Epoch: 10   Global Step: 134780   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:23:33,242-Speed 3303.36 samples/sec   Loss 3.9920   LearningRate 0.0209   Epoch: 10   Global Step: 134790   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:23:36,401-Speed 3243.19 samples/sec   Loss 3.8850   LearningRate 0.0209   Epoch: 10   Global Step: 134800   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:23:39,502-Speed 3303.16 samples/sec   Loss 3.9940   LearningRate 0.0209   Epoch: 10   Global Step: 134810   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:23:42,573-Speed 3335.35 samples/sec   Loss 4.0781   LearningRate 0.0209   Epoch: 10   Global Step: 134820   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:23:45,681-Speed 3296.33 samples/sec   Loss 3.9370   LearningRate 0.0209   Epoch: 10   Global Step: 134830   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:23:48,891-Speed 3190.54 samples/sec   Loss 4.0117   LearningRate 0.0209   Epoch: 10   Global Step: 134840   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:23:51,976-Speed 3320.64 samples/sec   Loss 4.0412   LearningRate 0.0209   Epoch: 10   Global Step: 134850   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:23:55,046-Speed 3336.55 samples/sec   Loss 3.9320   LearningRate 0.0209   Epoch: 10   Global Step: 134860   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:23:58,149-Speed 3300.47 samples/sec   Loss 4.0075   LearningRate 0.0209   Epoch: 10   Global Step: 134870   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:24:01,304-Speed 3246.48 samples/sec   Loss 4.0153   LearningRate 0.0209   Epoch: 10   Global Step: 134880   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:24:04,405-Speed 3303.18 samples/sec   Loss 3.9724   LearningRate 0.0209   Epoch: 10   Global Step: 134890   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:24:07,495-Speed 3315.15 samples/sec   Loss 4.0104   LearningRate 0.0209   Epoch: 10   Global Step: 134900   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:24:10,550-Speed 3353.06 samples/sec   Loss 3.9212   LearningRate 0.0209   Epoch: 10   Global Step: 134910   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:24:13,598-Speed 3360.57 samples/sec   Loss 3.9225   LearningRate 0.0209   Epoch: 10   Global Step: 134920   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:24:16,653-Speed 3352.51 samples/sec   Loss 3.9146   LearningRate 0.0209   Epoch: 10   Global Step: 134930   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:24:19,749-Speed 3309.60 samples/sec   Loss 3.9888   LearningRate 0.0209   Epoch: 10   Global Step: 134940   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:24:22,808-Speed 3348.34 samples/sec   Loss 4.0363   LearningRate 0.0209   Epoch: 10   Global Step: 134950   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:24:25,944-Speed 3265.95 samples/sec   Loss 3.9460   LearningRate 0.0209   Epoch: 10   Global Step: 134960   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:24:29,106-Speed 3239.70 samples/sec   Loss 3.9400   LearningRate 0.0209   Epoch: 10   Global Step: 134970   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:24:32,250-Speed 3257.28 samples/sec   Loss 3.8884   LearningRate 0.0209   Epoch: 10   Global Step: 134980   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:24:35,401-Speed 3251.28 samples/sec   Loss 4.0189   LearningRate 0.0208   Epoch: 10   Global Step: 134990   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:24:38,534-Speed 3269.70 samples/sec   Loss 3.9752   LearningRate 0.0208   Epoch: 10   Global Step: 135000   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:24:41,605-Speed 3335.11 samples/sec   Loss 3.9229   LearningRate 0.0208   Epoch: 10   Global Step: 135010   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:24:44,720-Speed 3288.46 samples/sec   Loss 3.9586   LearningRate 0.0208   Epoch: 10   Global Step: 135020   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:24:47,823-Speed 3300.60 samples/sec   Loss 4.1096   LearningRate 0.0208   Epoch: 10   Global Step: 135030   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:24:51,066-Speed 3158.60 samples/sec   Loss 3.8992   LearningRate 0.0208   Epoch: 10   Global Step: 135040   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:24:54,198-Speed 3271.17 samples/sec   Loss 3.9777   LearningRate 0.0208   Epoch: 10   Global Step: 135050   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:24:57,272-Speed 3332.52 samples/sec   Loss 3.9533   LearningRate 0.0208   Epoch: 10   Global Step: 135060   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:25:00,357-Speed 3319.38 samples/sec   Loss 4.0161   LearningRate 0.0208   Epoch: 10   Global Step: 135070   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:25:03,463-Speed 3298.62 samples/sec   Loss 3.9166   LearningRate 0.0208   Epoch: 10   Global Step: 135080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:25:06,623-Speed 3241.02 samples/sec   Loss 3.9677   LearningRate 0.0208   Epoch: 10   Global Step: 135090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:25:09,698-Speed 3330.89 samples/sec   Loss 3.9900   LearningRate 0.0208   Epoch: 10   Global Step: 135100   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:25:12,772-Speed 3332.64 samples/sec   Loss 3.9649   LearningRate 0.0208   Epoch: 10   Global Step: 135110   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:25:15,917-Speed 3257.38 samples/sec   Loss 3.9225   LearningRate 0.0208   Epoch: 10   Global Step: 135120   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:25:19,033-Speed 3287.12 samples/sec   Loss 4.0385   LearningRate 0.0208   Epoch: 10   Global Step: 135130   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:25:22,084-Speed 3357.48 samples/sec   Loss 3.9286   LearningRate 0.0208   Epoch: 10   Global Step: 135140   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:25:25,153-Speed 3338.00 samples/sec   Loss 3.9312   LearningRate 0.0208   Epoch: 10   Global Step: 135150   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:25:28,247-Speed 3310.66 samples/sec   Loss 4.0509   LearningRate 0.0208   Epoch: 10   Global Step: 135160   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:25:31,350-Speed 3300.96 samples/sec   Loss 3.9882   LearningRate 0.0208   Epoch: 10   Global Step: 135170   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:25:34,413-Speed 3344.59 samples/sec   Loss 4.0618   LearningRate 0.0208   Epoch: 10   Global Step: 135180   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:25:37,550-Speed 3264.73 samples/sec   Loss 4.0294   LearningRate 0.0208   Epoch: 10   Global Step: 135190   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:25:40,692-Speed 3260.60 samples/sec   Loss 3.9854   LearningRate 0.0208   Epoch: 10   Global Step: 135200   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:25:43,796-Speed 3299.98 samples/sec   Loss 3.9419   LearningRate 0.0208   Epoch: 10   Global Step: 135210   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:25:46,862-Speed 3341.08 samples/sec   Loss 3.9864   LearningRate 0.0208   Epoch: 10   Global Step: 135220   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:25:49,957-Speed 3309.53 samples/sec   Loss 3.9231   LearningRate 0.0208   Epoch: 10   Global Step: 135230   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:25:53,123-Speed 3235.60 samples/sec   Loss 4.0368   LearningRate 0.0208   Epoch: 10   Global Step: 135240   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:25:56,221-Speed 3306.03 samples/sec   Loss 3.9484   LearningRate 0.0208   Epoch: 10   Global Step: 135250   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:25:59,312-Speed 3313.82 samples/sec   Loss 4.0013   LearningRate 0.0207   Epoch: 10   Global Step: 135260   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:26:02,371-Speed 3349.19 samples/sec   Loss 3.9268   LearningRate 0.0207   Epoch: 10   Global Step: 135270   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:26:05,485-Speed 3288.82 samples/sec   Loss 3.9313   LearningRate 0.0207   Epoch: 10   Global Step: 135280   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:26:08,566-Speed 3324.58 samples/sec   Loss 4.0616   LearningRate 0.0207   Epoch: 10   Global Step: 135290   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:26:11,688-Speed 3281.71 samples/sec   Loss 3.9092   LearningRate 0.0207   Epoch: 10   Global Step: 135300   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:26:14,843-Speed 3246.06 samples/sec   Loss 4.0187   LearningRate 0.0207   Epoch: 10   Global Step: 135310   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:26:18,001-Speed 3243.96 samples/sec   Loss 3.9220   LearningRate 0.0207   Epoch: 10   Global Step: 135320   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:26:21,108-Speed 3296.67 samples/sec   Loss 3.9113   LearningRate 0.0207   Epoch: 10   Global Step: 135330   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:26:24,302-Speed 3207.59 samples/sec   Loss 3.8548   LearningRate 0.0207   Epoch: 10   Global Step: 135340   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:26:27,457-Speed 3246.26 samples/sec   Loss 4.0349   LearningRate 0.0207   Epoch: 10   Global Step: 135350   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:26:30,559-Speed 3302.06 samples/sec   Loss 3.9589   LearningRate 0.0207   Epoch: 10   Global Step: 135360   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:26:33,610-Speed 3357.11 samples/sec   Loss 3.9819   LearningRate 0.0207   Epoch: 10   Global Step: 135370   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:26:36,742-Speed 3270.54 samples/sec   Loss 4.0139   LearningRate 0.0207   Epoch: 10   Global Step: 135380   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:26:39,877-Speed 3266.72 samples/sec   Loss 4.0098   LearningRate 0.0207   Epoch: 10   Global Step: 135390   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:26:43,078-Speed 3200.55 samples/sec   Loss 3.9635   LearningRate 0.0207   Epoch: 10   Global Step: 135400   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:26:46,145-Speed 3339.96 samples/sec   Loss 3.9867   LearningRate 0.0207   Epoch: 10   Global Step: 135410   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:26:49,223-Speed 3327.27 samples/sec   Loss 3.8951   LearningRate 0.0207   Epoch: 10   Global Step: 135420   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:26:52,369-Speed 3257.02 samples/sec   Loss 3.9836   LearningRate 0.0207   Epoch: 10   Global Step: 135430   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:26:55,517-Speed 3253.48 samples/sec   Loss 3.9374   LearningRate 0.0207   Epoch: 10   Global Step: 135440   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:26:58,574-Speed 3350.47 samples/sec   Loss 4.0885   LearningRate 0.0207   Epoch: 10   Global Step: 135450   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:27:01,732-Speed 3243.88 samples/sec   Loss 4.0067   LearningRate 0.0207   Epoch: 10   Global Step: 135460   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:27:04,857-Speed 3277.87 samples/sec   Loss 3.9490   LearningRate 0.0207   Epoch: 10   Global Step: 135470   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:27:07,938-Speed 3324.52 samples/sec   Loss 3.9959   LearningRate 0.0207   Epoch: 10   Global Step: 135480   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:27:11,043-Speed 3299.58 samples/sec   Loss 3.9488   LearningRate 0.0207   Epoch: 10   Global Step: 135490   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:27:14,183-Speed 3262.12 samples/sec   Loss 3.9580   LearningRate 0.0207   Epoch: 10   Global Step: 135500   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:27:17,290-Speed 3297.12 samples/sec   Loss 4.0224   LearningRate 0.0207   Epoch: 10   Global Step: 135510   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:27:20,399-Speed 3294.51 samples/sec   Loss 3.9308   LearningRate 0.0207   Epoch: 10   Global Step: 135520   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:27:23,486-Speed 3317.50 samples/sec   Loss 3.9563   LearningRate 0.0207   Epoch: 10   Global Step: 135530   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:27:26,585-Speed 3305.85 samples/sec   Loss 4.0073   LearningRate 0.0206   Epoch: 10   Global Step: 135540   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-04-27 13:27:29,758-Speed 3228.04 samples/sec   Loss 3.9599   LearningRate 0.0206   Epoch: 10   Global Step: 135550   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:27:32,856-Speed 3305.92 samples/sec   Loss 3.9198   LearningRate 0.0206   Epoch: 10   Global Step: 135560   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:27:36,009-Speed 3249.80 samples/sec   Loss 3.9845   LearningRate 0.0206   Epoch: 10   Global Step: 135570   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:27:39,137-Speed 3274.31 samples/sec   Loss 3.9712   LearningRate 0.0206   Epoch: 10   Global Step: 135580   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:27:42,251-Speed 3288.71 samples/sec   Loss 3.9161   LearningRate 0.0206   Epoch: 10   Global Step: 135590   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:27:45,312-Speed 3347.15 samples/sec   Loss 3.9282   LearningRate 0.0206   Epoch: 10   Global Step: 135600   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:27:48,400-Speed 3316.52 samples/sec   Loss 4.0045   LearningRate 0.0206   Epoch: 10   Global Step: 135610   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:27:51,486-Speed 3319.62 samples/sec   Loss 3.9485   LearningRate 0.0206   Epoch: 10   Global Step: 135620   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:27:54,609-Speed 3279.70 samples/sec   Loss 3.9589   LearningRate 0.0206   Epoch: 10   Global Step: 135630   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:27:57,699-Speed 3315.43 samples/sec   Loss 3.8920   LearningRate 0.0206   Epoch: 10   Global Step: 135640   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:28:00,814-Speed 3288.57 samples/sec   Loss 4.0623   LearningRate 0.0206   Epoch: 10   Global Step: 135650   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:28:03,953-Speed 3263.20 samples/sec   Loss 3.9589   LearningRate 0.0206   Epoch: 10   Global Step: 135660   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:28:07,054-Speed 3303.38 samples/sec   Loss 4.0234   LearningRate 0.0206   Epoch: 10   Global Step: 135670   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:28:10,107-Speed 3354.61 samples/sec   Loss 3.9749   LearningRate 0.0206   Epoch: 10   Global Step: 135680   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:28:13,224-Speed 3286.48 samples/sec   Loss 3.9864   LearningRate 0.0206   Epoch: 10   Global Step: 135690   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:28:16,330-Speed 3298.19 samples/sec   Loss 4.0038   LearningRate 0.0206   Epoch: 10   Global Step: 135700   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:28:19,436-Speed 3297.31 samples/sec   Loss 3.9614   LearningRate 0.0206   Epoch: 10   Global Step: 135710   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:28:22,501-Speed 3342.88 samples/sec   Loss 3.9696   LearningRate 0.0206   Epoch: 10   Global Step: 135720   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:28:25,596-Speed 3309.13 samples/sec   Loss 3.9722   LearningRate 0.0206   Epoch: 10   Global Step: 135730   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:28:28,666-Speed 3336.82 samples/sec   Loss 3.9482   LearningRate 0.0206   Epoch: 10   Global Step: 135740   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:28:31,785-Speed 3284.08 samples/sec   Loss 3.9780   LearningRate 0.0206   Epoch: 10   Global Step: 135750   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:28:34,900-Speed 3289.02 samples/sec   Loss 3.9825   LearningRate 0.0206   Epoch: 10   Global Step: 135760   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:28:37,996-Speed 3308.36 samples/sec   Loss 3.9032   LearningRate 0.0206   Epoch: 10   Global Step: 135770   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:28:41,130-Speed 3268.58 samples/sec   Loss 4.0090   LearningRate 0.0206   Epoch: 10   Global Step: 135780   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:28:44,193-Speed 3344.23 samples/sec   Loss 3.9506   LearningRate 0.0206   Epoch: 10   Global Step: 135790   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:28:47,290-Speed 3308.29 samples/sec   Loss 3.8767   LearningRate 0.0206   Epoch: 10   Global Step: 135800   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:28:50,447-Speed 3243.67 samples/sec   Loss 3.9930   LearningRate 0.0205   Epoch: 10   Global Step: 135810   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:28:53,605-Speed 3243.71 samples/sec   Loss 4.0113   LearningRate 0.0205   Epoch: 10   Global Step: 135820   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:28:56,706-Speed 3303.68 samples/sec   Loss 3.9787   LearningRate 0.0205   Epoch: 10   Global Step: 135830   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:28:59,857-Speed 3250.41 samples/sec   Loss 3.9461   LearningRate 0.0205   Epoch: 10   Global Step: 135840   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:29:02,920-Speed 3344.72 samples/sec   Loss 3.9603   LearningRate 0.0205   Epoch: 10   Global Step: 135850   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:29:06,055-Speed 3267.25 samples/sec   Loss 3.9919   LearningRate 0.0205   Epoch: 10   Global Step: 135860   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:29:09,111-Speed 3352.11 samples/sec   Loss 3.9679   LearningRate 0.0205   Epoch: 10   Global Step: 135870   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:29:12,245-Speed 3268.48 samples/sec   Loss 3.9727   LearningRate 0.0205   Epoch: 10   Global Step: 135880   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:29:15,321-Speed 3329.94 samples/sec   Loss 3.9672   LearningRate 0.0205   Epoch: 10   Global Step: 135890   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:29:18,421-Speed 3304.49 samples/sec   Loss 4.0392   LearningRate 0.0205   Epoch: 10   Global Step: 135900   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:29:21,496-Speed 3330.93 samples/sec   Loss 4.0050   LearningRate 0.0205   Epoch: 10   Global Step: 135910   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:29:24,561-Speed 3342.04 samples/sec   Loss 3.9334   LearningRate 0.0205   Epoch: 10   Global Step: 135920   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:29:27,651-Speed 3314.72 samples/sec   Loss 4.0333   LearningRate 0.0205   Epoch: 10   Global Step: 135930   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:29:30,806-Speed 3246.60 samples/sec   Loss 3.9763   LearningRate 0.0205   Epoch: 10   Global Step: 135940   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:29:33,880-Speed 3332.28 samples/sec   Loss 3.9655   LearningRate 0.0205   Epoch: 10   Global Step: 135950   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:29:36,963-Speed 3322.90 samples/sec   Loss 3.9582   LearningRate 0.0205   Epoch: 10   Global Step: 135960   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:29:40,041-Speed 3327.75 samples/sec   Loss 3.9963   LearningRate 0.0205   Epoch: 10   Global Step: 135970   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:29:43,136-Speed 3308.82 samples/sec   Loss 4.0569   LearningRate 0.0205   Epoch: 10   Global Step: 135980   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:29:46,208-Speed 3335.32 samples/sec   Loss 4.0936   LearningRate 0.0205   Epoch: 10   Global Step: 135990   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:29:49,291-Speed 3322.28 samples/sec   Loss 3.9941   LearningRate 0.0205   Epoch: 10   Global Step: 136000   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:29:52,351-Speed 3346.83 samples/sec   Loss 3.8986   LearningRate 0.0205   Epoch: 10   Global Step: 136010   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:29:55,436-Speed 3320.36 samples/sec   Loss 3.9642   LearningRate 0.0205   Epoch: 10   Global Step: 136020   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:29:58,532-Speed 3308.92 samples/sec   Loss 4.0128   LearningRate 0.0205   Epoch: 10   Global Step: 136030   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:30:01,604-Speed 3333.74 samples/sec   Loss 3.9750   LearningRate 0.0205   Epoch: 10   Global Step: 136040   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:30:04,748-Speed 3258.67 samples/sec   Loss 4.0492   LearningRate 0.0205   Epoch: 10   Global Step: 136050   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:30:07,883-Speed 3267.37 samples/sec   Loss 4.0182   LearningRate 0.0205   Epoch: 10   Global Step: 136060   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:30:10,945-Speed 3345.23 samples/sec   Loss 3.9413   LearningRate 0.0205   Epoch: 10   Global Step: 136070   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:30:14,085-Speed 3262.63 samples/sec   Loss 3.9702   LearningRate 0.0205   Epoch: 10   Global Step: 136080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:30:17,171-Speed 3318.59 samples/sec   Loss 4.0049   LearningRate 0.0204   Epoch: 10   Global Step: 136090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:30:20,221-Speed 3359.02 samples/sec   Loss 3.9159   LearningRate 0.0204   Epoch: 10   Global Step: 136100   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:30:23,345-Speed 3278.86 samples/sec   Loss 3.9804   LearningRate 0.0204   Epoch: 10   Global Step: 136110   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:30:26,522-Speed 3223.56 samples/sec   Loss 3.9008   LearningRate 0.0204   Epoch: 10   Global Step: 136120   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:30:29,639-Speed 3286.57 samples/sec   Loss 3.9306   LearningRate 0.0204   Epoch: 10   Global Step: 136130   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:30:32,752-Speed 3290.20 samples/sec   Loss 3.9972   LearningRate 0.0204   Epoch: 10   Global Step: 136140   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:30:35,825-Speed 3333.84 samples/sec   Loss 3.9932   LearningRate 0.0204   Epoch: 10   Global Step: 136150   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:30:38,961-Speed 3266.02 samples/sec   Loss 3.9800   LearningRate 0.0204   Epoch: 10   Global Step: 136160   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:30:42,050-Speed 3316.73 samples/sec   Loss 3.9386   LearningRate 0.0204   Epoch: 10   Global Step: 136170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:30:45,139-Speed 3315.41 samples/sec   Loss 3.9335   LearningRate 0.0204   Epoch: 10   Global Step: 136180   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:30:48,276-Speed 3265.40 samples/sec   Loss 3.9313   LearningRate 0.0204   Epoch: 10   Global Step: 136190   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:30:51,356-Speed 3325.57 samples/sec   Loss 3.9336   LearningRate 0.0204   Epoch: 10   Global Step: 136200   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:30:54,469-Speed 3291.45 samples/sec   Loss 4.0556   LearningRate 0.0204   Epoch: 10   Global Step: 136210   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:30:57,522-Speed 3354.37 samples/sec   Loss 3.8654   LearningRate 0.0204   Epoch: 10   Global Step: 136220   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:31:00,618-Speed 3308.83 samples/sec   Loss 3.9089   LearningRate 0.0204   Epoch: 10   Global Step: 136230   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:31:03,763-Speed 3256.93 samples/sec   Loss 4.0174   LearningRate 0.0204   Epoch: 10   Global Step: 136240   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:31:06,934-Speed 3230.80 samples/sec   Loss 4.0088   LearningRate 0.0204   Epoch: 10   Global Step: 136250   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:31:10,063-Speed 3273.34 samples/sec   Loss 3.9088   LearningRate 0.0204   Epoch: 10   Global Step: 136260   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:31:13,195-Speed 3270.76 samples/sec   Loss 3.9551   LearningRate 0.0204   Epoch: 10   Global Step: 136270   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:31:16,287-Speed 3313.29 samples/sec   Loss 3.8926   LearningRate 0.0204   Epoch: 10   Global Step: 136280   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:31:19,408-Speed 3282.00 samples/sec   Loss 3.9390   LearningRate 0.0204   Epoch: 10   Global Step: 136290   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:31:22,512-Speed 3299.91 samples/sec   Loss 3.9563   LearningRate 0.0204   Epoch: 10   Global Step: 136300   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:31:25,626-Speed 3288.86 samples/sec   Loss 3.9436   LearningRate 0.0204   Epoch: 10   Global Step: 136310   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:31:28,810-Speed 3216.84 samples/sec   Loss 4.1478   LearningRate 0.0204   Epoch: 10   Global Step: 136320   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:31:31,911-Speed 3304.15 samples/sec   Loss 3.9684   LearningRate 0.0204   Epoch: 10   Global Step: 136330   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:31:35,027-Speed 3287.34 samples/sec   Loss 4.0181   LearningRate 0.0204   Epoch: 10   Global Step: 136340   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:31:38,125-Speed 3306.01 samples/sec   Loss 3.9329   LearningRate 0.0204   Epoch: 10   Global Step: 136350   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:31:41,337-Speed 3189.40 samples/sec   Loss 3.9696   LearningRate 0.0203   Epoch: 10   Global Step: 136360   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:31:44,435-Speed 3306.63 samples/sec   Loss 3.9795   LearningRate 0.0203   Epoch: 10   Global Step: 136370   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:31:47,541-Speed 3298.47 samples/sec   Loss 4.0089   LearningRate 0.0203   Epoch: 10   Global Step: 136380   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:31:50,713-Speed 3229.46 samples/sec   Loss 3.9158   LearningRate 0.0203   Epoch: 10   Global Step: 136390   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:31:53,789-Speed 3329.18 samples/sec   Loss 3.9221   LearningRate 0.0203   Epoch: 10   Global Step: 136400   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:31:56,869-Speed 3326.55 samples/sec   Loss 4.0560   LearningRate 0.0203   Epoch: 10   Global Step: 136410   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:31:59,943-Speed 3332.68 samples/sec   Loss 3.9597   LearningRate 0.0203   Epoch: 10   Global Step: 136420   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:32:03,069-Speed 3276.64 samples/sec   Loss 4.0186   LearningRate 0.0203   Epoch: 10   Global Step: 136430   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:32:06,142-Speed 3333.30 samples/sec   Loss 3.9450   LearningRate 0.0203   Epoch: 10   Global Step: 136440   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:32:09,214-Speed 3333.89 samples/sec   Loss 3.9855   LearningRate 0.0203   Epoch: 10   Global Step: 136450   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:32:12,306-Speed 3312.82 samples/sec   Loss 3.9591   LearningRate 0.0203   Epoch: 10   Global Step: 136460   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:32:15,494-Speed 3213.04 samples/sec   Loss 3.9361   LearningRate 0.0203   Epoch: 10   Global Step: 136470   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:32:18,637-Speed 3259.03 samples/sec   Loss 4.0047   LearningRate 0.0203   Epoch: 10   Global Step: 136480   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:32:21,704-Speed 3339.83 samples/sec   Loss 4.0243   LearningRate 0.0203   Epoch: 10   Global Step: 136490   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:32:24,821-Speed 3286.25 samples/sec   Loss 3.9781   LearningRate 0.0203   Epoch: 10   Global Step: 136500   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:32:28,016-Speed 3206.01 samples/sec   Loss 3.9328   LearningRate 0.0203   Epoch: 10   Global Step: 136510   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:32:31,197-Speed 3219.89 samples/sec   Loss 3.9677   LearningRate 0.0203   Epoch: 10   Global Step: 136520   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:32:34,260-Speed 3344.54 samples/sec   Loss 4.0075   LearningRate 0.0203   Epoch: 10   Global Step: 136530   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:32:37,362-Speed 3302.14 samples/sec   Loss 4.0433   LearningRate 0.0203   Epoch: 10   Global Step: 136540   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:32:40,423-Speed 3345.88 samples/sec   Loss 3.8995   LearningRate 0.0203   Epoch: 10   Global Step: 136550   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:32:43,513-Speed 3314.81 samples/sec   Loss 4.0017   LearningRate 0.0203   Epoch: 10   Global Step: 136560   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:32:46,592-Speed 3327.39 samples/sec   Loss 3.9826   LearningRate 0.0203   Epoch: 10   Global Step: 136570   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:32:49,702-Speed 3293.39 samples/sec   Loss 3.9798   LearningRate 0.0203   Epoch: 10   Global Step: 136580   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:32:52,816-Speed 3289.77 samples/sec   Loss 3.9627   LearningRate 0.0203   Epoch: 10   Global Step: 136590   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:32:55,893-Speed 3329.43 samples/sec   Loss 4.0171   LearningRate 0.0203   Epoch: 10   Global Step: 136600   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:32:59,007-Speed 3288.48 samples/sec   Loss 4.0126   LearningRate 0.0203   Epoch: 10   Global Step: 136610   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:33:02,344-Speed 3069.69 samples/sec   Loss 3.9283   LearningRate 0.0203   Epoch: 10   Global Step: 136620   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:33:05,531-Speed 3214.22 samples/sec   Loss 3.9324   LearningRate 0.0203   Epoch: 10   Global Step: 136630   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:33:37,707-Speed 318.27 samples/sec   Loss 2.9401   LearningRate 0.0202   Epoch: 11   Global Step: 136640   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:33:41,185-Speed 2945.04 samples/sec   Loss 2.8611   LearningRate 0.0202   Epoch: 11   Global Step: 136650   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:33:44,269-Speed 3322.10 samples/sec   Loss 2.8331   LearningRate 0.0202   Epoch: 11   Global Step: 136660   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:33:47,320-Speed 3356.78 samples/sec   Loss 2.7897   LearningRate 0.0202   Epoch: 11   Global Step: 136670   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:33:50,365-Speed 3363.53 samples/sec   Loss 2.7682   LearningRate 0.0202   Epoch: 11   Global Step: 136680   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:33:53,543-Speed 3223.48 samples/sec   Loss 2.8845   LearningRate 0.0202   Epoch: 11   Global Step: 136690   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:33:56,666-Speed 3280.16 samples/sec   Loss 2.9041   LearningRate 0.0202   Epoch: 11   Global Step: 136700   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:33:59,856-Speed 3210.77 samples/sec   Loss 2.9079   LearningRate 0.0202   Epoch: 11   Global Step: 136710   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:34:03,007-Speed 3251.23 samples/sec   Loss 2.9523   LearningRate 0.0202   Epoch: 11   Global Step: 136720   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:34:06,106-Speed 3305.22 samples/sec   Loss 2.9998   LearningRate 0.0202   Epoch: 11   Global Step: 136730   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:34:09,160-Speed 3353.80 samples/sec   Loss 2.8862   LearningRate 0.0202   Epoch: 11   Global Step: 136740   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:34:12,287-Speed 3276.70 samples/sec   Loss 2.8751   LearningRate 0.0202   Epoch: 11   Global Step: 136750   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:34:15,452-Speed 3235.69 samples/sec   Loss 2.9094   LearningRate 0.0202   Epoch: 11   Global Step: 136760   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:34:18,607-Speed 3247.46 samples/sec   Loss 2.8508   LearningRate 0.0202   Epoch: 11   Global Step: 136770   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:34:21,671-Speed 3343.16 samples/sec   Loss 2.8919   LearningRate 0.0202   Epoch: 11   Global Step: 136780   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:34:24,798-Speed 3275.68 samples/sec   Loss 2.8054   LearningRate 0.0202   Epoch: 11   Global Step: 136790   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:34:27,985-Speed 3213.83 samples/sec   Loss 2.8865   LearningRate 0.0202   Epoch: 11   Global Step: 136800   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:34:31,246-Speed 3141.07 samples/sec   Loss 2.9381   LearningRate 0.0202   Epoch: 11   Global Step: 136810   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:34:34,417-Speed 3229.72 samples/sec   Loss 2.8432   LearningRate 0.0202   Epoch: 11   Global Step: 136820   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:34:37,911-Speed 2932.22 samples/sec   Loss 2.9254   LearningRate 0.0202   Epoch: 11   Global Step: 136830   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:34:41,074-Speed 3238.48 samples/sec   Loss 2.9179   LearningRate 0.0202   Epoch: 11   Global Step: 136840   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:34:44,196-Speed 3280.38 samples/sec   Loss 2.8516   LearningRate 0.0202   Epoch: 11   Global Step: 136850   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:34:47,298-Speed 3303.05 samples/sec   Loss 2.8545   LearningRate 0.0202   Epoch: 11   Global Step: 136860   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:34:50,395-Speed 3307.43 samples/sec   Loss 2.9388   LearningRate 0.0202   Epoch: 11   Global Step: 136870   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:34:53,663-Speed 3134.42 samples/sec   Loss 2.9119   LearningRate 0.0202   Epoch: 11   Global Step: 136880   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:34:56,739-Speed 3329.98 samples/sec   Loss 2.8979   LearningRate 0.0202   Epoch: 11   Global Step: 136890   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:34:59,810-Speed 3335.53 samples/sec   Loss 2.9367   LearningRate 0.0202   Epoch: 11   Global Step: 136900   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:35:02,872-Speed 3345.14 samples/sec   Loss 2.8235   LearningRate 0.0201   Epoch: 11   Global Step: 136910   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:35:06,007-Speed 3267.69 samples/sec   Loss 2.9321   LearningRate 0.0201   Epoch: 11   Global Step: 136920   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:35:09,117-Speed 3292.71 samples/sec   Loss 2.9390   LearningRate 0.0201   Epoch: 11   Global Step: 136930   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:35:12,204-Speed 3318.23 samples/sec   Loss 2.9867   LearningRate 0.0201   Epoch: 11   Global Step: 136940   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:35:15,314-Speed 3293.82 samples/sec   Loss 3.0147   LearningRate 0.0201   Epoch: 11   Global Step: 136950   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:35:18,414-Speed 3304.50 samples/sec   Loss 3.0088   LearningRate 0.0201   Epoch: 11   Global Step: 136960   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:35:21,529-Speed 3288.26 samples/sec   Loss 2.9474   LearningRate 0.0201   Epoch: 11   Global Step: 136970   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:35:24,633-Speed 3300.73 samples/sec   Loss 2.9411   LearningRate 0.0201   Epoch: 11   Global Step: 136980   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:35:27,764-Speed 3271.11 samples/sec   Loss 2.9918   LearningRate 0.0201   Epoch: 11   Global Step: 136990   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:35:30,839-Speed 3331.36 samples/sec   Loss 2.9020   LearningRate 0.0201   Epoch: 11   Global Step: 137000   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:35:33,906-Speed 3339.10 samples/sec   Loss 2.8617   LearningRate 0.0201   Epoch: 11   Global Step: 137010   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:35:37,011-Speed 3299.06 samples/sec   Loss 2.8938   LearningRate 0.0201   Epoch: 11   Global Step: 137020   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:35:40,110-Speed 3306.02 samples/sec   Loss 3.0542   LearningRate 0.0201   Epoch: 11   Global Step: 137030   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:35:43,266-Speed 3245.71 samples/sec   Loss 2.9961   LearningRate 0.0201   Epoch: 11   Global Step: 137040   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:35:46,334-Speed 3338.64 samples/sec   Loss 2.9213   LearningRate 0.0201   Epoch: 11   Global Step: 137050   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:35:49,384-Speed 3357.48 samples/sec   Loss 2.8990   LearningRate 0.0201   Epoch: 11   Global Step: 137060   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:35:52,462-Speed 3329.20 samples/sec   Loss 2.9681   LearningRate 0.0201   Epoch: 11   Global Step: 137070   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:35:55,535-Speed 3333.13 samples/sec   Loss 2.9344   LearningRate 0.0201   Epoch: 11   Global Step: 137080   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:35:58,591-Speed 3351.76 samples/sec   Loss 2.9083   LearningRate 0.0201   Epoch: 11   Global Step: 137090   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:36:01,666-Speed 3330.56 samples/sec   Loss 2.9118   LearningRate 0.0201   Epoch: 11   Global Step: 137100   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:36:04,752-Speed 3319.89 samples/sec   Loss 2.9780   LearningRate 0.0201   Epoch: 11   Global Step: 137110   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:36:07,810-Speed 3349.47 samples/sec   Loss 2.9103   LearningRate 0.0201   Epoch: 11   Global Step: 137120   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:36:10,892-Speed 3323.16 samples/sec   Loss 3.0268   LearningRate 0.0201   Epoch: 11   Global Step: 137130   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:36:14,102-Speed 3191.12 samples/sec   Loss 2.9934   LearningRate 0.0201   Epoch: 11   Global Step: 137140   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:36:17,333-Speed 3170.06 samples/sec   Loss 2.9284   LearningRate 0.0201   Epoch: 11   Global Step: 137150   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:36:20,438-Speed 3299.63 samples/sec   Loss 2.9272   LearningRate 0.0201   Epoch: 11   Global Step: 137160   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:36:23,568-Speed 3272.73 samples/sec   Loss 2.9824   LearningRate 0.0201   Epoch: 11   Global Step: 137170   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:36:26,698-Speed 3272.42 samples/sec   Loss 2.9569   LearningRate 0.0201   Epoch: 11   Global Step: 137180   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:36:29,786-Speed 3317.10 samples/sec   Loss 2.9560   LearningRate 0.0200   Epoch: 11   Global Step: 137190   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:36:32,838-Speed 3355.87 samples/sec   Loss 2.9762   LearningRate 0.0200   Epoch: 11   Global Step: 137200   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:36:35,930-Speed 3313.26 samples/sec   Loss 2.9393   LearningRate 0.0200   Epoch: 11   Global Step: 137210   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:36:39,052-Speed 3281.40 samples/sec   Loss 2.9911   LearningRate 0.0200   Epoch: 11   Global Step: 137220   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:36:42,130-Speed 3327.95 samples/sec   Loss 3.0028   LearningRate 0.0200   Epoch: 11   Global Step: 137230   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:36:45,183-Speed 3354.19 samples/sec   Loss 3.0195   LearningRate 0.0200   Epoch: 11   Global Step: 137240   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:36:48,317-Speed 3269.45 samples/sec   Loss 3.0512   LearningRate 0.0200   Epoch: 11   Global Step: 137250   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:36:51,409-Speed 3312.49 samples/sec   Loss 2.9080   LearningRate 0.0200   Epoch: 11   Global Step: 137260   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:36:54,464-Speed 3352.61 samples/sec   Loss 2.9355   LearningRate 0.0200   Epoch: 11   Global Step: 137270   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:36:57,517-Speed 3354.98 samples/sec   Loss 2.9216   LearningRate 0.0200   Epoch: 11   Global Step: 137280   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:37:00,654-Speed 3265.23 samples/sec   Loss 2.9101   LearningRate 0.0200   Epoch: 11   Global Step: 137290   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:37:03,733-Speed 3327.52 samples/sec   Loss 2.9536   LearningRate 0.0200   Epoch: 11   Global Step: 137300   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:37:06,814-Speed 3324.63 samples/sec   Loss 2.9760   LearningRate 0.0200   Epoch: 11   Global Step: 137310   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:37:09,880-Speed 3341.42 samples/sec   Loss 3.0037   LearningRate 0.0200   Epoch: 11   Global Step: 137320   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:37:12,992-Speed 3290.93 samples/sec   Loss 2.9794   LearningRate 0.0200   Epoch: 11   Global Step: 137330   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:37:16,121-Speed 3273.31 samples/sec   Loss 3.0347   LearningRate 0.0200   Epoch: 11   Global Step: 137340   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:37:19,214-Speed 3312.34 samples/sec   Loss 3.0643   LearningRate 0.0200   Epoch: 11   Global Step: 137350   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:37:22,301-Speed 3317.39 samples/sec   Loss 2.9784   LearningRate 0.0200   Epoch: 11   Global Step: 137360   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:37:25,382-Speed 3325.54 samples/sec   Loss 2.9640   LearningRate 0.0200   Epoch: 11   Global Step: 137370   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:37:28,481-Speed 3305.17 samples/sec   Loss 3.0190   LearningRate 0.0200   Epoch: 11   Global Step: 137380   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:37:31,680-Speed 3201.65 samples/sec   Loss 2.9705   LearningRate 0.0200   Epoch: 11   Global Step: 137390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-04-27 13:37:34,758-Speed 3327.78 samples/sec   Loss 2.9778   LearningRate 0.0200   Epoch: 11   Global Step: 137400   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:37:37,874-Speed 3287.74 samples/sec   Loss 3.0479   LearningRate 0.0200   Epoch: 11   Global Step: 137410   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:37:40,969-Speed 3310.05 samples/sec   Loss 2.9792   LearningRate 0.0200   Epoch: 11   Global Step: 137420   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:37:44,051-Speed 3322.94 samples/sec   Loss 3.1022   LearningRate 0.0200   Epoch: 11   Global Step: 137430   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:37:47,113-Speed 3345.88 samples/sec   Loss 3.0029   LearningRate 0.0200   Epoch: 11   Global Step: 137440   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:37:50,252-Speed 3262.53 samples/sec   Loss 3.0026   LearningRate 0.0200   Epoch: 11   Global Step: 137450   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:37:53,410-Speed 3244.11 samples/sec   Loss 2.9949   LearningRate 0.0200   Epoch: 11   Global Step: 137460   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:37:56,466-Speed 3352.17 samples/sec   Loss 2.9885   LearningRate 0.0199   Epoch: 11   Global Step: 137470   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:37:59,581-Speed 3288.44 samples/sec   Loss 3.0573   LearningRate 0.0199   Epoch: 11   Global Step: 137480   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:38:02,774-Speed 3207.39 samples/sec   Loss 3.0157   LearningRate 0.0199   Epoch: 11   Global Step: 137490   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:38:05,878-Speed 3300.78 samples/sec   Loss 2.9805   LearningRate 0.0199   Epoch: 11   Global Step: 137500   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:38:08,931-Speed 3354.88 samples/sec   Loss 2.9520   LearningRate 0.0199   Epoch: 11   Global Step: 137510   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:38:12,010-Speed 3327.23 samples/sec   Loss 2.9963   LearningRate 0.0199   Epoch: 11   Global Step: 137520   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:38:15,172-Speed 3239.45 samples/sec   Loss 3.0443   LearningRate 0.0199   Epoch: 11   Global Step: 137530   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:38:18,267-Speed 3309.22 samples/sec   Loss 3.0916   LearningRate 0.0199   Epoch: 11   Global Step: 137540   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:38:21,332-Speed 3341.67 samples/sec   Loss 3.0639   LearningRate 0.0199   Epoch: 11   Global Step: 137550   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:38:24,407-Speed 3331.18 samples/sec   Loss 3.0271   LearningRate 0.0199   Epoch: 11   Global Step: 137560   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:38:27,514-Speed 3297.33 samples/sec   Loss 3.0324   LearningRate 0.0199   Epoch: 11   Global Step: 137570   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:38:30,605-Speed 3313.73 samples/sec   Loss 2.9618   LearningRate 0.0199   Epoch: 11   Global Step: 137580   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:38:33,688-Speed 3322.73 samples/sec   Loss 3.0481   LearningRate 0.0199   Epoch: 11   Global Step: 137590   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:38:36,794-Speed 3297.85 samples/sec   Loss 3.0899   LearningRate 0.0199   Epoch: 11   Global Step: 137600   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:38:39,865-Speed 3334.63 samples/sec   Loss 3.0587   LearningRate 0.0199   Epoch: 11   Global Step: 137610   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:38:42,976-Speed 3292.60 samples/sec   Loss 3.0024   LearningRate 0.0199   Epoch: 11   Global Step: 137620   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:38:46,037-Speed 3347.02 samples/sec   Loss 3.0826   LearningRate 0.0199   Epoch: 11   Global Step: 137630   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:38:49,156-Speed 3284.14 samples/sec   Loss 3.0410   LearningRate 0.0199   Epoch: 11   Global Step: 137640   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:38:52,350-Speed 3206.68 samples/sec   Loss 3.0573   LearningRate 0.0199   Epoch: 11   Global Step: 137650   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:38:55,457-Speed 3297.05 samples/sec   Loss 3.0693   LearningRate 0.0199   Epoch: 11   Global Step: 137660   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:38:58,497-Speed 3369.58 samples/sec   Loss 3.0268   LearningRate 0.0199   Epoch: 11   Global Step: 137670   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:39:01,602-Speed 3299.23 samples/sec   Loss 3.0078   LearningRate 0.0199   Epoch: 11   Global Step: 137680   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:39:04,705-Speed 3301.35 samples/sec   Loss 2.9507   LearningRate 0.0199   Epoch: 11   Global Step: 137690   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:39:07,811-Speed 3297.61 samples/sec   Loss 3.0718   LearningRate 0.0199   Epoch: 11   Global Step: 137700   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:39:10,913-Speed 3301.97 samples/sec   Loss 3.1006   LearningRate 0.0199   Epoch: 11   Global Step: 137710   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:39:14,088-Speed 3226.13 samples/sec   Loss 3.0190   LearningRate 0.0199   Epoch: 11   Global Step: 137720   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:39:17,151-Speed 3344.57 samples/sec   Loss 3.0691   LearningRate 0.0199   Epoch: 11   Global Step: 137730   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:39:20,213-Speed 3344.37 samples/sec   Loss 3.0226   LearningRate 0.0199   Epoch: 11   Global Step: 137740   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:39:23,345-Speed 3270.91 samples/sec   Loss 3.0287   LearningRate 0.0198   Epoch: 11   Global Step: 137750   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:39:26,499-Speed 3247.76 samples/sec   Loss 3.0392   LearningRate 0.0198   Epoch: 11   Global Step: 137760   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:39:29,617-Speed 3285.18 samples/sec   Loss 3.1354   LearningRate 0.0198   Epoch: 11   Global Step: 137770   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:39:32,781-Speed 3237.96 samples/sec   Loss 3.0952   LearningRate 0.0198   Epoch: 11   Global Step: 137780   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:39:36,564-Speed 2707.40 samples/sec   Loss 3.1069   LearningRate 0.0198   Epoch: 11   Global Step: 137790   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:39:39,626-Speed 3345.69 samples/sec   Loss 3.0278   LearningRate 0.0198   Epoch: 11   Global Step: 137800   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:39:42,736-Speed 3293.39 samples/sec   Loss 3.0545   LearningRate 0.0198   Epoch: 11   Global Step: 137810   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:39:45,788-Speed 3355.64 samples/sec   Loss 3.1130   LearningRate 0.0198   Epoch: 11   Global Step: 137820   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:39:48,871-Speed 3322.78 samples/sec   Loss 3.0758   LearningRate 0.0198   Epoch: 11   Global Step: 137830   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:39:51,954-Speed 3323.28 samples/sec   Loss 3.1478   LearningRate 0.0198   Epoch: 11   Global Step: 137840   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:39:55,079-Speed 3277.46 samples/sec   Loss 3.1008   LearningRate 0.0198   Epoch: 11   Global Step: 137850   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:39:58,151-Speed 3333.97 samples/sec   Loss 3.0776   LearningRate 0.0198   Epoch: 11   Global Step: 137860   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:40:01,343-Speed 3209.44 samples/sec   Loss 3.1136   LearningRate 0.0198   Epoch: 11   Global Step: 137870   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:40:04,530-Speed 3214.11 samples/sec   Loss 3.0440   LearningRate 0.0198   Epoch: 11   Global Step: 137880   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:40:07,682-Speed 3249.20 samples/sec   Loss 3.0770   LearningRate 0.0198   Epoch: 11   Global Step: 137890   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:40:10,793-Speed 3292.42 samples/sec   Loss 2.9812   LearningRate 0.0198   Epoch: 11   Global Step: 137900   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:40:13,903-Speed 3294.14 samples/sec   Loss 3.0852   LearningRate 0.0198   Epoch: 11   Global Step: 137910   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:40:17,086-Speed 3218.13 samples/sec   Loss 3.0874   LearningRate 0.0198   Epoch: 11   Global Step: 137920   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:40:20,246-Speed 3241.62 samples/sec   Loss 3.0427   LearningRate 0.0198   Epoch: 11   Global Step: 137930   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:40:23,359-Speed 3290.00 samples/sec   Loss 3.0831   LearningRate 0.0198   Epoch: 11   Global Step: 137940   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:40:26,460-Speed 3303.32 samples/sec   Loss 3.0613   LearningRate 0.0198   Epoch: 11   Global Step: 137950   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:40:29,607-Speed 3254.81 samples/sec   Loss 3.0311   LearningRate 0.0198   Epoch: 11   Global Step: 137960   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:40:32,743-Speed 3265.90 samples/sec   Loss 3.1829   LearningRate 0.0198   Epoch: 11   Global Step: 137970   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:40:35,890-Speed 3255.53 samples/sec   Loss 3.0939   LearningRate 0.0198   Epoch: 11   Global Step: 137980   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:40:39,006-Speed 3287.24 samples/sec   Loss 3.1438   LearningRate 0.0198   Epoch: 11   Global Step: 137990   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:40:42,148-Speed 3259.88 samples/sec   Loss 3.1318   LearningRate 0.0198   Epoch: 11   Global Step: 138000   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:40:45,252-Speed 3299.70 samples/sec   Loss 3.0678   LearningRate 0.0198   Epoch: 11   Global Step: 138010   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:40:48,394-Speed 3260.43 samples/sec   Loss 3.0242   LearningRate 0.0197   Epoch: 11   Global Step: 138020   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:40:51,476-Speed 3323.83 samples/sec   Loss 3.0944   LearningRate 0.0197   Epoch: 11   Global Step: 138030   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:40:54,596-Speed 3282.74 samples/sec   Loss 3.0501   LearningRate 0.0197   Epoch: 11   Global Step: 138040   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:40:57,696-Speed 3304.19 samples/sec   Loss 3.1330   LearningRate 0.0197   Epoch: 11   Global Step: 138050   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:41:00,917-Speed 3180.43 samples/sec   Loss 3.1153   LearningRate 0.0197   Epoch: 11   Global Step: 138060   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:41:04,047-Speed 3272.83 samples/sec   Loss 3.1172   LearningRate 0.0197   Epoch: 11   Global Step: 138070   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:41:07,792-Speed 2734.91 samples/sec   Loss 3.0496   LearningRate 0.0197   Epoch: 11   Global Step: 138080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:41:10,909-Speed 3285.97 samples/sec   Loss 3.0773   LearningRate 0.0197   Epoch: 11   Global Step: 138090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:41:15,334-Speed 2314.56 samples/sec   Loss 3.0966   LearningRate 0.0197   Epoch: 11   Global Step: 138100   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:41:19,825-Speed 2280.89 samples/sec   Loss 3.0684   LearningRate 0.0197   Epoch: 11   Global Step: 138110   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:41:22,906-Speed 3324.83 samples/sec   Loss 3.1233   LearningRate 0.0197   Epoch: 11   Global Step: 138120   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:41:26,009-Speed 3300.95 samples/sec   Loss 3.1196   LearningRate 0.0197   Epoch: 11   Global Step: 138130   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:41:29,154-Speed 3256.85 samples/sec   Loss 3.0760   LearningRate 0.0197   Epoch: 11   Global Step: 138140   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:41:32,311-Speed 3244.78 samples/sec   Loss 3.1260   LearningRate 0.0197   Epoch: 11   Global Step: 138150   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:41:35,497-Speed 3215.55 samples/sec   Loss 3.0963   LearningRate 0.0197   Epoch: 11   Global Step: 138160   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:41:38,668-Speed 3229.86 samples/sec   Loss 3.1184   LearningRate 0.0197   Epoch: 11   Global Step: 138170   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:41:41,766-Speed 3306.10 samples/sec   Loss 3.1344   LearningRate 0.0197   Epoch: 11   Global Step: 138180   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:41:44,849-Speed 3323.03 samples/sec   Loss 3.1668   LearningRate 0.0197   Epoch: 11   Global Step: 138190   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:41:48,050-Speed 3199.01 samples/sec   Loss 3.1163   LearningRate 0.0197   Epoch: 11   Global Step: 138200   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:41:51,140-Speed 3315.30 samples/sec   Loss 3.0902   LearningRate 0.0197   Epoch: 11   Global Step: 138210   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-04-27 13:41:54,297-Speed 3244.06 samples/sec   Loss 3.1951   LearningRate 0.0197   Epoch: 11   Global Step: 138220   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-04-27 13:41:57,373-Speed 3331.38 samples/sec   Loss 3.1892   LearningRate 0.0197   Epoch: 11   Global Step: 138230   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:42:00,470-Speed 3306.52 samples/sec   Loss 3.1045   LearningRate 0.0197   Epoch: 11   Global Step: 138240   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:42:03,591-Speed 3282.79 samples/sec   Loss 3.1161   LearningRate 0.0197   Epoch: 11   Global Step: 138250   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:42:06,702-Speed 3292.46 samples/sec   Loss 3.0605   LearningRate 0.0197   Epoch: 11   Global Step: 138260   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:42:09,793-Speed 3314.09 samples/sec   Loss 3.0878   LearningRate 0.0197   Epoch: 11   Global Step: 138270   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:42:12,920-Speed 3276.26 samples/sec   Loss 3.1256   LearningRate 0.0197   Epoch: 11   Global Step: 138280   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:42:16,063-Speed 3259.13 samples/sec   Loss 3.0920   LearningRate 0.0197   Epoch: 11   Global Step: 138290   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:42:19,197-Speed 3267.95 samples/sec   Loss 3.1458   LearningRate 0.0196   Epoch: 11   Global Step: 138300   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:42:22,334-Speed 3264.63 samples/sec   Loss 3.1472   LearningRate 0.0196   Epoch: 11   Global Step: 138310   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:42:25,490-Speed 3246.16 samples/sec   Loss 3.0793   LearningRate 0.0196   Epoch: 11   Global Step: 138320   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:42:28,566-Speed 3329.77 samples/sec   Loss 3.1540   LearningRate 0.0196   Epoch: 11   Global Step: 138330   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:42:31,692-Speed 3276.90 samples/sec   Loss 3.1125   LearningRate 0.0196   Epoch: 11   Global Step: 138340   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:42:34,762-Speed 3336.78 samples/sec   Loss 3.1431   LearningRate 0.0196   Epoch: 11   Global Step: 138350   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:42:37,865-Speed 3300.89 samples/sec   Loss 3.0955   LearningRate 0.0196   Epoch: 11   Global Step: 138360   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:42:40,965-Speed 3304.53 samples/sec   Loss 3.1561   LearningRate 0.0196   Epoch: 11   Global Step: 138370   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:42:44,101-Speed 3266.52 samples/sec   Loss 3.1677   LearningRate 0.0196   Epoch: 11   Global Step: 138380   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:42:47,243-Speed 3260.10 samples/sec   Loss 3.1719   LearningRate 0.0196   Epoch: 11   Global Step: 138390   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:42:50,439-Speed 3205.16 samples/sec   Loss 3.1821   LearningRate 0.0196   Epoch: 11   Global Step: 138400   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:42:53,637-Speed 3203.08 samples/sec   Loss 3.1284   LearningRate 0.0196   Epoch: 11   Global Step: 138410   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:42:56,729-Speed 3312.37 samples/sec   Loss 3.1359   LearningRate 0.0196   Epoch: 11   Global Step: 138420   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:42:59,895-Speed 3235.29 samples/sec   Loss 3.1722   LearningRate 0.0196   Epoch: 11   Global Step: 138430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:43:03,105-Speed 3191.16 samples/sec   Loss 3.1440   LearningRate 0.0196   Epoch: 11   Global Step: 138440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:43:06,186-Speed 3324.89 samples/sec   Loss 3.1751   LearningRate 0.0196   Epoch: 11   Global Step: 138450   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:43:09,289-Speed 3301.36 samples/sec   Loss 3.1460   LearningRate 0.0196   Epoch: 11   Global Step: 138460   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:43:12,363-Speed 3331.79 samples/sec   Loss 3.1939   LearningRate 0.0196   Epoch: 11   Global Step: 138470   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:43:15,521-Speed 3244.64 samples/sec   Loss 3.1063   LearningRate 0.0196   Epoch: 11   Global Step: 138480   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:43:18,620-Speed 3305.26 samples/sec   Loss 3.1581   LearningRate 0.0196   Epoch: 11   Global Step: 138490   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:43:21,701-Speed 3324.25 samples/sec   Loss 3.1050   LearningRate 0.0196   Epoch: 11   Global Step: 138500   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:43:24,809-Speed 3296.44 samples/sec   Loss 3.2375   LearningRate 0.0196   Epoch: 11   Global Step: 138510   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:43:28,002-Speed 3207.42 samples/sec   Loss 3.1288   LearningRate 0.0196   Epoch: 11   Global Step: 138520   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:43:31,121-Speed 3284.84 samples/sec   Loss 3.2221   LearningRate 0.0196   Epoch: 11   Global Step: 138530   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:43:34,213-Speed 3313.13 samples/sec   Loss 3.1042   LearningRate 0.0196   Epoch: 11   Global Step: 138540   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:43:37,384-Speed 3229.94 samples/sec   Loss 3.1283   LearningRate 0.0196   Epoch: 11   Global Step: 138550   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:43:40,440-Speed 3351.80 samples/sec   Loss 3.1360   LearningRate 0.0196   Epoch: 11   Global Step: 138560   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:43:43,531-Speed 3313.37 samples/sec   Loss 3.1580   LearningRate 0.0196   Epoch: 11   Global Step: 138570   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:43:46,614-Speed 3323.27 samples/sec   Loss 3.1217   LearningRate 0.0196   Epoch: 11   Global Step: 138580   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:43:49,745-Speed 3270.91 samples/sec   Loss 3.2129   LearningRate 0.0195   Epoch: 11   Global Step: 138590   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:43:52,894-Speed 3252.90 samples/sec   Loss 3.1876   LearningRate 0.0195   Epoch: 11   Global Step: 138600   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:43:55,982-Speed 3317.48 samples/sec   Loss 3.1486   LearningRate 0.0195   Epoch: 11   Global Step: 138610   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:43:59,071-Speed 3315.90 samples/sec   Loss 3.2081   LearningRate 0.0195   Epoch: 11   Global Step: 138620   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:44:02,177-Speed 3297.32 samples/sec   Loss 3.2059   LearningRate 0.0195   Epoch: 11   Global Step: 138630   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:44:05,257-Speed 3326.44 samples/sec   Loss 3.2874   LearningRate 0.0195   Epoch: 11   Global Step: 138640   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:44:08,317-Speed 3347.19 samples/sec   Loss 3.1658   LearningRate 0.0195   Epoch: 11   Global Step: 138650   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:44:11,443-Speed 3276.52 samples/sec   Loss 3.1957   LearningRate 0.0195   Epoch: 11   Global Step: 138660   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:44:14,559-Speed 3287.62 samples/sec   Loss 3.2645   LearningRate 0.0195   Epoch: 11   Global Step: 138670   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:44:17,732-Speed 3227.99 samples/sec   Loss 3.1410   LearningRate 0.0195   Epoch: 11   Global Step: 138680   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:44:20,802-Speed 3336.59 samples/sec   Loss 3.0828   LearningRate 0.0195   Epoch: 11   Global Step: 138690   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:44:23,894-Speed 3312.94 samples/sec   Loss 3.1971   LearningRate 0.0195   Epoch: 11   Global Step: 138700   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:44:26,989-Speed 3309.55 samples/sec   Loss 3.1984   LearningRate 0.0195   Epoch: 11   Global Step: 138710   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:44:30,155-Speed 3234.56 samples/sec   Loss 3.1576   LearningRate 0.0195   Epoch: 11   Global Step: 138720   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:44:33,252-Speed 3307.83 samples/sec   Loss 3.1927   LearningRate 0.0195   Epoch: 11   Global Step: 138730   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:44:36,367-Speed 3288.81 samples/sec   Loss 3.1936   LearningRate 0.0195   Epoch: 11   Global Step: 138740   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:44:39,537-Speed 3231.26 samples/sec   Loss 3.1412   LearningRate 0.0195   Epoch: 11   Global Step: 138750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:44:42,729-Speed 3208.49 samples/sec   Loss 3.2228   LearningRate 0.0195   Epoch: 11   Global Step: 138760   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:44:45,806-Speed 3329.20 samples/sec   Loss 3.1431   LearningRate 0.0195   Epoch: 11   Global Step: 138770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:44:48,928-Speed 3281.40 samples/sec   Loss 3.2460   LearningRate 0.0195   Epoch: 11   Global Step: 138780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:44:52,066-Speed 3263.72 samples/sec   Loss 3.1920   LearningRate 0.0195   Epoch: 11   Global Step: 138790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:44:55,283-Speed 3184.40 samples/sec   Loss 3.1375   LearningRate 0.0195   Epoch: 11   Global Step: 138800   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:44:58,401-Speed 3285.94 samples/sec   Loss 3.1025   LearningRate 0.0195   Epoch: 11   Global Step: 138810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:45:01,656-Speed 3146.67 samples/sec   Loss 3.1406   LearningRate 0.0195   Epoch: 11   Global Step: 138820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:45:04,887-Speed 3170.36 samples/sec   Loss 3.1999   LearningRate 0.0195   Epoch: 11   Global Step: 138830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:45:07,981-Speed 3310.31 samples/sec   Loss 3.1723   LearningRate 0.0195   Epoch: 11   Global Step: 138840   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 13:45:11,037-Speed 3351.88 samples/sec   Loss 3.2001   LearningRate 0.0195   Epoch: 11   Global Step: 138850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:45:14,286-Speed 3152.49 samples/sec   Loss 3.1317   LearningRate 0.0195   Epoch: 11   Global Step: 138860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:45:17,378-Speed 3313.53 samples/sec   Loss 3.1704   LearningRate 0.0194   Epoch: 11   Global Step: 138870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:45:20,506-Speed 3274.67 samples/sec   Loss 3.1970   LearningRate 0.0194   Epoch: 11   Global Step: 138880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:45:23,629-Speed 3279.73 samples/sec   Loss 3.1689   LearningRate 0.0194   Epoch: 11   Global Step: 138890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:45:26,728-Speed 3305.26 samples/sec   Loss 3.2199   LearningRate 0.0194   Epoch: 11   Global Step: 138900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:45:29,864-Speed 3265.88 samples/sec   Loss 3.2482   LearningRate 0.0194   Epoch: 11   Global Step: 138910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:45:32,958-Speed 3311.70 samples/sec   Loss 3.2064   LearningRate 0.0194   Epoch: 11   Global Step: 138920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:45:36,161-Speed 3197.93 samples/sec   Loss 3.1433   LearningRate 0.0194   Epoch: 11   Global Step: 138930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:45:39,332-Speed 3230.18 samples/sec   Loss 3.2211   LearningRate 0.0194   Epoch: 11   Global Step: 138940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:45:42,480-Speed 3253.38 samples/sec   Loss 3.1174   LearningRate 0.0194   Epoch: 11   Global Step: 138950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:45:45,592-Speed 3292.64 samples/sec   Loss 3.2635   LearningRate 0.0194   Epoch: 11   Global Step: 138960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:45:48,719-Speed 3275.38 samples/sec   Loss 3.1800   LearningRate 0.0194   Epoch: 11   Global Step: 138970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:45:51,832-Speed 3290.29 samples/sec   Loss 3.2012   LearningRate 0.0194   Epoch: 11   Global Step: 138980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:45:54,927-Speed 3309.64 samples/sec   Loss 3.2528   LearningRate 0.0194   Epoch: 11   Global Step: 138990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:45:58,012-Speed 3319.94 samples/sec   Loss 3.2638   LearningRate 0.0194   Epoch: 11   Global Step: 139000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:46:01,224-Speed 3189.20 samples/sec   Loss 3.2277   LearningRate 0.0194   Epoch: 11   Global Step: 139010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:46:04,333-Speed 3295.12 samples/sec   Loss 3.2251   LearningRate 0.0194   Epoch: 11   Global Step: 139020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:46:07,463-Speed 3272.18 samples/sec   Loss 3.1265   LearningRate 0.0194   Epoch: 11   Global Step: 139030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:46:10,585-Speed 3281.38 samples/sec   Loss 3.2849   LearningRate 0.0194   Epoch: 11   Global Step: 139040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:46:13,805-Speed 3181.31 samples/sec   Loss 3.1645   LearningRate 0.0194   Epoch: 11   Global Step: 139050   Fp16 Grad Scale: 131072   Required: 9 hours
Training: 2022-04-27 13:46:16,900-Speed 3309.02 samples/sec   Loss 3.2053   LearningRate 0.0194   Epoch: 11   Global Step: 139060   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:46:19,988-Speed 3317.41 samples/sec   Loss 3.2056   LearningRate 0.0194   Epoch: 11   Global Step: 139070   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:46:23,081-Speed 3311.52 samples/sec   Loss 3.2100   LearningRate 0.0194   Epoch: 11   Global Step: 139080   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:46:26,177-Speed 3308.94 samples/sec   Loss 3.2263   LearningRate 0.0194   Epoch: 11   Global Step: 139090   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:46:29,311-Speed 3267.71 samples/sec   Loss 3.2677   LearningRate 0.0194   Epoch: 11   Global Step: 139100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:46:32,390-Speed 3327.47 samples/sec   Loss 3.1792   LearningRate 0.0194   Epoch: 11   Global Step: 139110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:46:35,492-Speed 3301.83 samples/sec   Loss 3.2040   LearningRate 0.0194   Epoch: 11   Global Step: 139120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:46:38,598-Speed 3298.79 samples/sec   Loss 3.2583   LearningRate 0.0194   Epoch: 11   Global Step: 139130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:46:41,736-Speed 3263.42 samples/sec   Loss 3.1888   LearningRate 0.0194   Epoch: 11   Global Step: 139140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:46:44,808-Speed 3334.45 samples/sec   Loss 3.1994   LearningRate 0.0193   Epoch: 11   Global Step: 139150   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:46:47,880-Speed 3334.48 samples/sec   Loss 3.1752   LearningRate 0.0193   Epoch: 11   Global Step: 139160   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:46:50,985-Speed 3298.86 samples/sec   Loss 3.2656   LearningRate 0.0193   Epoch: 11   Global Step: 139170   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:46:54,052-Speed 3339.76 samples/sec   Loss 3.2296   LearningRate 0.0193   Epoch: 11   Global Step: 139180   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:46:57,104-Speed 3357.16 samples/sec   Loss 3.2117   LearningRate 0.0193   Epoch: 11   Global Step: 139190   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:47:00,210-Speed 3297.24 samples/sec   Loss 3.2530   LearningRate 0.0193   Epoch: 11   Global Step: 139200   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:47:03,325-Speed 3289.26 samples/sec   Loss 3.2725   LearningRate 0.0193   Epoch: 11   Global Step: 139210   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:47:06,451-Speed 3276.54 samples/sec   Loss 3.1985   LearningRate 0.0193   Epoch: 11   Global Step: 139220   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:47:09,544-Speed 3311.55 samples/sec   Loss 3.1968   LearningRate 0.0193   Epoch: 11   Global Step: 139230   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:47:12,633-Speed 3316.47 samples/sec   Loss 3.2492   LearningRate 0.0193   Epoch: 11   Global Step: 139240   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:47:15,724-Speed 3313.18 samples/sec   Loss 3.2469   LearningRate 0.0193   Epoch: 11   Global Step: 139250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:47:18,817-Speed 3312.58 samples/sec   Loss 3.2386   LearningRate 0.0193   Epoch: 11   Global Step: 139260   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:47:21,875-Speed 3349.87 samples/sec   Loss 3.1837   LearningRate 0.0193   Epoch: 11   Global Step: 139270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:47:25,041-Speed 3234.44 samples/sec   Loss 3.2495   LearningRate 0.0193   Epoch: 11   Global Step: 139280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:47:28,186-Speed 3257.23 samples/sec   Loss 3.2239   LearningRate 0.0193   Epoch: 11   Global Step: 139290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:47:31,368-Speed 3219.72 samples/sec   Loss 3.2116   LearningRate 0.0193   Epoch: 11   Global Step: 139300   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:47:34,518-Speed 3252.04 samples/sec   Loss 3.2902   LearningRate 0.0193   Epoch: 11   Global Step: 139310   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:47:37,647-Speed 3273.21 samples/sec   Loss 3.2831   LearningRate 0.0193   Epoch: 11   Global Step: 139320   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:47:40,780-Speed 3269.75 samples/sec   Loss 3.2031   LearningRate 0.0193   Epoch: 11   Global Step: 139330   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:47:43,914-Speed 3268.46 samples/sec   Loss 3.1836   LearningRate 0.0193   Epoch: 11   Global Step: 139340   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:47:47,030-Speed 3286.76 samples/sec   Loss 3.3300   LearningRate 0.0193   Epoch: 11   Global Step: 139350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:47:50,150-Speed 3282.77 samples/sec   Loss 3.3222   LearningRate 0.0193   Epoch: 11   Global Step: 139360   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:47:53,218-Speed 3340.07 samples/sec   Loss 3.2977   LearningRate 0.0193   Epoch: 11   Global Step: 139370   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:47:56,275-Speed 3350.86 samples/sec   Loss 3.2693   LearningRate 0.0193   Epoch: 11   Global Step: 139380   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:47:59,347-Speed 3333.80 samples/sec   Loss 3.2554   LearningRate 0.0193   Epoch: 11   Global Step: 139390   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:48:02,483-Speed 3265.94 samples/sec   Loss 3.2621   LearningRate 0.0193   Epoch: 11   Global Step: 139400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:48:05,587-Speed 3301.14 samples/sec   Loss 3.2200   LearningRate 0.0193   Epoch: 11   Global Step: 139410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:48:08,644-Speed 3349.80 samples/sec   Loss 3.2204   LearningRate 0.0193   Epoch: 11   Global Step: 139420   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:48:11,760-Speed 3287.24 samples/sec   Loss 3.2536   LearningRate 0.0192   Epoch: 11   Global Step: 139430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:48:14,840-Speed 3325.92 samples/sec   Loss 3.2710   LearningRate 0.0192   Epoch: 11   Global Step: 139440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:48:18,005-Speed 3236.76 samples/sec   Loss 3.1879   LearningRate 0.0192   Epoch: 11   Global Step: 139450   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:48:21,117-Speed 3291.51 samples/sec   Loss 3.3015   LearningRate 0.0192   Epoch: 11   Global Step: 139460   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:48:24,249-Speed 3270.40 samples/sec   Loss 3.2510   LearningRate 0.0192   Epoch: 11   Global Step: 139470   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:48:27,466-Speed 3183.82 samples/sec   Loss 3.3434   LearningRate 0.0192   Epoch: 11   Global Step: 139480   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:48:30,692-Speed 3175.80 samples/sec   Loss 3.2592   LearningRate 0.0192   Epoch: 11   Global Step: 139490   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:48:33,764-Speed 3334.20 samples/sec   Loss 3.2478   LearningRate 0.0192   Epoch: 11   Global Step: 139500   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:48:36,989-Speed 3176.16 samples/sec   Loss 3.2518   LearningRate 0.0192   Epoch: 11   Global Step: 139510   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:48:40,109-Speed 3282.68 samples/sec   Loss 3.2531   LearningRate 0.0192   Epoch: 11   Global Step: 139520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:48:43,202-Speed 3312.51 samples/sec   Loss 3.3528   LearningRate 0.0192   Epoch: 11   Global Step: 139530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:48:46,308-Speed 3297.82 samples/sec   Loss 3.2559   LearningRate 0.0192   Epoch: 11   Global Step: 139540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:48:49,435-Speed 3275.21 samples/sec   Loss 3.2653   LearningRate 0.0192   Epoch: 11   Global Step: 139550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:48:52,554-Speed 3284.20 samples/sec   Loss 3.2914   LearningRate 0.0192   Epoch: 11   Global Step: 139560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:48:55,647-Speed 3311.68 samples/sec   Loss 3.1999   LearningRate 0.0192   Epoch: 11   Global Step: 139570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:48:58,779-Speed 3271.03 samples/sec   Loss 3.2495   LearningRate 0.0192   Epoch: 11   Global Step: 139580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:49:01,883-Speed 3299.81 samples/sec   Loss 3.3061   LearningRate 0.0192   Epoch: 11   Global Step: 139590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:49:04,960-Speed 3329.16 samples/sec   Loss 3.2620   LearningRate 0.0192   Epoch: 11   Global Step: 139600   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:49:08,016-Speed 3351.49 samples/sec   Loss 3.2473   LearningRate 0.0192   Epoch: 11   Global Step: 139610   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:49:11,097-Speed 3324.94 samples/sec   Loss 3.4010   LearningRate 0.0192   Epoch: 11   Global Step: 139620   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:49:14,282-Speed 3216.19 samples/sec   Loss 3.2803   LearningRate 0.0192   Epoch: 11   Global Step: 139630   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:49:17,376-Speed 3309.97 samples/sec   Loss 3.3379   LearningRate 0.0192   Epoch: 11   Global Step: 139640   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:49:20,452-Speed 3330.49 samples/sec   Loss 3.3389   LearningRate 0.0192   Epoch: 11   Global Step: 139650   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:49:23,530-Speed 3328.23 samples/sec   Loss 3.2073   LearningRate 0.0192   Epoch: 11   Global Step: 139660   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:49:26,623-Speed 3311.74 samples/sec   Loss 3.2849   LearningRate 0.0192   Epoch: 11   Global Step: 139670   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:49:29,696-Speed 3332.95 samples/sec   Loss 3.3273   LearningRate 0.0192   Epoch: 11   Global Step: 139680   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:49:32,802-Speed 3297.94 samples/sec   Loss 3.2448   LearningRate 0.0192   Epoch: 11   Global Step: 139690   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:49:35,942-Speed 3262.32 samples/sec   Loss 3.3080   LearningRate 0.0192   Epoch: 11   Global Step: 139700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:49:39,026-Speed 3321.11 samples/sec   Loss 3.3257   LearningRate 0.0191   Epoch: 11   Global Step: 139710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:49:42,064-Speed 3372.46 samples/sec   Loss 3.2817   LearningRate 0.0191   Epoch: 11   Global Step: 139720   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:49:45,181-Speed 3286.15 samples/sec   Loss 3.3291   LearningRate 0.0191   Epoch: 11   Global Step: 139730   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:49:48,336-Speed 3245.61 samples/sec   Loss 3.3205   LearningRate 0.0191   Epoch: 11   Global Step: 139740   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:49:51,439-Speed 3301.34 samples/sec   Loss 3.3121   LearningRate 0.0191   Epoch: 11   Global Step: 139750   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:49:54,671-Speed 3169.55 samples/sec   Loss 3.3329   LearningRate 0.0191   Epoch: 11   Global Step: 139760   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:49:57,757-Speed 3318.87 samples/sec   Loss 3.3584   LearningRate 0.0191   Epoch: 11   Global Step: 139770   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:50:00,892-Speed 3267.66 samples/sec   Loss 3.2591   LearningRate 0.0191   Epoch: 11   Global Step: 139780   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:50:04,058-Speed 3235.74 samples/sec   Loss 3.2959   LearningRate 0.0191   Epoch: 11   Global Step: 139790   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:50:07,164-Speed 3297.27 samples/sec   Loss 3.3085   LearningRate 0.0191   Epoch: 11   Global Step: 139800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:50:10,252-Speed 3316.86 samples/sec   Loss 3.2903   LearningRate 0.0191   Epoch: 11   Global Step: 139810   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:50:13,333-Speed 3325.19 samples/sec   Loss 3.2874   LearningRate 0.0191   Epoch: 11   Global Step: 139820   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:50:16,471-Speed 3263.84 samples/sec   Loss 3.2664   LearningRate 0.0191   Epoch: 11   Global Step: 139830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:50:19,545-Speed 3332.05 samples/sec   Loss 3.2399   LearningRate 0.0191   Epoch: 11   Global Step: 139840   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:50:22,634-Speed 3316.64 samples/sec   Loss 3.2864   LearningRate 0.0191   Epoch: 11   Global Step: 139850   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:50:25,728-Speed 3310.28 samples/sec   Loss 3.3064   LearningRate 0.0191   Epoch: 11   Global Step: 139860   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:50:28,846-Speed 3285.00 samples/sec   Loss 3.2940   LearningRate 0.0191   Epoch: 11   Global Step: 139870   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:50:31,942-Speed 3309.16 samples/sec   Loss 3.2878   LearningRate 0.0191   Epoch: 11   Global Step: 139880   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:50:35,065-Speed 3279.66 samples/sec   Loss 3.3082   LearningRate 0.0191   Epoch: 11   Global Step: 139890   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:50:38,142-Speed 3328.82 samples/sec   Loss 3.2937   LearningRate 0.0191   Epoch: 11   Global Step: 139900   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:50:41,328-Speed 3214.54 samples/sec   Loss 3.2184   LearningRate 0.0191   Epoch: 11   Global Step: 139910   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:50:44,472-Speed 3258.50 samples/sec   Loss 3.2915   LearningRate 0.0191   Epoch: 11   Global Step: 139920   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:50:47,562-Speed 3314.79 samples/sec   Loss 3.3602   LearningRate 0.0191   Epoch: 11   Global Step: 139930   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:50:50,689-Speed 3276.04 samples/sec   Loss 3.3344   LearningRate 0.0191   Epoch: 11   Global Step: 139940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:50:53,802-Speed 3290.64 samples/sec   Loss 3.3327   LearningRate 0.0191   Epoch: 11   Global Step: 139950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:50:56,891-Speed 3315.95 samples/sec   Loss 3.2338   LearningRate 0.0191   Epoch: 11   Global Step: 139960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:50:59,964-Speed 3333.07 samples/sec   Loss 3.2635   LearningRate 0.0191   Epoch: 11   Global Step: 139970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:51:03,101-Speed 3265.26 samples/sec   Loss 3.2097   LearningRate 0.0191   Epoch: 11   Global Step: 139980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:51:06,253-Speed 3249.88 samples/sec   Loss 3.3287   LearningRate 0.0191   Epoch: 11   Global Step: 139990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:51:09,366-Speed 3289.95 samples/sec   Loss 3.3354   LearningRate 0.0190   Epoch: 11   Global Step: 140000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:51:12,497-Speed 3271.96 samples/sec   Loss 3.3043   LearningRate 0.0190   Epoch: 11   Global Step: 140010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:51:15,612-Speed 3288.40 samples/sec   Loss 3.3262   LearningRate 0.0190   Epoch: 11   Global Step: 140020   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:51:18,726-Speed 3288.58 samples/sec   Loss 3.3160   LearningRate 0.0190   Epoch: 11   Global Step: 140030   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:51:21,833-Speed 3297.05 samples/sec   Loss 3.2808   LearningRate 0.0190   Epoch: 11   Global Step: 140040   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:51:24,992-Speed 3243.03 samples/sec   Loss 3.3146   LearningRate 0.0190   Epoch: 11   Global Step: 140050   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:51:28,118-Speed 3277.19 samples/sec   Loss 3.3378   LearningRate 0.0190   Epoch: 11   Global Step: 140060   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:51:31,225-Speed 3296.55 samples/sec   Loss 3.3151   LearningRate 0.0190   Epoch: 11   Global Step: 140070   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:51:34,304-Speed 3326.85 samples/sec   Loss 3.3134   LearningRate 0.0190   Epoch: 11   Global Step: 140080   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:51:37,416-Speed 3291.81 samples/sec   Loss 3.3041   LearningRate 0.0190   Epoch: 11   Global Step: 140090   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:51:40,504-Speed 3316.83 samples/sec   Loss 3.3347   LearningRate 0.0190   Epoch: 11   Global Step: 140100   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:51:43,619-Speed 3287.76 samples/sec   Loss 3.3140   LearningRate 0.0190   Epoch: 11   Global Step: 140110   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:51:46,699-Speed 3326.35 samples/sec   Loss 3.3141   LearningRate 0.0190   Epoch: 11   Global Step: 140120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:51:49,825-Speed 3276.36 samples/sec   Loss 3.3322   LearningRate 0.0190   Epoch: 11   Global Step: 140130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:51:52,905-Speed 3326.32 samples/sec   Loss 3.3012   LearningRate 0.0190   Epoch: 11   Global Step: 140140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:51:56,022-Speed 3286.27 samples/sec   Loss 3.2751   LearningRate 0.0190   Epoch: 11   Global Step: 140150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:51:59,125-Speed 3301.57 samples/sec   Loss 3.3996   LearningRate 0.0190   Epoch: 11   Global Step: 140160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:52:02,193-Speed 3337.94 samples/sec   Loss 3.3846   LearningRate 0.0190   Epoch: 11   Global Step: 140170   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:52:05,349-Speed 3246.35 samples/sec   Loss 3.2754   LearningRate 0.0190   Epoch: 11   Global Step: 140180   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:52:08,425-Speed 3330.02 samples/sec   Loss 3.3354   LearningRate 0.0190   Epoch: 11   Global Step: 140190   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:52:11,533-Speed 3295.72 samples/sec   Loss 3.2515   LearningRate 0.0190   Epoch: 11   Global Step: 140200   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:52:14,663-Speed 3272.17 samples/sec   Loss 3.3058   LearningRate 0.0190   Epoch: 11   Global Step: 140210   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:52:17,836-Speed 3228.92 samples/sec   Loss 3.2881   LearningRate 0.0190   Epoch: 11   Global Step: 140220   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:52:20,946-Speed 3293.06 samples/sec   Loss 3.3872   LearningRate 0.0190   Epoch: 11   Global Step: 140230   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:52:24,050-Speed 3300.18 samples/sec   Loss 3.3729   LearningRate 0.0190   Epoch: 11   Global Step: 140240   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:52:27,179-Speed 3273.98 samples/sec   Loss 3.3744   LearningRate 0.0190   Epoch: 11   Global Step: 140250   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:52:30,332-Speed 3248.77 samples/sec   Loss 3.2718   LearningRate 0.0190   Epoch: 11   Global Step: 140260   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:52:33,403-Speed 3335.42 samples/sec   Loss 3.2373   LearningRate 0.0190   Epoch: 11   Global Step: 140270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:52:36,512-Speed 3294.17 samples/sec   Loss 3.2817   LearningRate 0.0189   Epoch: 11   Global Step: 140280   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:52:39,615-Speed 3301.87 samples/sec   Loss 3.2898   LearningRate 0.0189   Epoch: 11   Global Step: 140290   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:52:42,722-Speed 3296.23 samples/sec   Loss 3.3630   LearningRate 0.0189   Epoch: 11   Global Step: 140300   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:52:45,792-Speed 3336.93 samples/sec   Loss 3.3253   LearningRate 0.0189   Epoch: 11   Global Step: 140310   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:52:48,918-Speed 3276.26 samples/sec   Loss 3.2806   LearningRate 0.0189   Epoch: 11   Global Step: 140320   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:52:52,044-Speed 3277.09 samples/sec   Loss 3.3552   LearningRate 0.0189   Epoch: 11   Global Step: 140330   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:52:55,184-Speed 3262.32 samples/sec   Loss 3.2288   LearningRate 0.0189   Epoch: 11   Global Step: 140340   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:52:58,273-Speed 3315.83 samples/sec   Loss 3.2734   LearningRate 0.0189   Epoch: 11   Global Step: 140350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:53:01,387-Speed 3289.18 samples/sec   Loss 3.3574   LearningRate 0.0189   Epoch: 11   Global Step: 140360   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:53:04,490-Speed 3301.57 samples/sec   Loss 3.3078   LearningRate 0.0189   Epoch: 11   Global Step: 140370   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:53:07,606-Speed 3286.65 samples/sec   Loss 3.3512   LearningRate 0.0189   Epoch: 11   Global Step: 140380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:53:10,724-Speed 3285.49 samples/sec   Loss 3.3222   LearningRate 0.0189   Epoch: 11   Global Step: 140390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:53:13,842-Speed 3284.80 samples/sec   Loss 3.3509   LearningRate 0.0189   Epoch: 11   Global Step: 140400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:53:16,972-Speed 3273.19 samples/sec   Loss 3.3128   LearningRate 0.0189   Epoch: 11   Global Step: 140410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:53:20,053-Speed 3324.69 samples/sec   Loss 3.3279   LearningRate 0.0189   Epoch: 11   Global Step: 140420   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:53:23,130-Speed 3328.59 samples/sec   Loss 3.2204   LearningRate 0.0189   Epoch: 11   Global Step: 140430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:53:26,286-Speed 3245.88 samples/sec   Loss 3.4259   LearningRate 0.0189   Epoch: 11   Global Step: 140440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:53:29,415-Speed 3273.85 samples/sec   Loss 3.3583   LearningRate 0.0189   Epoch: 11   Global Step: 140450   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:53:32,545-Speed 3272.05 samples/sec   Loss 3.3644   LearningRate 0.0189   Epoch: 11   Global Step: 140460   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:53:35,678-Speed 3270.05 samples/sec   Loss 3.2926   LearningRate 0.0189   Epoch: 11   Global Step: 140470   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:53:38,832-Speed 3246.98 samples/sec   Loss 3.4356   LearningRate 0.0189   Epoch: 11   Global Step: 140480   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:53:41,975-Speed 3259.61 samples/sec   Loss 3.2676   LearningRate 0.0189   Epoch: 11   Global Step: 140490   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:53:45,054-Speed 3326.64 samples/sec   Loss 3.3292   LearningRate 0.0189   Epoch: 11   Global Step: 140500   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:53:48,132-Speed 3328.25 samples/sec   Loss 3.3331   LearningRate 0.0189   Epoch: 11   Global Step: 140510   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:53:51,246-Speed 3288.73 samples/sec   Loss 3.3247   LearningRate 0.0189   Epoch: 11   Global Step: 140520   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:53:54,370-Speed 3278.80 samples/sec   Loss 3.2245   LearningRate 0.0189   Epoch: 11   Global Step: 140530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:53:57,453-Speed 3322.60 samples/sec   Loss 3.3307   LearningRate 0.0189   Epoch: 11   Global Step: 140540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:54:00,546-Speed 3312.18 samples/sec   Loss 3.2594   LearningRate 0.0189   Epoch: 11   Global Step: 140550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:54:03,700-Speed 3248.01 samples/sec   Loss 3.3387   LearningRate 0.0189   Epoch: 11   Global Step: 140560   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:54:06,863-Speed 3238.45 samples/sec   Loss 3.3514   LearningRate 0.0188   Epoch: 11   Global Step: 140570   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:54:09,937-Speed 3331.75 samples/sec   Loss 3.3935   LearningRate 0.0188   Epoch: 11   Global Step: 140580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:54:13,090-Speed 3248.78 samples/sec   Loss 3.3691   LearningRate 0.0188   Epoch: 11   Global Step: 140590   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:54:16,213-Speed 3280.18 samples/sec   Loss 3.3079   LearningRate 0.0188   Epoch: 11   Global Step: 140600   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:54:19,394-Speed 3219.91 samples/sec   Loss 3.3778   LearningRate 0.0188   Epoch: 11   Global Step: 140610   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:54:22,486-Speed 3313.69 samples/sec   Loss 3.3103   LearningRate 0.0188   Epoch: 11   Global Step: 140620   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:54:25,600-Speed 3288.87 samples/sec   Loss 3.3289   LearningRate 0.0188   Epoch: 11   Global Step: 140630   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:54:28,730-Speed 3272.14 samples/sec   Loss 3.3453   LearningRate 0.0188   Epoch: 11   Global Step: 140640   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:54:31,904-Speed 3227.82 samples/sec   Loss 3.3634   LearningRate 0.0188   Epoch: 11   Global Step: 140650   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:54:35,073-Speed 3231.76 samples/sec   Loss 3.3660   LearningRate 0.0188   Epoch: 11   Global Step: 140660   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:54:38,219-Speed 3256.70 samples/sec   Loss 3.3232   LearningRate 0.0188   Epoch: 11   Global Step: 140670   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:54:41,349-Speed 3272.54 samples/sec   Loss 3.4350   LearningRate 0.0188   Epoch: 11   Global Step: 140680   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:54:44,529-Speed 3221.11 samples/sec   Loss 3.3090   LearningRate 0.0188   Epoch: 11   Global Step: 140690   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:54:47,683-Speed 3248.40 samples/sec   Loss 3.3347   LearningRate 0.0188   Epoch: 11   Global Step: 140700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:54:50,789-Speed 3297.41 samples/sec   Loss 3.4283   LearningRate 0.0188   Epoch: 11   Global Step: 140710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:54:53,891-Speed 3301.62 samples/sec   Loss 3.4109   LearningRate 0.0188   Epoch: 11   Global Step: 140720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:54:56,983-Speed 3313.32 samples/sec   Loss 3.4534   LearningRate 0.0188   Epoch: 11   Global Step: 140730   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:55:00,099-Speed 3287.69 samples/sec   Loss 3.3945   LearningRate 0.0188   Epoch: 11   Global Step: 140740   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:55:03,221-Speed 3280.36 samples/sec   Loss 3.3059   LearningRate 0.0188   Epoch: 11   Global Step: 140750   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:55:06,373-Speed 3249.51 samples/sec   Loss 3.3833   LearningRate 0.0188   Epoch: 11   Global Step: 140760   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:55:09,472-Speed 3306.07 samples/sec   Loss 3.3397   LearningRate 0.0188   Epoch: 11   Global Step: 140770   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:55:12,621-Speed 3252.11 samples/sec   Loss 3.3200   LearningRate 0.0188   Epoch: 11   Global Step: 140780   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:55:15,757-Speed 3266.45 samples/sec   Loss 3.3351   LearningRate 0.0188   Epoch: 11   Global Step: 140790   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:55:18,827-Speed 3336.28 samples/sec   Loss 3.4107   LearningRate 0.0188   Epoch: 11   Global Step: 140800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:55:21,907-Speed 3325.76 samples/sec   Loss 3.3494   LearningRate 0.0188   Epoch: 11   Global Step: 140810   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:55:24,995-Speed 3317.59 samples/sec   Loss 3.3483   LearningRate 0.0188   Epoch: 11   Global Step: 140820   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:55:28,118-Speed 3280.17 samples/sec   Loss 3.3453   LearningRate 0.0188   Epoch: 11   Global Step: 140830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:55:31,240-Speed 3280.02 samples/sec   Loss 3.3244   LearningRate 0.0188   Epoch: 11   Global Step: 140840   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:55:34,387-Speed 3254.95 samples/sec   Loss 3.3906   LearningRate 0.0188   Epoch: 11   Global Step: 140850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:55:37,531-Speed 3258.22 samples/sec   Loss 3.3771   LearningRate 0.0187   Epoch: 11   Global Step: 140860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:55:40,674-Speed 3259.26 samples/sec   Loss 3.3885   LearningRate 0.0187   Epoch: 11   Global Step: 140870   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:55:43,744-Speed 3336.26 samples/sec   Loss 3.3664   LearningRate 0.0187   Epoch: 11   Global Step: 140880   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:55:46,829-Speed 3320.37 samples/sec   Loss 3.4062   LearningRate 0.0187   Epoch: 11   Global Step: 140890   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:55:49,919-Speed 3314.87 samples/sec   Loss 3.4396   LearningRate 0.0187   Epoch: 11   Global Step: 140900   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:55:53,087-Speed 3233.11 samples/sec   Loss 3.4236   LearningRate 0.0187   Epoch: 11   Global Step: 140910   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:55:56,232-Speed 3257.10 samples/sec   Loss 3.3612   LearningRate 0.0187   Epoch: 11   Global Step: 140920   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:55:59,335-Speed 3301.76 samples/sec   Loss 3.3196   LearningRate 0.0187   Epoch: 11   Global Step: 140930   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:56:02,411-Speed 3329.25 samples/sec   Loss 3.3603   LearningRate 0.0187   Epoch: 11   Global Step: 140940   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:56:05,573-Speed 3239.70 samples/sec   Loss 3.3415   LearningRate 0.0187   Epoch: 11   Global Step: 140950   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:56:08,667-Speed 3311.03 samples/sec   Loss 3.2635   LearningRate 0.0187   Epoch: 11   Global Step: 140960   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:56:11,759-Speed 3312.09 samples/sec   Loss 3.3190   LearningRate 0.0187   Epoch: 11   Global Step: 140970   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:56:14,838-Speed 3327.66 samples/sec   Loss 3.2943   LearningRate 0.0187   Epoch: 11   Global Step: 140980   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:56:17,935-Speed 3306.79 samples/sec   Loss 3.4303   LearningRate 0.0187   Epoch: 11   Global Step: 140990   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:56:21,019-Speed 3321.89 samples/sec   Loss 3.3828   LearningRate 0.0187   Epoch: 11   Global Step: 141000   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:56:24,134-Speed 3287.72 samples/sec   Loss 3.3761   LearningRate 0.0187   Epoch: 11   Global Step: 141010   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:56:27,290-Speed 3246.03 samples/sec   Loss 3.3834   LearningRate 0.0187   Epoch: 11   Global Step: 141020   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:56:30,517-Speed 3173.53 samples/sec   Loss 3.4165   LearningRate 0.0187   Epoch: 11   Global Step: 141030   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:56:33,586-Speed 3337.73 samples/sec   Loss 3.3504   LearningRate 0.0187   Epoch: 11   Global Step: 141040   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:56:36,745-Speed 3242.88 samples/sec   Loss 3.3311   LearningRate 0.0187   Epoch: 11   Global Step: 141050   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:56:39,819-Speed 3332.28 samples/sec   Loss 3.4006   LearningRate 0.0187   Epoch: 11   Global Step: 141060   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:56:42,932-Speed 3290.09 samples/sec   Loss 3.3289   LearningRate 0.0187   Epoch: 11   Global Step: 141070   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:56:46,023-Speed 3314.54 samples/sec   Loss 3.3632   LearningRate 0.0187   Epoch: 11   Global Step: 141080   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:56:49,093-Speed 3336.87 samples/sec   Loss 3.3765   LearningRate 0.0187   Epoch: 11   Global Step: 141090   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:56:52,197-Speed 3299.51 samples/sec   Loss 3.3436   LearningRate 0.0187   Epoch: 11   Global Step: 141100   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:56:55,408-Speed 3190.22 samples/sec   Loss 3.2790   LearningRate 0.0187   Epoch: 11   Global Step: 141110   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:56:58,478-Speed 3336.68 samples/sec   Loss 3.3496   LearningRate 0.0187   Epoch: 11   Global Step: 141120   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:57:01,565-Speed 3318.66 samples/sec   Loss 3.4417   LearningRate 0.0187   Epoch: 11   Global Step: 141130   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:57:04,704-Speed 3263.02 samples/sec   Loss 3.3097   LearningRate 0.0186   Epoch: 11   Global Step: 141140   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:57:07,760-Speed 3352.12 samples/sec   Loss 3.3391   LearningRate 0.0186   Epoch: 11   Global Step: 141150   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:57:10,834-Speed 3332.30 samples/sec   Loss 3.4423   LearningRate 0.0186   Epoch: 11   Global Step: 141160   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:57:13,984-Speed 3252.16 samples/sec   Loss 3.4114   LearningRate 0.0186   Epoch: 11   Global Step: 141170   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:57:17,193-Speed 3192.14 samples/sec   Loss 3.3152   LearningRate 0.0186   Epoch: 11   Global Step: 141180   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:57:20,269-Speed 3329.77 samples/sec   Loss 3.3376   LearningRate 0.0186   Epoch: 11   Global Step: 141190   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:57:23,525-Speed 3145.79 samples/sec   Loss 3.3786   LearningRate 0.0186   Epoch: 11   Global Step: 141200   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:57:26,664-Speed 3263.56 samples/sec   Loss 3.3984   LearningRate 0.0186   Epoch: 11   Global Step: 141210   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:57:29,779-Speed 3288.29 samples/sec   Loss 3.3971   LearningRate 0.0186   Epoch: 11   Global Step: 141220   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:57:32,881-Speed 3302.61 samples/sec   Loss 3.3154   LearningRate 0.0186   Epoch: 11   Global Step: 141230   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:57:36,012-Speed 3270.53 samples/sec   Loss 3.4299   LearningRate 0.0186   Epoch: 11   Global Step: 141240   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:57:39,124-Speed 3291.72 samples/sec   Loss 3.2813   LearningRate 0.0186   Epoch: 11   Global Step: 141250   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:57:42,231-Speed 3297.62 samples/sec   Loss 3.3475   LearningRate 0.0186   Epoch: 11   Global Step: 141260   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:57:45,286-Speed 3352.97 samples/sec   Loss 3.3740   LearningRate 0.0186   Epoch: 11   Global Step: 141270   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:57:48,391-Speed 3298.30 samples/sec   Loss 3.3622   LearningRate 0.0186   Epoch: 11   Global Step: 141280   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:57:51,510-Speed 3284.75 samples/sec   Loss 3.3288   LearningRate 0.0186   Epoch: 11   Global Step: 141290   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:57:54,653-Speed 3258.86 samples/sec   Loss 3.3647   LearningRate 0.0186   Epoch: 11   Global Step: 141300   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:57:57,721-Speed 3339.10 samples/sec   Loss 3.3862   LearningRate 0.0186   Epoch: 11   Global Step: 141310   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:58:00,854-Speed 3268.80 samples/sec   Loss 3.3894   LearningRate 0.0186   Epoch: 11   Global Step: 141320   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:58:03,976-Speed 3280.68 samples/sec   Loss 3.3030   LearningRate 0.0186   Epoch: 11   Global Step: 141330   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:58:07,108-Speed 3270.51 samples/sec   Loss 3.4233   LearningRate 0.0186   Epoch: 11   Global Step: 141340   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:58:10,191-Speed 3323.17 samples/sec   Loss 3.3524   LearningRate 0.0186   Epoch: 11   Global Step: 141350   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:58:13,259-Speed 3338.27 samples/sec   Loss 3.4972   LearningRate 0.0186   Epoch: 11   Global Step: 141360   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:58:16,322-Speed 3344.52 samples/sec   Loss 3.3834   LearningRate 0.0186   Epoch: 11   Global Step: 141370   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:58:19,368-Speed 3362.53 samples/sec   Loss 3.4401   LearningRate 0.0186   Epoch: 11   Global Step: 141380   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:58:22,454-Speed 3319.70 samples/sec   Loss 3.3647   LearningRate 0.0186   Epoch: 11   Global Step: 141390   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:58:25,527-Speed 3333.61 samples/sec   Loss 3.4403   LearningRate 0.0186   Epoch: 11   Global Step: 141400   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:58:28,638-Speed 3292.63 samples/sec   Loss 3.3089   LearningRate 0.0186   Epoch: 11   Global Step: 141410   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 13:58:31,724-Speed 3319.02 samples/sec   Loss 3.4323   LearningRate 0.0186   Epoch: 11   Global Step: 141420   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:58:34,804-Speed 3326.11 samples/sec   Loss 3.3652   LearningRate 0.0185   Epoch: 11   Global Step: 141430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:58:37,956-Speed 3249.75 samples/sec   Loss 3.3221   LearningRate 0.0185   Epoch: 11   Global Step: 141440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:58:41,007-Speed 3356.75 samples/sec   Loss 3.3757   LearningRate 0.0185   Epoch: 11   Global Step: 141450   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:58:44,071-Speed 3342.89 samples/sec   Loss 3.4484   LearningRate 0.0185   Epoch: 11   Global Step: 141460   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:58:47,177-Speed 3298.67 samples/sec   Loss 3.4092   LearningRate 0.0185   Epoch: 11   Global Step: 141470   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:58:50,309-Speed 3270.47 samples/sec   Loss 3.4598   LearningRate 0.0185   Epoch: 11   Global Step: 141480   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:58:53,409-Speed 3304.57 samples/sec   Loss 3.4014   LearningRate 0.0185   Epoch: 11   Global Step: 141490   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:58:56,464-Speed 3352.50 samples/sec   Loss 3.4219   LearningRate 0.0185   Epoch: 11   Global Step: 141500   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:58:59,541-Speed 3328.57 samples/sec   Loss 3.3819   LearningRate 0.0185   Epoch: 11   Global Step: 141510   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:59:02,600-Speed 3348.67 samples/sec   Loss 3.3797   LearningRate 0.0185   Epoch: 11   Global Step: 141520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:59:05,673-Speed 3333.59 samples/sec   Loss 3.3513   LearningRate 0.0185   Epoch: 11   Global Step: 141530   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:59:08,751-Speed 3328.29 samples/sec   Loss 3.3794   LearningRate 0.0185   Epoch: 11   Global Step: 141540   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:59:11,871-Speed 3282.47 samples/sec   Loss 3.4620   LearningRate 0.0185   Epoch: 11   Global Step: 141550   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:59:15,068-Speed 3204.11 samples/sec   Loss 3.4186   LearningRate 0.0185   Epoch: 11   Global Step: 141560   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:59:18,212-Speed 3258.76 samples/sec   Loss 3.3303   LearningRate 0.0185   Epoch: 11   Global Step: 141570   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:59:21,293-Speed 3324.60 samples/sec   Loss 3.4307   LearningRate 0.0185   Epoch: 11   Global Step: 141580   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:59:24,359-Speed 3340.46 samples/sec   Loss 3.4956   LearningRate 0.0185   Epoch: 11   Global Step: 141590   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:59:27,417-Speed 3349.91 samples/sec   Loss 3.3757   LearningRate 0.0185   Epoch: 11   Global Step: 141600   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:59:30,518-Speed 3303.07 samples/sec   Loss 3.4282   LearningRate 0.0185   Epoch: 11   Global Step: 141610   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:59:33,598-Speed 3325.51 samples/sec   Loss 3.4040   LearningRate 0.0185   Epoch: 11   Global Step: 141620   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:59:36,689-Speed 3313.26 samples/sec   Loss 3.4410   LearningRate 0.0185   Epoch: 11   Global Step: 141630   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:59:39,826-Speed 3265.59 samples/sec   Loss 3.3845   LearningRate 0.0185   Epoch: 11   Global Step: 141640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 13:59:42,987-Speed 3241.20 samples/sec   Loss 3.3716   LearningRate 0.0185   Epoch: 11   Global Step: 141650   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:59:46,058-Speed 3335.53 samples/sec   Loss 3.4766   LearningRate 0.0185   Epoch: 11   Global Step: 141660   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:59:49,240-Speed 3219.28 samples/sec   Loss 3.4016   LearningRate 0.0185   Epoch: 11   Global Step: 141670   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:59:52,332-Speed 3312.32 samples/sec   Loss 3.3374   LearningRate 0.0185   Epoch: 11   Global Step: 141680   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:59:55,390-Speed 3349.21 samples/sec   Loss 3.4716   LearningRate 0.0185   Epoch: 11   Global Step: 141690   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 13:59:58,468-Speed 3328.31 samples/sec   Loss 3.3621   LearningRate 0.0185   Epoch: 11   Global Step: 141700   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:00:01,572-Speed 3299.90 samples/sec   Loss 3.4454   LearningRate 0.0185   Epoch: 11   Global Step: 141710   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:00:04,661-Speed 3316.63 samples/sec   Loss 3.3634   LearningRate 0.0184   Epoch: 11   Global Step: 141720   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:00:07,721-Speed 3347.35 samples/sec   Loss 3.4366   LearningRate 0.0184   Epoch: 11   Global Step: 141730   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:00:10,773-Speed 3356.49 samples/sec   Loss 3.4710   LearningRate 0.0184   Epoch: 11   Global Step: 141740   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:00:13,927-Speed 3247.00 samples/sec   Loss 3.4175   LearningRate 0.0184   Epoch: 11   Global Step: 141750   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:00:17,083-Speed 3246.59 samples/sec   Loss 3.3677   LearningRate 0.0184   Epoch: 11   Global Step: 141760   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:00:20,240-Speed 3244.01 samples/sec   Loss 3.4163   LearningRate 0.0184   Epoch: 11   Global Step: 141770   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:00:23,302-Speed 3345.44 samples/sec   Loss 3.3902   LearningRate 0.0184   Epoch: 11   Global Step: 141780   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:00:26,412-Speed 3293.51 samples/sec   Loss 3.4842   LearningRate 0.0184   Epoch: 11   Global Step: 141790   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:00:29,528-Speed 3286.95 samples/sec   Loss 3.4244   LearningRate 0.0184   Epoch: 11   Global Step: 141800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:00:32,608-Speed 3325.81 samples/sec   Loss 3.3593   LearningRate 0.0184   Epoch: 11   Global Step: 141810   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:00:35,709-Speed 3303.77 samples/sec   Loss 3.4071   LearningRate 0.0184   Epoch: 11   Global Step: 141820   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:00:38,784-Speed 3330.64 samples/sec   Loss 3.4430   LearningRate 0.0184   Epoch: 11   Global Step: 141830   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:00:41,844-Speed 3347.35 samples/sec   Loss 3.3025   LearningRate 0.0184   Epoch: 11   Global Step: 141840   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:00:44,873-Speed 3382.36 samples/sec   Loss 3.4277   LearningRate 0.0184   Epoch: 11   Global Step: 141850   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:00:47,972-Speed 3305.39 samples/sec   Loss 3.3127   LearningRate 0.0184   Epoch: 11   Global Step: 141860   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:00:51,093-Speed 3281.64 samples/sec   Loss 3.4903   LearningRate 0.0184   Epoch: 11   Global Step: 141870   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:00:54,239-Speed 3256.04 samples/sec   Loss 3.4390   LearningRate 0.0184   Epoch: 11   Global Step: 141880   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:00:57,307-Speed 3339.07 samples/sec   Loss 3.3270   LearningRate 0.0184   Epoch: 11   Global Step: 141890   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:01:00,411-Speed 3299.66 samples/sec   Loss 3.4901   LearningRate 0.0184   Epoch: 11   Global Step: 141900   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:01:03,592-Speed 3220.45 samples/sec   Loss 3.5091   LearningRate 0.0184   Epoch: 11   Global Step: 141910   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:01:06,644-Speed 3356.46 samples/sec   Loss 3.4035   LearningRate 0.0184   Epoch: 11   Global Step: 141920   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:01:09,753-Speed 3294.20 samples/sec   Loss 3.3630   LearningRate 0.0184   Epoch: 11   Global Step: 141930   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:01:12,856-Speed 3301.65 samples/sec   Loss 3.4004   LearningRate 0.0184   Epoch: 11   Global Step: 141940   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:01:15,983-Speed 3275.37 samples/sec   Loss 3.4221   LearningRate 0.0184   Epoch: 11   Global Step: 141950   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:01:19,118-Speed 3267.37 samples/sec   Loss 3.3251   LearningRate 0.0184   Epoch: 11   Global Step: 141960   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:01:22,164-Speed 3362.97 samples/sec   Loss 3.3122   LearningRate 0.0184   Epoch: 11   Global Step: 141970   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:01:25,281-Speed 3286.13 samples/sec   Loss 3.4548   LearningRate 0.0184   Epoch: 11   Global Step: 141980   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:01:28,386-Speed 3299.29 samples/sec   Loss 3.3949   LearningRate 0.0184   Epoch: 11   Global Step: 141990   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:01:31,484-Speed 3306.14 samples/sec   Loss 3.4791   LearningRate 0.0184   Epoch: 11   Global Step: 142000   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:01:34,652-Speed 3233.32 samples/sec   Loss 3.4414   LearningRate 0.0183   Epoch: 11   Global Step: 142010   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:01:37,852-Speed 3201.62 samples/sec   Loss 3.5097   LearningRate 0.0183   Epoch: 11   Global Step: 142020   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:01:40,943-Speed 3314.22 samples/sec   Loss 3.4678   LearningRate 0.0183   Epoch: 11   Global Step: 142030   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:01:44,010-Speed 3339.82 samples/sec   Loss 3.3994   LearningRate 0.0183   Epoch: 11   Global Step: 142040   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:01:47,126-Speed 3286.86 samples/sec   Loss 3.4569   LearningRate 0.0183   Epoch: 11   Global Step: 142050   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:01:50,200-Speed 3332.04 samples/sec   Loss 3.4155   LearningRate 0.0183   Epoch: 11   Global Step: 142060   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:01:53,290-Speed 3314.67 samples/sec   Loss 3.4020   LearningRate 0.0183   Epoch: 11   Global Step: 142070   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:01:56,367-Speed 3330.24 samples/sec   Loss 3.4327   LearningRate 0.0183   Epoch: 11   Global Step: 142080   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:01:59,431-Speed 3342.28 samples/sec   Loss 3.3997   LearningRate 0.0183   Epoch: 11   Global Step: 142090   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:02:02,513-Speed 3323.47 samples/sec   Loss 3.3889   LearningRate 0.0183   Epoch: 11   Global Step: 142100   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:02:05,608-Speed 3310.26 samples/sec   Loss 3.3316   LearningRate 0.0183   Epoch: 11   Global Step: 142110   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:02:08,677-Speed 3336.88 samples/sec   Loss 3.4565   LearningRate 0.0183   Epoch: 11   Global Step: 142120   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:02:11,740-Speed 3344.17 samples/sec   Loss 3.4385   LearningRate 0.0183   Epoch: 11   Global Step: 142130   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:02:14,822-Speed 3323.75 samples/sec   Loss 3.3938   LearningRate 0.0183   Epoch: 11   Global Step: 142140   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:02:17,918-Speed 3309.16 samples/sec   Loss 3.3667   LearningRate 0.0183   Epoch: 11   Global Step: 142150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 14:02:20,980-Speed 3344.83 samples/sec   Loss 3.3868   LearningRate 0.0183   Epoch: 11   Global Step: 142160   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:02:24,067-Speed 3319.04 samples/sec   Loss 3.4388   LearningRate 0.0183   Epoch: 11   Global Step: 142170   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:02:27,130-Speed 3343.24 samples/sec   Loss 3.4377   LearningRate 0.0183   Epoch: 11   Global Step: 142180   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:02:30,236-Speed 3297.90 samples/sec   Loss 3.3340   LearningRate 0.0183   Epoch: 11   Global Step: 142190   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:02:33,334-Speed 3306.64 samples/sec   Loss 3.3496   LearningRate 0.0183   Epoch: 11   Global Step: 142200   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:02:36,408-Speed 3331.99 samples/sec   Loss 3.4518   LearningRate 0.0183   Epoch: 11   Global Step: 142210   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:02:39,552-Speed 3257.75 samples/sec   Loss 3.4010   LearningRate 0.0183   Epoch: 11   Global Step: 142220   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:02:42,772-Speed 3181.88 samples/sec   Loss 3.4062   LearningRate 0.0183   Epoch: 11   Global Step: 142230   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:02:45,834-Speed 3345.08 samples/sec   Loss 3.4610   LearningRate 0.0183   Epoch: 11   Global Step: 142240   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:02:48,904-Speed 3336.22 samples/sec   Loss 3.4122   LearningRate 0.0183   Epoch: 11   Global Step: 142250   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:02:52,048-Speed 3258.09 samples/sec   Loss 3.4514   LearningRate 0.0183   Epoch: 11   Global Step: 142260   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:02:55,137-Speed 3316.62 samples/sec   Loss 3.4621   LearningRate 0.0183   Epoch: 11   Global Step: 142270   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:02:58,217-Speed 3325.46 samples/sec   Loss 3.4431   LearningRate 0.0183   Epoch: 11   Global Step: 142280   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:03:01,354-Speed 3264.88 samples/sec   Loss 3.4260   LearningRate 0.0183   Epoch: 11   Global Step: 142290   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:03:04,433-Speed 3326.88 samples/sec   Loss 3.4895   LearningRate 0.0182   Epoch: 11   Global Step: 142300   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:03:07,504-Speed 3335.46 samples/sec   Loss 3.3594   LearningRate 0.0182   Epoch: 11   Global Step: 142310   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:03:10,566-Speed 3345.26 samples/sec   Loss 3.4406   LearningRate 0.0182   Epoch: 11   Global Step: 142320   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:03:13,676-Speed 3292.98 samples/sec   Loss 3.5123   LearningRate 0.0182   Epoch: 11   Global Step: 142330   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:03:16,763-Speed 3318.53 samples/sec   Loss 3.4548   LearningRate 0.0182   Epoch: 11   Global Step: 142340   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:03:19,901-Speed 3264.83 samples/sec   Loss 3.4174   LearningRate 0.0182   Epoch: 11   Global Step: 142350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:03:22,993-Speed 3312.71 samples/sec   Loss 3.4671   LearningRate 0.0182   Epoch: 11   Global Step: 142360   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:03:26,110-Speed 3286.60 samples/sec   Loss 3.5378   LearningRate 0.0182   Epoch: 11   Global Step: 142370   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:03:29,191-Speed 3324.60 samples/sec   Loss 3.4702   LearningRate 0.0182   Epoch: 11   Global Step: 142380   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:03:32,306-Speed 3288.39 samples/sec   Loss 3.3447   LearningRate 0.0182   Epoch: 11   Global Step: 142390   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:03:35,464-Speed 3243.42 samples/sec   Loss 3.4405   LearningRate 0.0182   Epoch: 11   Global Step: 142400   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:03:38,554-Speed 3315.13 samples/sec   Loss 3.4272   LearningRate 0.0182   Epoch: 11   Global Step: 142410   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 14:03:41,638-Speed 3321.64 samples/sec   Loss 3.4160   LearningRate 0.0182   Epoch: 11   Global Step: 142420   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:03:44,698-Speed 3347.12 samples/sec   Loss 3.4492   LearningRate 0.0182   Epoch: 11   Global Step: 142430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:03:47,821-Speed 3279.56 samples/sec   Loss 3.3662   LearningRate 0.0182   Epoch: 11   Global Step: 142440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:03:50,982-Speed 3240.48 samples/sec   Loss 3.4650   LearningRate 0.0182   Epoch: 11   Global Step: 142450   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:03:54,128-Speed 3255.84 samples/sec   Loss 3.4438   LearningRate 0.0182   Epoch: 11   Global Step: 142460   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:03:57,202-Speed 3332.80 samples/sec   Loss 3.4654   LearningRate 0.0182   Epoch: 11   Global Step: 142470   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:04:00,307-Speed 3298.64 samples/sec   Loss 3.3656   LearningRate 0.0182   Epoch: 11   Global Step: 142480   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:04:03,377-Speed 3336.75 samples/sec   Loss 3.3526   LearningRate 0.0182   Epoch: 11   Global Step: 142490   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:04:06,464-Speed 3317.76 samples/sec   Loss 3.4210   LearningRate 0.0182   Epoch: 11   Global Step: 142500   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:04:09,516-Speed 3356.81 samples/sec   Loss 3.5067   LearningRate 0.0182   Epoch: 11   Global Step: 142510   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:04:12,628-Speed 3291.54 samples/sec   Loss 3.4801   LearningRate 0.0182   Epoch: 11   Global Step: 142520   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:04:15,748-Speed 3282.85 samples/sec   Loss 3.4719   LearningRate 0.0182   Epoch: 11   Global Step: 142530   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:04:18,829-Speed 3324.45 samples/sec   Loss 3.4175   LearningRate 0.0182   Epoch: 11   Global Step: 142540   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:04:21,891-Speed 3345.83 samples/sec   Loss 3.4227   LearningRate 0.0182   Epoch: 11   Global Step: 142550   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:04:25,000-Speed 3295.11 samples/sec   Loss 3.4201   LearningRate 0.0182   Epoch: 11   Global Step: 142560   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:04:28,191-Speed 3209.36 samples/sec   Loss 3.4176   LearningRate 0.0182   Epoch: 11   Global Step: 142570   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:04:31,264-Speed 3333.41 samples/sec   Loss 3.3642   LearningRate 0.0182   Epoch: 11   Global Step: 142580   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:04:34,338-Speed 3332.02 samples/sec   Loss 3.4657   LearningRate 0.0181   Epoch: 11   Global Step: 142590   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:04:37,423-Speed 3320.81 samples/sec   Loss 3.4526   LearningRate 0.0181   Epoch: 11   Global Step: 142600   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:04:40,484-Speed 3346.46 samples/sec   Loss 3.4574   LearningRate 0.0181   Epoch: 11   Global Step: 142610   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:04:43,574-Speed 3315.14 samples/sec   Loss 3.4010   LearningRate 0.0181   Epoch: 11   Global Step: 142620   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:04:46,633-Speed 3348.58 samples/sec   Loss 3.4494   LearningRate 0.0181   Epoch: 11   Global Step: 142630   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:04:49,773-Speed 3263.00 samples/sec   Loss 3.4100   LearningRate 0.0181   Epoch: 11   Global Step: 142640   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:04:52,951-Speed 3222.91 samples/sec   Loss 3.4512   LearningRate 0.0181   Epoch: 11   Global Step: 142650   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:04:56,018-Speed 3340.02 samples/sec   Loss 3.4470   LearningRate 0.0181   Epoch: 11   Global Step: 142660   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:04:59,140-Speed 3280.44 samples/sec   Loss 3.4671   LearningRate 0.0181   Epoch: 11   Global Step: 142670   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:05:02,279-Speed 3263.40 samples/sec   Loss 3.4126   LearningRate 0.0181   Epoch: 11   Global Step: 142680   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:05:05,412-Speed 3270.39 samples/sec   Loss 3.4115   LearningRate 0.0181   Epoch: 11   Global Step: 142690   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:05:08,516-Speed 3300.56 samples/sec   Loss 3.4553   LearningRate 0.0181   Epoch: 11   Global Step: 142700   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:05:11,575-Speed 3348.11 samples/sec   Loss 3.4514   LearningRate 0.0181   Epoch: 11   Global Step: 142710   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:05:14,703-Speed 3275.09 samples/sec   Loss 3.4558   LearningRate 0.0181   Epoch: 11   Global Step: 142720   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 14:05:17,814-Speed 3292.70 samples/sec   Loss 3.5082   LearningRate 0.0181   Epoch: 11   Global Step: 142730   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:05:20,879-Speed 3341.80 samples/sec   Loss 3.3899   LearningRate 0.0181   Epoch: 11   Global Step: 142740   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:05:23,952-Speed 3332.66 samples/sec   Loss 3.5035   LearningRate 0.0181   Epoch: 11   Global Step: 142750   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:05:27,113-Speed 3241.07 samples/sec   Loss 3.4264   LearningRate 0.0181   Epoch: 11   Global Step: 142760   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:05:30,187-Speed 3332.05 samples/sec   Loss 3.4435   LearningRate 0.0181   Epoch: 11   Global Step: 142770   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:05:33,310-Speed 3279.59 samples/sec   Loss 3.5174   LearningRate 0.0181   Epoch: 11   Global Step: 142780   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:05:36,467-Speed 3245.38 samples/sec   Loss 3.4739   LearningRate 0.0181   Epoch: 11   Global Step: 142790   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:05:39,588-Speed 3282.34 samples/sec   Loss 3.4570   LearningRate 0.0181   Epoch: 11   Global Step: 142800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:05:42,763-Speed 3225.96 samples/sec   Loss 3.4070   LearningRate 0.0181   Epoch: 11   Global Step: 142810   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:05:45,840-Speed 3328.24 samples/sec   Loss 3.4903   LearningRate 0.0181   Epoch: 11   Global Step: 142820   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:05:48,931-Speed 3314.30 samples/sec   Loss 3.4987   LearningRate 0.0181   Epoch: 11   Global Step: 142830   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 14:05:52,103-Speed 3229.19 samples/sec   Loss 3.5235   LearningRate 0.0181   Epoch: 11   Global Step: 142840   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 14:05:55,186-Speed 3323.25 samples/sec   Loss 3.4300   LearningRate 0.0181   Epoch: 11   Global Step: 142850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 14:05:58,246-Speed 3347.48 samples/sec   Loss 3.4352   LearningRate 0.0181   Epoch: 11   Global Step: 142860   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 14:06:01,336-Speed 3315.23 samples/sec   Loss 3.3854   LearningRate 0.0181   Epoch: 11   Global Step: 142870   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:06:04,495-Speed 3242.14 samples/sec   Loss 3.5052   LearningRate 0.0180   Epoch: 11   Global Step: 142880   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:06:07,619-Speed 3278.69 samples/sec   Loss 3.4618   LearningRate 0.0180   Epoch: 11   Global Step: 142890   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:06:10,689-Speed 3337.19 samples/sec   Loss 3.4476   LearningRate 0.0180   Epoch: 11   Global Step: 142900   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:06:13,771-Speed 3324.05 samples/sec   Loss 3.4994   LearningRate 0.0180   Epoch: 11   Global Step: 142910   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:06:16,831-Speed 3347.19 samples/sec   Loss 3.4775   LearningRate 0.0180   Epoch: 11   Global Step: 142920   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:06:19,890-Speed 3349.04 samples/sec   Loss 3.4393   LearningRate 0.0180   Epoch: 11   Global Step: 142930   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:06:22,938-Speed 3359.81 samples/sec   Loss 3.4634   LearningRate 0.0180   Epoch: 11   Global Step: 142940   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:06:26,024-Speed 3319.99 samples/sec   Loss 3.4547   LearningRate 0.0180   Epoch: 11   Global Step: 142950   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:06:29,076-Speed 3356.47 samples/sec   Loss 3.4581   LearningRate 0.0180   Epoch: 11   Global Step: 142960   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:06:32,142-Speed 3340.73 samples/sec   Loss 3.5025   LearningRate 0.0180   Epoch: 11   Global Step: 142970   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:06:35,211-Speed 3337.75 samples/sec   Loss 3.4183   LearningRate 0.0180   Epoch: 11   Global Step: 142980   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:06:38,289-Speed 3327.26 samples/sec   Loss 3.5031   LearningRate 0.0180   Epoch: 11   Global Step: 142990   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:06:41,402-Speed 3290.63 samples/sec   Loss 3.4420   LearningRate 0.0180   Epoch: 11   Global Step: 143000   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:06:44,554-Speed 3249.40 samples/sec   Loss 3.4361   LearningRate 0.0180   Epoch: 11   Global Step: 143010   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:06:47,754-Speed 3200.98 samples/sec   Loss 3.4825   LearningRate 0.0180   Epoch: 11   Global Step: 143020   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:06:50,887-Speed 3270.21 samples/sec   Loss 3.4884   LearningRate 0.0180   Epoch: 11   Global Step: 143030   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:06:54,049-Speed 3238.56 samples/sec   Loss 3.4312   LearningRate 0.0180   Epoch: 11   Global Step: 143040   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:06:57,157-Speed 3296.95 samples/sec   Loss 3.4655   LearningRate 0.0180   Epoch: 11   Global Step: 143050   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:07:00,248-Speed 3313.03 samples/sec   Loss 3.3880   LearningRate 0.0180   Epoch: 11   Global Step: 143060   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:07:03,323-Speed 3331.52 samples/sec   Loss 3.4489   LearningRate 0.0180   Epoch: 11   Global Step: 143070   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:07:06,434-Speed 3293.08 samples/sec   Loss 3.4935   LearningRate 0.0180   Epoch: 11   Global Step: 143080   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:07:09,489-Speed 3352.74 samples/sec   Loss 3.4103   LearningRate 0.0180   Epoch: 11   Global Step: 143090   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:07:12,591-Speed 3302.41 samples/sec   Loss 3.5001   LearningRate 0.0180   Epoch: 11   Global Step: 143100   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:07:15,713-Speed 3280.53 samples/sec   Loss 3.4101   LearningRate 0.0180   Epoch: 11   Global Step: 143110   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:07:18,835-Speed 3280.89 samples/sec   Loss 3.4304   LearningRate 0.0180   Epoch: 11   Global Step: 143120   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:07:21,900-Speed 3342.75 samples/sec   Loss 3.5107   LearningRate 0.0180   Epoch: 11   Global Step: 143130   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:07:24,974-Speed 3332.21 samples/sec   Loss 3.4440   LearningRate 0.0180   Epoch: 11   Global Step: 143140   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:07:28,198-Speed 3176.69 samples/sec   Loss 3.3969   LearningRate 0.0180   Epoch: 11   Global Step: 143150   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:07:31,284-Speed 3319.64 samples/sec   Loss 3.4587   LearningRate 0.0180   Epoch: 11   Global Step: 143160   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:07:34,449-Speed 3236.91 samples/sec   Loss 3.5175   LearningRate 0.0180   Epoch: 11   Global Step: 143170   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:07:37,526-Speed 3328.54 samples/sec   Loss 3.4211   LearningRate 0.0179   Epoch: 11   Global Step: 143180   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:07:40,577-Speed 3357.38 samples/sec   Loss 3.4049   LearningRate 0.0179   Epoch: 11   Global Step: 143190   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:07:43,630-Speed 3355.25 samples/sec   Loss 3.4213   LearningRate 0.0179   Epoch: 11   Global Step: 143200   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:07:46,684-Speed 3354.26 samples/sec   Loss 3.5014   LearningRate 0.0179   Epoch: 11   Global Step: 143210   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:07:49,828-Speed 3257.72 samples/sec   Loss 3.5074   LearningRate 0.0179   Epoch: 11   Global Step: 143220   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:07:52,963-Speed 3267.94 samples/sec   Loss 3.3323   LearningRate 0.0179   Epoch: 11   Global Step: 143230   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:07:56,024-Speed 3345.68 samples/sec   Loss 3.3646   LearningRate 0.0179   Epoch: 11   Global Step: 143240   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:07:59,105-Speed 3325.11 samples/sec   Loss 3.3622   LearningRate 0.0179   Epoch: 11   Global Step: 143250   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:08:02,169-Speed 3343.09 samples/sec   Loss 3.5385   LearningRate 0.0179   Epoch: 11   Global Step: 143260   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:08:05,257-Speed 3317.99 samples/sec   Loss 3.4318   LearningRate 0.0179   Epoch: 11   Global Step: 143270   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:08:08,352-Speed 3309.43 samples/sec   Loss 3.5472   LearningRate 0.0179   Epoch: 11   Global Step: 143280   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:08:11,452-Speed 3304.02 samples/sec   Loss 3.3749   LearningRate 0.0179   Epoch: 11   Global Step: 143290   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:08:14,537-Speed 3319.92 samples/sec   Loss 3.3797   LearningRate 0.0179   Epoch: 11   Global Step: 143300   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:08:17,593-Speed 3351.78 samples/sec   Loss 3.4852   LearningRate 0.0179   Epoch: 11   Global Step: 143310   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:08:20,653-Speed 3348.22 samples/sec   Loss 3.5390   LearningRate 0.0179   Epoch: 11   Global Step: 143320   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:08:23,734-Speed 3323.70 samples/sec   Loss 3.4989   LearningRate 0.0179   Epoch: 11   Global Step: 143330   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:08:26,910-Speed 3225.98 samples/sec   Loss 3.4271   LearningRate 0.0179   Epoch: 11   Global Step: 143340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 14:08:29,966-Speed 3351.79 samples/sec   Loss 3.5140   LearningRate 0.0179   Epoch: 11   Global Step: 143350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:08:33,049-Speed 3322.56 samples/sec   Loss 3.4713   LearningRate 0.0179   Epoch: 11   Global Step: 143360   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:08:36,181-Speed 3270.26 samples/sec   Loss 3.4569   LearningRate 0.0179   Epoch: 11   Global Step: 143370   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:08:39,333-Speed 3249.93 samples/sec   Loss 3.4578   LearningRate 0.0179   Epoch: 11   Global Step: 143380   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:08:42,487-Speed 3247.77 samples/sec   Loss 3.3535   LearningRate 0.0179   Epoch: 11   Global Step: 143390   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:08:45,537-Speed 3358.39 samples/sec   Loss 3.4335   LearningRate 0.0179   Epoch: 11   Global Step: 143400   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:08:48,613-Speed 3330.16 samples/sec   Loss 3.4297   LearningRate 0.0179   Epoch: 11   Global Step: 143410   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:08:51,711-Speed 3306.40 samples/sec   Loss 3.4135   LearningRate 0.0179   Epoch: 11   Global Step: 143420   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:08:54,825-Speed 3288.20 samples/sec   Loss 3.4493   LearningRate 0.0179   Epoch: 11   Global Step: 143430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:08:57,916-Speed 3314.82 samples/sec   Loss 3.4363   LearningRate 0.0179   Epoch: 11   Global Step: 143440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:09:00,966-Speed 3358.13 samples/sec   Loss 3.4273   LearningRate 0.0179   Epoch: 11   Global Step: 143450   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:09:04,086-Speed 3283.33 samples/sec   Loss 3.4579   LearningRate 0.0179   Epoch: 11   Global Step: 143460   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:09:07,193-Speed 3296.77 samples/sec   Loss 3.4714   LearningRate 0.0178   Epoch: 11   Global Step: 143470   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:09:10,277-Speed 3321.45 samples/sec   Loss 3.5203   LearningRate 0.0178   Epoch: 11   Global Step: 143480   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:09:13,382-Speed 3298.93 samples/sec   Loss 3.5463   LearningRate 0.0178   Epoch: 11   Global Step: 143490   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:09:16,467-Speed 3320.43 samples/sec   Loss 3.5075   LearningRate 0.0178   Epoch: 11   Global Step: 143500   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:09:19,548-Speed 3324.95 samples/sec   Loss 3.4199   LearningRate 0.0178   Epoch: 11   Global Step: 143510   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:09:22,627-Speed 3326.87 samples/sec   Loss 3.4578   LearningRate 0.0178   Epoch: 11   Global Step: 143520   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:09:25,764-Speed 3264.04 samples/sec   Loss 3.5137   LearningRate 0.0178   Epoch: 11   Global Step: 143530   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:09:28,900-Speed 3267.41 samples/sec   Loss 3.4710   LearningRate 0.0178   Epoch: 11   Global Step: 143540   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:09:31,975-Speed 3330.94 samples/sec   Loss 3.4549   LearningRate 0.0178   Epoch: 11   Global Step: 143550   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:09:35,038-Speed 3343.60 samples/sec   Loss 3.4528   LearningRate 0.0178   Epoch: 11   Global Step: 143560   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:09:38,223-Speed 3216.57 samples/sec   Loss 3.4422   LearningRate 0.0178   Epoch: 11   Global Step: 143570   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:09:41,290-Speed 3340.18 samples/sec   Loss 3.5007   LearningRate 0.0178   Epoch: 11   Global Step: 143580   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:09:44,391-Speed 3302.69 samples/sec   Loss 3.4467   LearningRate 0.0178   Epoch: 11   Global Step: 143590   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:09:47,531-Speed 3262.30 samples/sec   Loss 3.4512   LearningRate 0.0178   Epoch: 11   Global Step: 143600   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:09:50,674-Speed 3258.76 samples/sec   Loss 3.4794   LearningRate 0.0178   Epoch: 11   Global Step: 143610   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:09:53,767-Speed 3312.57 samples/sec   Loss 3.4419   LearningRate 0.0178   Epoch: 11   Global Step: 143620   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:09:56,826-Speed 3348.67 samples/sec   Loss 3.4234   LearningRate 0.0178   Epoch: 11   Global Step: 143630   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:09:59,966-Speed 3262.38 samples/sec   Loss 3.4987   LearningRate 0.0178   Epoch: 11   Global Step: 143640   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:10:03,052-Speed 3318.63 samples/sec   Loss 3.4258   LearningRate 0.0178   Epoch: 11   Global Step: 143650   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:10:06,168-Speed 3287.75 samples/sec   Loss 3.5247   LearningRate 0.0178   Epoch: 11   Global Step: 143660   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:10:09,209-Speed 3367.92 samples/sec   Loss 3.4005   LearningRate 0.0178   Epoch: 11   Global Step: 143670   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:10:12,361-Speed 3249.60 samples/sec   Loss 3.4498   LearningRate 0.0178   Epoch: 11   Global Step: 143680   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:10:15,535-Speed 3227.66 samples/sec   Loss 3.5470   LearningRate 0.0178   Epoch: 11   Global Step: 143690   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:10:18,689-Speed 3247.64 samples/sec   Loss 3.4504   LearningRate 0.0178   Epoch: 11   Global Step: 143700   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:10:21,790-Speed 3303.91 samples/sec   Loss 3.4787   LearningRate 0.0178   Epoch: 11   Global Step: 143710   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:10:24,863-Speed 3332.77 samples/sec   Loss 3.4053   LearningRate 0.0178   Epoch: 11   Global Step: 143720   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:10:27,955-Speed 3312.36 samples/sec   Loss 3.4838   LearningRate 0.0178   Epoch: 11   Global Step: 143730   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:10:31,073-Speed 3285.64 samples/sec   Loss 3.4837   LearningRate 0.0178   Epoch: 11   Global Step: 143740   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:10:34,141-Speed 3339.07 samples/sec   Loss 3.4922   LearningRate 0.0178   Epoch: 11   Global Step: 143750   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:10:37,272-Speed 3271.64 samples/sec   Loss 3.5074   LearningRate 0.0177   Epoch: 11   Global Step: 143760   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:10:40,387-Speed 3288.19 samples/sec   Loss 3.5001   LearningRate 0.0177   Epoch: 11   Global Step: 143770   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:10:43,486-Speed 3304.82 samples/sec   Loss 3.4908   LearningRate 0.0177   Epoch: 11   Global Step: 143780   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:10:46,612-Speed 3277.55 samples/sec   Loss 3.5066   LearningRate 0.0177   Epoch: 11   Global Step: 143790   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:10:49,714-Speed 3301.61 samples/sec   Loss 3.3880   LearningRate 0.0177   Epoch: 11   Global Step: 143800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:10:52,806-Speed 3312.82 samples/sec   Loss 3.4818   LearningRate 0.0177   Epoch: 11   Global Step: 143810   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:10:55,901-Speed 3309.86 samples/sec   Loss 3.5066   LearningRate 0.0177   Epoch: 11   Global Step: 143820   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:10:58,969-Speed 3338.91 samples/sec   Loss 3.4598   LearningRate 0.0177   Epoch: 11   Global Step: 143830   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:11:02,029-Speed 3347.52 samples/sec   Loss 3.4375   LearningRate 0.0177   Epoch: 11   Global Step: 143840   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:11:05,158-Speed 3273.36 samples/sec   Loss 3.4793   LearningRate 0.0177   Epoch: 11   Global Step: 143850   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:11:08,287-Speed 3274.17 samples/sec   Loss 3.5357   LearningRate 0.0177   Epoch: 11   Global Step: 143860   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:11:11,347-Speed 3347.28 samples/sec   Loss 3.4692   LearningRate 0.0177   Epoch: 11   Global Step: 143870   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:11:14,416-Speed 3336.76 samples/sec   Loss 3.4104   LearningRate 0.0177   Epoch: 11   Global Step: 143880   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:11:17,472-Speed 3352.41 samples/sec   Loss 3.5173   LearningRate 0.0177   Epoch: 11   Global Step: 143890   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:11:20,542-Speed 3336.76 samples/sec   Loss 3.4711   LearningRate 0.0177   Epoch: 11   Global Step: 143900   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:11:23,643-Speed 3302.46 samples/sec   Loss 3.4396   LearningRate 0.0177   Epoch: 11   Global Step: 143910   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:11:26,850-Speed 3194.00 samples/sec   Loss 3.4712   LearningRate 0.0177   Epoch: 11   Global Step: 143920   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:11:29,909-Speed 3348.74 samples/sec   Loss 3.5173   LearningRate 0.0177   Epoch: 11   Global Step: 143930   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:11:32,968-Speed 3349.42 samples/sec   Loss 3.4461   LearningRate 0.0177   Epoch: 11   Global Step: 143940   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:11:36,072-Speed 3299.89 samples/sec   Loss 3.5535   LearningRate 0.0177   Epoch: 11   Global Step: 143950   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:11:39,123-Speed 3357.25 samples/sec   Loss 3.4125   LearningRate 0.0177   Epoch: 11   Global Step: 143960   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:11:42,259-Speed 3266.16 samples/sec   Loss 3.3872   LearningRate 0.0177   Epoch: 11   Global Step: 143970   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:11:45,315-Speed 3352.38 samples/sec   Loss 3.4914   LearningRate 0.0177   Epoch: 11   Global Step: 143980   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:11:48,404-Speed 3315.44 samples/sec   Loss 3.5379   LearningRate 0.0177   Epoch: 11   Global Step: 143990   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:11:51,606-Speed 3199.31 samples/sec   Loss 3.4748   LearningRate 0.0177   Epoch: 11   Global Step: 144000   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:11:54,696-Speed 3315.50 samples/sec   Loss 3.4771   LearningRate 0.0177   Epoch: 11   Global Step: 144010   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:11:57,762-Speed 3340.21 samples/sec   Loss 3.4812   LearningRate 0.0177   Epoch: 11   Global Step: 144020   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:12:00,835-Speed 3333.45 samples/sec   Loss 3.4570   LearningRate 0.0177   Epoch: 11   Global Step: 144030   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:12:03,906-Speed 3335.91 samples/sec   Loss 3.5094   LearningRate 0.0177   Epoch: 11   Global Step: 144040   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:12:06,989-Speed 3323.01 samples/sec   Loss 3.6417   LearningRate 0.0177   Epoch: 11   Global Step: 144050   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:12:10,043-Speed 3353.49 samples/sec   Loss 3.5270   LearningRate 0.0176   Epoch: 11   Global Step: 144060   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:12:13,131-Speed 3317.15 samples/sec   Loss 3.5410   LearningRate 0.0176   Epoch: 11   Global Step: 144070   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:12:16,224-Speed 3311.55 samples/sec   Loss 3.4562   LearningRate 0.0176   Epoch: 11   Global Step: 144080   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:12:19,337-Speed 3291.02 samples/sec   Loss 3.5210   LearningRate 0.0176   Epoch: 11   Global Step: 144090   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:12:22,412-Speed 3331.22 samples/sec   Loss 3.5244   LearningRate 0.0176   Epoch: 11   Global Step: 144100   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:12:25,533-Speed 3282.33 samples/sec   Loss 3.4875   LearningRate 0.0176   Epoch: 11   Global Step: 144110   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:12:28,598-Speed 3341.96 samples/sec   Loss 3.5589   LearningRate 0.0176   Epoch: 11   Global Step: 144120   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:12:31,675-Speed 3328.81 samples/sec   Loss 3.4575   LearningRate 0.0176   Epoch: 11   Global Step: 144130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 14:12:34,716-Speed 3368.54 samples/sec   Loss 3.4641   LearningRate 0.0176   Epoch: 11   Global Step: 144140   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:12:37,848-Speed 3271.02 samples/sec   Loss 3.5017   LearningRate 0.0176   Epoch: 11   Global Step: 144150   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:12:41,071-Speed 3178.01 samples/sec   Loss 3.4352   LearningRate 0.0176   Epoch: 11   Global Step: 144160   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:12:44,182-Speed 3292.66 samples/sec   Loss 3.4580   LearningRate 0.0176   Epoch: 11   Global Step: 144170   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:12:47,256-Speed 3332.36 samples/sec   Loss 3.4950   LearningRate 0.0176   Epoch: 11   Global Step: 144180   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:12:50,330-Speed 3332.07 samples/sec   Loss 3.4321   LearningRate 0.0176   Epoch: 11   Global Step: 144190   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:12:53,470-Speed 3262.49 samples/sec   Loss 3.5473   LearningRate 0.0176   Epoch: 11   Global Step: 144200   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:12:56,529-Speed 3348.21 samples/sec   Loss 3.4690   LearningRate 0.0176   Epoch: 11   Global Step: 144210   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:12:59,631-Speed 3302.38 samples/sec   Loss 3.4046   LearningRate 0.0176   Epoch: 11   Global Step: 144220   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:13:02,788-Speed 3244.80 samples/sec   Loss 3.4992   LearningRate 0.0176   Epoch: 11   Global Step: 144230   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:13:05,883-Speed 3309.02 samples/sec   Loss 3.4972   LearningRate 0.0176   Epoch: 11   Global Step: 144240   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:13:08,978-Speed 3309.81 samples/sec   Loss 3.3693   LearningRate 0.0176   Epoch: 11   Global Step: 144250   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:13:12,188-Speed 3191.09 samples/sec   Loss 3.4357   LearningRate 0.0176   Epoch: 11   Global Step: 144260   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:13:15,319-Speed 3272.17 samples/sec   Loss 3.4234   LearningRate 0.0176   Epoch: 11   Global Step: 144270   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:13:18,395-Speed 3329.33 samples/sec   Loss 3.4992   LearningRate 0.0176   Epoch: 11   Global Step: 144280   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:13:21,452-Speed 3351.39 samples/sec   Loss 3.4662   LearningRate 0.0176   Epoch: 11   Global Step: 144290   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:13:24,566-Speed 3289.15 samples/sec   Loss 3.4313   LearningRate 0.0176   Epoch: 11   Global Step: 144300   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:13:27,678-Speed 3292.09 samples/sec   Loss 3.5409   LearningRate 0.0176   Epoch: 11   Global Step: 144310   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:13:30,841-Speed 3238.63 samples/sec   Loss 3.4527   LearningRate 0.0176   Epoch: 11   Global Step: 144320   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:13:33,910-Speed 3337.54 samples/sec   Loss 3.4776   LearningRate 0.0176   Epoch: 11   Global Step: 144330   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:13:37,056-Speed 3256.20 samples/sec   Loss 3.4112   LearningRate 0.0176   Epoch: 11   Global Step: 144340   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:13:40,198-Speed 3259.58 samples/sec   Loss 3.5649   LearningRate 0.0176   Epoch: 11   Global Step: 144350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:13:43,356-Speed 3244.24 samples/sec   Loss 3.4942   LearningRate 0.0175   Epoch: 11   Global Step: 144360   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:13:46,409-Speed 3354.45 samples/sec   Loss 3.4650   LearningRate 0.0175   Epoch: 11   Global Step: 144370   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:13:49,467-Speed 3350.29 samples/sec   Loss 3.4012   LearningRate 0.0175   Epoch: 11   Global Step: 144380   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:13:52,553-Speed 3319.51 samples/sec   Loss 3.4591   LearningRate 0.0175   Epoch: 11   Global Step: 144390   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:13:55,631-Speed 3328.02 samples/sec   Loss 3.5062   LearningRate 0.0175   Epoch: 11   Global Step: 144400   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:13:58,703-Speed 3333.55 samples/sec   Loss 3.4464   LearningRate 0.0175   Epoch: 11   Global Step: 144410   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:14:01,799-Speed 3309.16 samples/sec   Loss 3.5826   LearningRate 0.0175   Epoch: 11   Global Step: 144420   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:14:04,886-Speed 3318.71 samples/sec   Loss 3.6252   LearningRate 0.0175   Epoch: 11   Global Step: 144430   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:14:08,015-Speed 3273.66 samples/sec   Loss 3.5049   LearningRate 0.0175   Epoch: 11   Global Step: 144440   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:14:11,074-Speed 3348.74 samples/sec   Loss 3.5815   LearningRate 0.0175   Epoch: 11   Global Step: 144450   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:14:14,159-Speed 3320.14 samples/sec   Loss 3.6119   LearningRate 0.0175   Epoch: 11   Global Step: 144460   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:14:17,323-Speed 3237.73 samples/sec   Loss 3.5000   LearningRate 0.0175   Epoch: 11   Global Step: 144470   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:14:20,443-Speed 3282.83 samples/sec   Loss 3.5409   LearningRate 0.0175   Epoch: 11   Global Step: 144480   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:14:23,611-Speed 3233.88 samples/sec   Loss 3.4509   LearningRate 0.0175   Epoch: 11   Global Step: 144490   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:14:26,795-Speed 3216.88 samples/sec   Loss 3.4963   LearningRate 0.0175   Epoch: 11   Global Step: 144500   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:14:29,866-Speed 3334.97 samples/sec   Loss 3.5666   LearningRate 0.0175   Epoch: 11   Global Step: 144510   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:14:32,936-Speed 3337.03 samples/sec   Loss 3.5257   LearningRate 0.0175   Epoch: 11   Global Step: 144520   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:14:36,065-Speed 3273.62 samples/sec   Loss 3.5295   LearningRate 0.0175   Epoch: 11   Global Step: 144530   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:14:39,124-Speed 3348.62 samples/sec   Loss 3.5363   LearningRate 0.0175   Epoch: 11   Global Step: 144540   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:14:42,276-Speed 3250.18 samples/sec   Loss 3.5464   LearningRate 0.0175   Epoch: 11   Global Step: 144550   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:14:45,340-Speed 3343.22 samples/sec   Loss 3.5231   LearningRate 0.0175   Epoch: 11   Global Step: 144560   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:14:48,496-Speed 3244.71 samples/sec   Loss 3.5556   LearningRate 0.0175   Epoch: 11   Global Step: 144570   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:14:51,664-Speed 3234.04 samples/sec   Loss 3.4248   LearningRate 0.0175   Epoch: 11   Global Step: 144580   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:14:54,799-Speed 3267.09 samples/sec   Loss 3.5185   LearningRate 0.0175   Epoch: 11   Global Step: 144590   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:14:57,858-Speed 3349.00 samples/sec   Loss 3.4765   LearningRate 0.0175   Epoch: 11   Global Step: 144600   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:15:00,909-Speed 3357.43 samples/sec   Loss 3.4884   LearningRate 0.0175   Epoch: 11   Global Step: 144610   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:15:03,984-Speed 3330.36 samples/sec   Loss 3.5467   LearningRate 0.0175   Epoch: 11   Global Step: 144620   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:15:07,060-Speed 3330.25 samples/sec   Loss 3.4460   LearningRate 0.0175   Epoch: 11   Global Step: 144630   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:15:10,163-Speed 3301.20 samples/sec   Loss 3.5051   LearningRate 0.0175   Epoch: 11   Global Step: 144640   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:15:13,274-Speed 3292.38 samples/sec   Loss 3.4999   LearningRate 0.0174   Epoch: 11   Global Step: 144650   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:15:16,417-Speed 3259.91 samples/sec   Loss 3.4928   LearningRate 0.0174   Epoch: 11   Global Step: 144660   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:15:19,540-Speed 3279.92 samples/sec   Loss 3.4862   LearningRate 0.0174   Epoch: 11   Global Step: 144670   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:15:22,610-Speed 3336.53 samples/sec   Loss 3.4411   LearningRate 0.0174   Epoch: 11   Global Step: 144680   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:15:25,737-Speed 3275.33 samples/sec   Loss 3.5118   LearningRate 0.0174   Epoch: 11   Global Step: 144690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 14:15:28,810-Speed 3333.56 samples/sec   Loss 3.4867   LearningRate 0.0174   Epoch: 11   Global Step: 144700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 14:15:31,866-Speed 3351.73 samples/sec   Loss 3.4933   LearningRate 0.0174   Epoch: 11   Global Step: 144710   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 14:15:35,014-Speed 3253.61 samples/sec   Loss 3.4995   LearningRate 0.0174   Epoch: 11   Global Step: 144720   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:15:38,166-Speed 3250.37 samples/sec   Loss 3.4625   LearningRate 0.0174   Epoch: 11   Global Step: 144730   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:15:41,257-Speed 3313.68 samples/sec   Loss 3.5980   LearningRate 0.0174   Epoch: 11   Global Step: 144740   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:15:44,388-Speed 3271.70 samples/sec   Loss 3.4495   LearningRate 0.0174   Epoch: 11   Global Step: 144750   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:15:47,538-Speed 3251.94 samples/sec   Loss 3.4899   LearningRate 0.0174   Epoch: 11   Global Step: 144760   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:15:50,643-Speed 3299.48 samples/sec   Loss 3.4252   LearningRate 0.0174   Epoch: 11   Global Step: 144770   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:15:53,715-Speed 3333.26 samples/sec   Loss 3.4640   LearningRate 0.0174   Epoch: 11   Global Step: 144780   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:15:56,865-Speed 3253.06 samples/sec   Loss 3.4870   LearningRate 0.0174   Epoch: 11   Global Step: 144790   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:15:59,940-Speed 3330.25 samples/sec   Loss 3.5096   LearningRate 0.0174   Epoch: 11   Global Step: 144800   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:16:03,167-Speed 3175.26 samples/sec   Loss 3.5442   LearningRate 0.0174   Epoch: 11   Global Step: 144810   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:16:06,238-Speed 3335.31 samples/sec   Loss 3.5264   LearningRate 0.0174   Epoch: 11   Global Step: 144820   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:16:09,311-Speed 3333.01 samples/sec   Loss 3.4718   LearningRate 0.0174   Epoch: 11   Global Step: 144830   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:16:12,398-Speed 3318.62 samples/sec   Loss 3.5521   LearningRate 0.0174   Epoch: 11   Global Step: 144840   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:16:15,465-Speed 3339.52 samples/sec   Loss 3.4907   LearningRate 0.0174   Epoch: 11   Global Step: 144850   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:16:18,585-Speed 3282.59 samples/sec   Loss 3.4090   LearningRate 0.0174   Epoch: 11   Global Step: 144860   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:16:21,642-Speed 3351.07 samples/sec   Loss 3.5299   LearningRate 0.0174   Epoch: 11   Global Step: 144870   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:16:24,776-Speed 3268.80 samples/sec   Loss 3.5433   LearningRate 0.0174   Epoch: 11   Global Step: 144880   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:16:27,933-Speed 3243.96 samples/sec   Loss 3.5624   LearningRate 0.0174   Epoch: 11   Global Step: 144890   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:16:31,053-Speed 3283.71 samples/sec   Loss 3.4852   LearningRate 0.0174   Epoch: 11   Global Step: 144900   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:16:34,133-Speed 3325.43 samples/sec   Loss 3.4560   LearningRate 0.0174   Epoch: 11   Global Step: 144910   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:16:37,236-Speed 3301.02 samples/sec   Loss 3.4783   LearningRate 0.0174   Epoch: 11   Global Step: 144920   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:16:40,353-Speed 3286.84 samples/sec   Loss 3.4694   LearningRate 0.0174   Epoch: 11   Global Step: 144930   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:16:43,477-Speed 3278.03 samples/sec   Loss 3.4754   LearningRate 0.0174   Epoch: 11   Global Step: 144940   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:16:46,562-Speed 3321.31 samples/sec   Loss 3.4901   LearningRate 0.0173   Epoch: 11   Global Step: 144950   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:16:49,673-Speed 3292.56 samples/sec   Loss 3.4867   LearningRate 0.0173   Epoch: 11   Global Step: 144960   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:16:52,790-Speed 3285.06 samples/sec   Loss 3.5273   LearningRate 0.0173   Epoch: 11   Global Step: 144970   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:16:55,901-Speed 3293.40 samples/sec   Loss 3.4415   LearningRate 0.0173   Epoch: 11   Global Step: 144980   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:16:58,986-Speed 3320.33 samples/sec   Loss 3.4795   LearningRate 0.0173   Epoch: 11   Global Step: 144990   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:17:02,172-Speed 3214.91 samples/sec   Loss 3.5497   LearningRate 0.0173   Epoch: 11   Global Step: 145000   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:17:05,294-Speed 3281.75 samples/sec   Loss 3.4938   LearningRate 0.0173   Epoch: 11   Global Step: 145010   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:17:08,380-Speed 3319.47 samples/sec   Loss 3.5146   LearningRate 0.0173   Epoch: 11   Global Step: 145020   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:17:11,473-Speed 3311.77 samples/sec   Loss 3.5200   LearningRate 0.0173   Epoch: 11   Global Step: 145030   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:17:14,594-Speed 3281.85 samples/sec   Loss 3.5009   LearningRate 0.0173   Epoch: 11   Global Step: 145040   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:17:17,785-Speed 3210.40 samples/sec   Loss 3.5196   LearningRate 0.0173   Epoch: 11   Global Step: 145050   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:17:20,877-Speed 3312.47 samples/sec   Loss 3.4813   LearningRate 0.0173   Epoch: 11   Global Step: 145060   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:17:23,994-Speed 3286.50 samples/sec   Loss 3.4533   LearningRate 0.0173   Epoch: 11   Global Step: 145070   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:17:27,111-Speed 3286.44 samples/sec   Loss 3.6106   LearningRate 0.0173   Epoch: 11   Global Step: 145080   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:17:30,249-Speed 3263.76 samples/sec   Loss 3.4790   LearningRate 0.0173   Epoch: 11   Global Step: 145090   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:17:33,350-Speed 3302.92 samples/sec   Loss 3.5029   LearningRate 0.0173   Epoch: 11   Global Step: 145100   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:17:36,481-Speed 3271.74 samples/sec   Loss 3.5358   LearningRate 0.0173   Epoch: 11   Global Step: 145110   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:17:39,589-Speed 3295.78 samples/sec   Loss 3.5227   LearningRate 0.0173   Epoch: 11   Global Step: 145120   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:17:42,683-Speed 3311.41 samples/sec   Loss 3.5324   LearningRate 0.0173   Epoch: 11   Global Step: 145130   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:17:45,767-Speed 3321.11 samples/sec   Loss 3.5919   LearningRate 0.0173   Epoch: 11   Global Step: 145140   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:17:48,833-Speed 3340.60 samples/sec   Loss 3.5468   LearningRate 0.0173   Epoch: 11   Global Step: 145150   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:17:51,919-Speed 3318.89 samples/sec   Loss 3.4523   LearningRate 0.0173   Epoch: 11   Global Step: 145160   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:17:55,059-Speed 3262.63 samples/sec   Loss 3.5687   LearningRate 0.0173   Epoch: 11   Global Step: 145170   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:17:58,126-Speed 3340.54 samples/sec   Loss 3.4714   LearningRate 0.0173   Epoch: 11   Global Step: 145180   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:18:01,223-Speed 3306.57 samples/sec   Loss 3.5805   LearningRate 0.0173   Epoch: 11   Global Step: 145190   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:18:04,343-Speed 3284.20 samples/sec   Loss 3.4282   LearningRate 0.0173   Epoch: 11   Global Step: 145200   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:18:07,439-Speed 3308.69 samples/sec   Loss 3.4683   LearningRate 0.0173   Epoch: 11   Global Step: 145210   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:18:10,550-Speed 3292.28 samples/sec   Loss 3.4669   LearningRate 0.0173   Epoch: 11   Global Step: 145220   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:18:13,650-Speed 3303.73 samples/sec   Loss 3.5368   LearningRate 0.0173   Epoch: 11   Global Step: 145230   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:18:16,764-Speed 3289.78 samples/sec   Loss 3.5076   LearningRate 0.0173   Epoch: 11   Global Step: 145240   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:18:19,883-Speed 3284.33 samples/sec   Loss 3.5185   LearningRate 0.0172   Epoch: 11   Global Step: 145250   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:18:23,014-Speed 3270.68 samples/sec   Loss 3.4180   LearningRate 0.0172   Epoch: 11   Global Step: 145260   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:18:26,118-Speed 3300.03 samples/sec   Loss 3.5346   LearningRate 0.0172   Epoch: 11   Global Step: 145270   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:18:29,267-Speed 3253.12 samples/sec   Loss 3.4705   LearningRate 0.0172   Epoch: 11   Global Step: 145280   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:18:32,339-Speed 3335.00 samples/sec   Loss 3.5537   LearningRate 0.0172   Epoch: 11   Global Step: 145290   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:18:35,418-Speed 3326.82 samples/sec   Loss 3.5318   LearningRate 0.0172   Epoch: 11   Global Step: 145300   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:18:38,526-Speed 3295.75 samples/sec   Loss 3.4495   LearningRate 0.0172   Epoch: 11   Global Step: 145310   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:18:41,621-Speed 3309.69 samples/sec   Loss 3.4073   LearningRate 0.0172   Epoch: 11   Global Step: 145320   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:18:44,714-Speed 3311.50 samples/sec   Loss 3.4843   LearningRate 0.0172   Epoch: 11   Global Step: 145330   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:18:47,799-Speed 3319.78 samples/sec   Loss 3.4639   LearningRate 0.0172   Epoch: 11   Global Step: 145340   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:18:50,895-Speed 3309.18 samples/sec   Loss 3.4624   LearningRate 0.0172   Epoch: 11   Global Step: 145350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:18:53,993-Speed 3306.52 samples/sec   Loss 3.5628   LearningRate 0.0172   Epoch: 11   Global Step: 145360   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:18:57,062-Speed 3337.83 samples/sec   Loss 3.5364   LearningRate 0.0172   Epoch: 11   Global Step: 145370   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:19:00,121-Speed 3348.39 samples/sec   Loss 3.4832   LearningRate 0.0172   Epoch: 11   Global Step: 145380   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:19:03,199-Speed 3326.83 samples/sec   Loss 3.4062   LearningRate 0.0172   Epoch: 11   Global Step: 145390   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:19:06,301-Speed 3302.28 samples/sec   Loss 3.5309   LearningRate 0.0172   Epoch: 11   Global Step: 145400   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:19:09,393-Speed 3313.34 samples/sec   Loss 3.4808   LearningRate 0.0172   Epoch: 11   Global Step: 145410   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:19:12,536-Speed 3259.37 samples/sec   Loss 3.4748   LearningRate 0.0172   Epoch: 11   Global Step: 145420   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:19:15,636-Speed 3303.84 samples/sec   Loss 3.5670   LearningRate 0.0172   Epoch: 11   Global Step: 145430   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:19:18,730-Speed 3311.21 samples/sec   Loss 3.4565   LearningRate 0.0172   Epoch: 11   Global Step: 145440   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:19:21,786-Speed 3351.85 samples/sec   Loss 3.4981   LearningRate 0.0172   Epoch: 11   Global Step: 145450   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:19:24,956-Speed 3231.48 samples/sec   Loss 3.5093   LearningRate 0.0172   Epoch: 11   Global Step: 145460   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:19:28,058-Speed 3302.52 samples/sec   Loss 3.4671   LearningRate 0.0172   Epoch: 11   Global Step: 145470   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:19:31,131-Speed 3332.90 samples/sec   Loss 3.5766   LearningRate 0.0172   Epoch: 11   Global Step: 145480   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:19:34,189-Speed 3350.07 samples/sec   Loss 3.5325   LearningRate 0.0172   Epoch: 11   Global Step: 145490   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:19:37,281-Speed 3312.80 samples/sec   Loss 3.5883   LearningRate 0.0172   Epoch: 11   Global Step: 145500   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:19:40,489-Speed 3193.33 samples/sec   Loss 3.5632   LearningRate 0.0172   Epoch: 11   Global Step: 145510   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:19:43,618-Speed 3273.65 samples/sec   Loss 3.5034   LearningRate 0.0172   Epoch: 11   Global Step: 145520   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:19:46,665-Speed 3361.28 samples/sec   Loss 3.5474   LearningRate 0.0172   Epoch: 11   Global Step: 145530   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:19:49,770-Speed 3298.99 samples/sec   Loss 3.5011   LearningRate 0.0172   Epoch: 11   Global Step: 145540   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:19:52,911-Speed 3261.56 samples/sec   Loss 3.4741   LearningRate 0.0171   Epoch: 11   Global Step: 145550   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:19:55,967-Speed 3351.95 samples/sec   Loss 3.5442   LearningRate 0.0171   Epoch: 11   Global Step: 145560   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:19:59,062-Speed 3309.02 samples/sec   Loss 3.5392   LearningRate 0.0171   Epoch: 11   Global Step: 145570   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:20:02,159-Speed 3307.61 samples/sec   Loss 3.5495   LearningRate 0.0171   Epoch: 11   Global Step: 145580   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:20:05,223-Speed 3344.35 samples/sec   Loss 3.5283   LearningRate 0.0171   Epoch: 11   Global Step: 145590   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:20:08,323-Speed 3303.60 samples/sec   Loss 3.5344   LearningRate 0.0171   Epoch: 11   Global Step: 145600   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:20:11,441-Speed 3285.58 samples/sec   Loss 3.5911   LearningRate 0.0171   Epoch: 11   Global Step: 145610   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:20:14,540-Speed 3304.95 samples/sec   Loss 3.5004   LearningRate 0.0171   Epoch: 11   Global Step: 145620   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:20:17,612-Speed 3334.78 samples/sec   Loss 3.5359   LearningRate 0.0171   Epoch: 11   Global Step: 145630   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:20:21,408-Speed 2698.03 samples/sec   Loss 3.4735   LearningRate 0.0171   Epoch: 11   Global Step: 145640   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:20:24,594-Speed 3214.69 samples/sec   Loss 3.4934   LearningRate 0.0171   Epoch: 11   Global Step: 145650   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:20:27,713-Speed 3284.27 samples/sec   Loss 3.4796   LearningRate 0.0171   Epoch: 11   Global Step: 145660   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:20:30,819-Speed 3297.86 samples/sec   Loss 3.3514   LearningRate 0.0171   Epoch: 11   Global Step: 145670   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:20:33,921-Speed 3302.64 samples/sec   Loss 3.5703   LearningRate 0.0171   Epoch: 11   Global Step: 145680   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:20:37,033-Speed 3291.11 samples/sec   Loss 3.4111   LearningRate 0.0171   Epoch: 11   Global Step: 145690   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:20:40,212-Speed 3222.53 samples/sec   Loss 3.5077   LearningRate 0.0171   Epoch: 11   Global Step: 145700   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:20:43,327-Speed 3288.27 samples/sec   Loss 3.4634   LearningRate 0.0171   Epoch: 11   Global Step: 145710   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:20:46,432-Speed 3299.35 samples/sec   Loss 3.5351   LearningRate 0.0171   Epoch: 11   Global Step: 145720   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:20:49,487-Speed 3352.51 samples/sec   Loss 3.5027   LearningRate 0.0171   Epoch: 11   Global Step: 145730   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:20:52,614-Speed 3276.21 samples/sec   Loss 3.4955   LearningRate 0.0171   Epoch: 11   Global Step: 145740   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:20:55,754-Speed 3262.49 samples/sec   Loss 3.5495   LearningRate 0.0171   Epoch: 11   Global Step: 145750   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:20:58,852-Speed 3305.98 samples/sec   Loss 3.5434   LearningRate 0.0171   Epoch: 11   Global Step: 145760   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:21:01,979-Speed 3275.52 samples/sec   Loss 3.5556   LearningRate 0.0171   Epoch: 11   Global Step: 145770   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:21:05,116-Speed 3265.18 samples/sec   Loss 3.4668   LearningRate 0.0171   Epoch: 11   Global Step: 145780   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:21:08,219-Speed 3301.69 samples/sec   Loss 3.4544   LearningRate 0.0171   Epoch: 11   Global Step: 145790   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:21:11,289-Speed 3336.91 samples/sec   Loss 3.5836   LearningRate 0.0171   Epoch: 11   Global Step: 145800   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:21:14,381-Speed 3311.67 samples/sec   Loss 3.5510   LearningRate 0.0171   Epoch: 11   Global Step: 145810   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:21:17,474-Speed 3312.09 samples/sec   Loss 3.5310   LearningRate 0.0171   Epoch: 11   Global Step: 145820   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:21:20,580-Speed 3298.05 samples/sec   Loss 3.5786   LearningRate 0.0171   Epoch: 11   Global Step: 145830   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:21:23,647-Speed 3340.38 samples/sec   Loss 3.4772   LearningRate 0.0171   Epoch: 11   Global Step: 145840   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:21:26,857-Speed 3190.76 samples/sec   Loss 3.5346   LearningRate 0.0170   Epoch: 11   Global Step: 145850   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:21:29,939-Speed 3323.96 samples/sec   Loss 3.5030   LearningRate 0.0170   Epoch: 11   Global Step: 145860   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:21:33,135-Speed 3204.67 samples/sec   Loss 3.4622   LearningRate 0.0170   Epoch: 11   Global Step: 145870   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:21:36,266-Speed 3272.07 samples/sec   Loss 3.4781   LearningRate 0.0170   Epoch: 11   Global Step: 145880   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:21:39,355-Speed 3315.65 samples/sec   Loss 3.4824   LearningRate 0.0170   Epoch: 11   Global Step: 145890   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:21:42,501-Speed 3255.72 samples/sec   Loss 3.5003   LearningRate 0.0170   Epoch: 11   Global Step: 145900   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:21:45,589-Speed 3317.66 samples/sec   Loss 3.5027   LearningRate 0.0170   Epoch: 11   Global Step: 145910   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:21:48,722-Speed 3268.71 samples/sec   Loss 3.4968   LearningRate 0.0170   Epoch: 11   Global Step: 145920   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:21:51,818-Speed 3308.73 samples/sec   Loss 3.4762   LearningRate 0.0170   Epoch: 11   Global Step: 145930   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:21:54,987-Speed 3232.69 samples/sec   Loss 3.5610   LearningRate 0.0170   Epoch: 11   Global Step: 145940   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:21:58,052-Speed 3342.68 samples/sec   Loss 3.5755   LearningRate 0.0170   Epoch: 11   Global Step: 145950   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:22:01,177-Speed 3277.51 samples/sec   Loss 3.4914   LearningRate 0.0170   Epoch: 11   Global Step: 145960   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:22:04,339-Speed 3239.66 samples/sec   Loss 3.5309   LearningRate 0.0170   Epoch: 11   Global Step: 145970   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:22:07,470-Speed 3271.56 samples/sec   Loss 3.4701   LearningRate 0.0170   Epoch: 11   Global Step: 145980   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:22:10,579-Speed 3293.87 samples/sec   Loss 3.4996   LearningRate 0.0170   Epoch: 11   Global Step: 145990   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:22:13,716-Speed 3265.18 samples/sec   Loss 3.5189   LearningRate 0.0170   Epoch: 11   Global Step: 146000   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:22:16,834-Speed 3285.60 samples/sec   Loss 3.4853   LearningRate 0.0170   Epoch: 11   Global Step: 146010   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:22:19,954-Speed 3283.25 samples/sec   Loss 3.4508   LearningRate 0.0170   Epoch: 11   Global Step: 146020   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:22:23,065-Speed 3292.33 samples/sec   Loss 3.5791   LearningRate 0.0170   Epoch: 11   Global Step: 146030   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:22:26,162-Speed 3307.76 samples/sec   Loss 3.4951   LearningRate 0.0170   Epoch: 11   Global Step: 146040   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:22:29,301-Speed 3263.69 samples/sec   Loss 3.5335   LearningRate 0.0170   Epoch: 11   Global Step: 146050   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:22:32,480-Speed 3221.40 samples/sec   Loss 3.5250   LearningRate 0.0170   Epoch: 11   Global Step: 146060   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:22:35,632-Speed 3249.92 samples/sec   Loss 3.5803   LearningRate 0.0170   Epoch: 11   Global Step: 146070   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:22:38,726-Speed 3311.17 samples/sec   Loss 3.4718   LearningRate 0.0170   Epoch: 11   Global Step: 146080   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:22:41,860-Speed 3268.42 samples/sec   Loss 3.4943   LearningRate 0.0170   Epoch: 11   Global Step: 146090   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:22:45,011-Speed 3250.15 samples/sec   Loss 3.6001   LearningRate 0.0170   Epoch: 11   Global Step: 146100   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:22:48,128-Speed 3286.42 samples/sec   Loss 3.5148   LearningRate 0.0170   Epoch: 11   Global Step: 146110   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:22:51,278-Speed 3252.30 samples/sec   Loss 3.4975   LearningRate 0.0170   Epoch: 11   Global Step: 146120   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:22:54,351-Speed 3333.50 samples/sec   Loss 3.5799   LearningRate 0.0170   Epoch: 11   Global Step: 146130   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:22:57,432-Speed 3324.57 samples/sec   Loss 3.5193   LearningRate 0.0170   Epoch: 11   Global Step: 146140   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:23:00,555-Speed 3280.08 samples/sec   Loss 3.5126   LearningRate 0.0169   Epoch: 11   Global Step: 146150   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:23:03,735-Speed 3221.10 samples/sec   Loss 3.5709   LearningRate 0.0169   Epoch: 11   Global Step: 146160   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:23:06,863-Speed 3274.44 samples/sec   Loss 3.4741   LearningRate 0.0169   Epoch: 11   Global Step: 146170   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:23:09,969-Speed 3298.40 samples/sec   Loss 3.4714   LearningRate 0.0169   Epoch: 11   Global Step: 146180   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:23:13,242-Speed 3128.59 samples/sec   Loss 3.5414   LearningRate 0.0169   Epoch: 11   Global Step: 146190   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:23:16,417-Speed 3226.58 samples/sec   Loss 3.5642   LearningRate 0.0169   Epoch: 11   Global Step: 146200   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:23:19,516-Speed 3305.75 samples/sec   Loss 3.4899   LearningRate 0.0169   Epoch: 11   Global Step: 146210   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:23:22,603-Speed 3317.71 samples/sec   Loss 3.5778   LearningRate 0.0169   Epoch: 11   Global Step: 146220   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:23:25,708-Speed 3299.72 samples/sec   Loss 3.5236   LearningRate 0.0169   Epoch: 11   Global Step: 146230   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:23:28,853-Speed 3256.33 samples/sec   Loss 3.4730   LearningRate 0.0169   Epoch: 11   Global Step: 146240   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:23:31,967-Speed 3288.75 samples/sec   Loss 3.4586   LearningRate 0.0169   Epoch: 11   Global Step: 146250   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:23:35,051-Speed 3321.89 samples/sec   Loss 3.4458   LearningRate 0.0169   Epoch: 11   Global Step: 146260   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:23:38,210-Speed 3243.16 samples/sec   Loss 3.5597   LearningRate 0.0169   Epoch: 11   Global Step: 146270   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:23:41,304-Speed 3310.63 samples/sec   Loss 3.5075   LearningRate 0.0169   Epoch: 11   Global Step: 146280   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:23:44,364-Speed 3346.49 samples/sec   Loss 3.4982   LearningRate 0.0169   Epoch: 11   Global Step: 146290   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:23:47,446-Speed 3324.21 samples/sec   Loss 3.4708   LearningRate 0.0169   Epoch: 11   Global Step: 146300   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:23:50,543-Speed 3306.97 samples/sec   Loss 3.5425   LearningRate 0.0169   Epoch: 11   Global Step: 146310   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:23:53,668-Speed 3278.39 samples/sec   Loss 3.5383   LearningRate 0.0169   Epoch: 11   Global Step: 146320   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:23:56,744-Speed 3330.26 samples/sec   Loss 3.5056   LearningRate 0.0169   Epoch: 11   Global Step: 146330   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:23:59,844-Speed 3304.62 samples/sec   Loss 3.6085   LearningRate 0.0169   Epoch: 11   Global Step: 146340   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:24:03,032-Speed 3212.44 samples/sec   Loss 3.4795   LearningRate 0.0169   Epoch: 11   Global Step: 146350   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:24:06,185-Speed 3248.85 samples/sec   Loss 3.6198   LearningRate 0.0169   Epoch: 11   Global Step: 146360   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:24:09,249-Speed 3342.73 samples/sec   Loss 3.5278   LearningRate 0.0169   Epoch: 11   Global Step: 146370   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:24:12,379-Speed 3273.01 samples/sec   Loss 3.5749   LearningRate 0.0169   Epoch: 11   Global Step: 146380   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:24:15,476-Speed 3307.79 samples/sec   Loss 3.4046   LearningRate 0.0169   Epoch: 11   Global Step: 146390   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:24:18,632-Speed 3245.43 samples/sec   Loss 3.5019   LearningRate 0.0169   Epoch: 11   Global Step: 146400   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:24:21,686-Speed 3354.52 samples/sec   Loss 3.5414   LearningRate 0.0169   Epoch: 11   Global Step: 146410   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:24:24,791-Speed 3298.78 samples/sec   Loss 3.4552   LearningRate 0.0169   Epoch: 11   Global Step: 146420   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:24:27,909-Speed 3285.39 samples/sec   Loss 3.4999   LearningRate 0.0169   Epoch: 11   Global Step: 146430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:24:31,009-Speed 3304.04 samples/sec   Loss 3.5142   LearningRate 0.0169   Epoch: 11   Global Step: 146440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:24:34,050-Speed 3368.93 samples/sec   Loss 3.5767   LearningRate 0.0168   Epoch: 11   Global Step: 146450   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:24:37,108-Speed 3349.92 samples/sec   Loss 3.4657   LearningRate 0.0168   Epoch: 11   Global Step: 146460   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:24:40,189-Speed 3324.02 samples/sec   Loss 3.5562   LearningRate 0.0168   Epoch: 11   Global Step: 146470   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:24:43,295-Speed 3297.62 samples/sec   Loss 3.5508   LearningRate 0.0168   Epoch: 11   Global Step: 146480   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:24:46,347-Speed 3357.06 samples/sec   Loss 3.5571   LearningRate 0.0168   Epoch: 11   Global Step: 146490   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:24:49,440-Speed 3311.97 samples/sec   Loss 3.5685   LearningRate 0.0168   Epoch: 11   Global Step: 146500   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:24:52,551-Speed 3291.99 samples/sec   Loss 3.5223   LearningRate 0.0168   Epoch: 11   Global Step: 146510   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:24:55,609-Speed 3349.87 samples/sec   Loss 3.5825   LearningRate 0.0168   Epoch: 11   Global Step: 146520   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:24:58,691-Speed 3323.39 samples/sec   Loss 3.5421   LearningRate 0.0168   Epoch: 11   Global Step: 146530   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:25:01,805-Speed 3289.60 samples/sec   Loss 3.5253   LearningRate 0.0168   Epoch: 11   Global Step: 146540   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:25:04,900-Speed 3309.66 samples/sec   Loss 3.5742   LearningRate 0.0168   Epoch: 11   Global Step: 146550   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:25:07,991-Speed 3314.61 samples/sec   Loss 3.5621   LearningRate 0.0168   Epoch: 11   Global Step: 146560   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:25:11,023-Speed 3377.32 samples/sec   Loss 3.5212   LearningRate 0.0168   Epoch: 11   Global Step: 146570   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:25:14,085-Speed 3345.75 samples/sec   Loss 3.4995   LearningRate 0.0168   Epoch: 11   Global Step: 146580   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:25:17,187-Speed 3302.65 samples/sec   Loss 3.4200   LearningRate 0.0168   Epoch: 11   Global Step: 146590   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:25:20,256-Speed 3337.70 samples/sec   Loss 3.6831   LearningRate 0.0168   Epoch: 11   Global Step: 146600   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:25:23,400-Speed 3257.55 samples/sec   Loss 3.5916   LearningRate 0.0168   Epoch: 11   Global Step: 146610   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:25:26,606-Speed 3195.81 samples/sec   Loss 3.5634   LearningRate 0.0168   Epoch: 11   Global Step: 146620   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:25:29,742-Speed 3265.55 samples/sec   Loss 3.5302   LearningRate 0.0168   Epoch: 11   Global Step: 146630   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:25:32,813-Speed 3335.53 samples/sec   Loss 3.5093   LearningRate 0.0168   Epoch: 11   Global Step: 146640   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:25:35,925-Speed 3292.37 samples/sec   Loss 3.5048   LearningRate 0.0168   Epoch: 11   Global Step: 146650   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:25:39,020-Speed 3309.61 samples/sec   Loss 3.4411   LearningRate 0.0168   Epoch: 11   Global Step: 146660   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:25:42,085-Speed 3341.66 samples/sec   Loss 3.4894   LearningRate 0.0168   Epoch: 11   Global Step: 146670   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:25:45,142-Speed 3350.61 samples/sec   Loss 3.5491   LearningRate 0.0168   Epoch: 11   Global Step: 146680   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:25:48,248-Speed 3298.35 samples/sec   Loss 3.5505   LearningRate 0.0168   Epoch: 11   Global Step: 146690   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:25:51,336-Speed 3317.29 samples/sec   Loss 3.5181   LearningRate 0.0168   Epoch: 11   Global Step: 146700   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:25:54,427-Speed 3313.93 samples/sec   Loss 3.6137   LearningRate 0.0168   Epoch: 11   Global Step: 146710   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:25:57,539-Speed 3291.10 samples/sec   Loss 3.5583   LearningRate 0.0168   Epoch: 11   Global Step: 146720   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:26:00,636-Speed 3308.13 samples/sec   Loss 3.4937   LearningRate 0.0168   Epoch: 11   Global Step: 146730   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:26:03,812-Speed 3225.20 samples/sec   Loss 3.5115   LearningRate 0.0168   Epoch: 11   Global Step: 146740   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:26:06,877-Speed 3341.96 samples/sec   Loss 3.5775   LearningRate 0.0167   Epoch: 11   Global Step: 146750   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:26:09,915-Speed 3371.97 samples/sec   Loss 3.5558   LearningRate 0.0167   Epoch: 11   Global Step: 146760   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:26:13,055-Speed 3262.10 samples/sec   Loss 3.4522   LearningRate 0.0167   Epoch: 11   Global Step: 146770   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:26:16,151-Speed 3308.15 samples/sec   Loss 3.5521   LearningRate 0.0167   Epoch: 11   Global Step: 146780   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:26:19,273-Speed 3280.61 samples/sec   Loss 3.4916   LearningRate 0.0167   Epoch: 11   Global Step: 146790   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:26:22,361-Speed 3317.93 samples/sec   Loss 3.4916   LearningRate 0.0167   Epoch: 11   Global Step: 146800   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:26:25,478-Speed 3286.46 samples/sec   Loss 3.5341   LearningRate 0.0167   Epoch: 11   Global Step: 146810   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:26:28,666-Speed 3212.21 samples/sec   Loss 3.4978   LearningRate 0.0167   Epoch: 11   Global Step: 146820   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:26:31,744-Speed 3328.43 samples/sec   Loss 3.4698   LearningRate 0.0167   Epoch: 11   Global Step: 146830   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:26:34,817-Speed 3333.24 samples/sec   Loss 3.5998   LearningRate 0.0167   Epoch: 11   Global Step: 146840   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:26:37,912-Speed 3309.50 samples/sec   Loss 3.6002   LearningRate 0.0167   Epoch: 11   Global Step: 146850   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:26:41,042-Speed 3272.14 samples/sec   Loss 3.4496   LearningRate 0.0167   Epoch: 11   Global Step: 146860   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:26:44,130-Speed 3316.83 samples/sec   Loss 3.5491   LearningRate 0.0167   Epoch: 11   Global Step: 146870   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:26:47,206-Speed 3330.68 samples/sec   Loss 3.5009   LearningRate 0.0167   Epoch: 11   Global Step: 146880   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:26:50,275-Speed 3337.28 samples/sec   Loss 3.5012   LearningRate 0.0167   Epoch: 11   Global Step: 146890   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:26:53,406-Speed 3272.30 samples/sec   Loss 3.5361   LearningRate 0.0167   Epoch: 11   Global Step: 146900   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:26:56,533-Speed 3275.23 samples/sec   Loss 3.4733   LearningRate 0.0167   Epoch: 11   Global Step: 146910   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:26:59,694-Speed 3240.82 samples/sec   Loss 3.4860   LearningRate 0.0167   Epoch: 11   Global Step: 146920   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:27:02,870-Speed 3225.48 samples/sec   Loss 3.5405   LearningRate 0.0167   Epoch: 11   Global Step: 146930   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:27:06,018-Speed 3253.83 samples/sec   Loss 3.5273   LearningRate 0.0167   Epoch: 11   Global Step: 146940   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:27:09,078-Speed 3347.42 samples/sec   Loss 3.5343   LearningRate 0.0167   Epoch: 11   Global Step: 146950   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:27:12,239-Speed 3240.39 samples/sec   Loss 3.4569   LearningRate 0.0167   Epoch: 11   Global Step: 146960   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:27:15,317-Speed 3328.26 samples/sec   Loss 3.5374   LearningRate 0.0167   Epoch: 11   Global Step: 146970   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:27:18,464-Speed 3254.35 samples/sec   Loss 3.5049   LearningRate 0.0167   Epoch: 11   Global Step: 146980   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:27:21,534-Speed 3336.14 samples/sec   Loss 3.5315   LearningRate 0.0167   Epoch: 11   Global Step: 146990   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:27:24,625-Speed 3314.61 samples/sec   Loss 3.6343   LearningRate 0.0167   Epoch: 11   Global Step: 147000   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:27:27,757-Speed 3269.80 samples/sec   Loss 3.4755   LearningRate 0.0167   Epoch: 11   Global Step: 147010   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:27:30,866-Speed 3294.77 samples/sec   Loss 3.4919   LearningRate 0.0167   Epoch: 11   Global Step: 147020   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:27:33,952-Speed 3319.82 samples/sec   Loss 3.5162   LearningRate 0.0167   Epoch: 11   Global Step: 147030   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:27:37,057-Speed 3298.49 samples/sec   Loss 3.5342   LearningRate 0.0167   Epoch: 11   Global Step: 147040   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:27:40,180-Speed 3280.01 samples/sec   Loss 3.5251   LearningRate 0.0167   Epoch: 11   Global Step: 147050   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:27:43,326-Speed 3256.33 samples/sec   Loss 3.5881   LearningRate 0.0166   Epoch: 11   Global Step: 147060   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 14:27:46,390-Speed 3343.16 samples/sec   Loss 3.5126   LearningRate 0.0166   Epoch: 11   Global Step: 147070   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:27:49,546-Speed 3245.48 samples/sec   Loss 3.5535   LearningRate 0.0166   Epoch: 11   Global Step: 147080   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:27:52,683-Speed 3265.07 samples/sec   Loss 3.4166   LearningRate 0.0166   Epoch: 11   Global Step: 147090   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:27:55,788-Speed 3298.91 samples/sec   Loss 3.4308   LearningRate 0.0166   Epoch: 11   Global Step: 147100   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:27:58,832-Speed 3365.70 samples/sec   Loss 3.6180   LearningRate 0.0166   Epoch: 11   Global Step: 147110   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:28:01,986-Speed 3247.81 samples/sec   Loss 3.4876   LearningRate 0.0166   Epoch: 11   Global Step: 147120   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:28:05,077-Speed 3314.10 samples/sec   Loss 3.4752   LearningRate 0.0166   Epoch: 11   Global Step: 147130   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:28:08,143-Speed 3340.75 samples/sec   Loss 3.5571   LearningRate 0.0166   Epoch: 11   Global Step: 147140   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:28:11,326-Speed 3218.48 samples/sec   Loss 3.5377   LearningRate 0.0166   Epoch: 11   Global Step: 147150   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:28:14,424-Speed 3306.33 samples/sec   Loss 3.5572   LearningRate 0.0166   Epoch: 11   Global Step: 147160   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:28:17,494-Speed 3336.79 samples/sec   Loss 3.5546   LearningRate 0.0166   Epoch: 11   Global Step: 147170   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:28:20,548-Speed 3353.66 samples/sec   Loss 3.4559   LearningRate 0.0166   Epoch: 11   Global Step: 147180   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:28:23,668-Speed 3283.08 samples/sec   Loss 3.5302   LearningRate 0.0166   Epoch: 11   Global Step: 147190   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:28:26,747-Speed 3327.18 samples/sec   Loss 3.4757   LearningRate 0.0166   Epoch: 11   Global Step: 147200   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:28:29,844-Speed 3307.08 samples/sec   Loss 3.5930   LearningRate 0.0166   Epoch: 11   Global Step: 147210   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:28:32,923-Speed 3327.20 samples/sec   Loss 3.5378   LearningRate 0.0166   Epoch: 11   Global Step: 147220   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:28:35,981-Speed 3349.95 samples/sec   Loss 3.5971   LearningRate 0.0166   Epoch: 11   Global Step: 147230   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:28:39,090-Speed 3294.17 samples/sec   Loss 3.5215   LearningRate 0.0166   Epoch: 11   Global Step: 147240   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:28:42,238-Speed 3253.94 samples/sec   Loss 3.5457   LearningRate 0.0166   Epoch: 11   Global Step: 147250   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:28:45,325-Speed 3318.03 samples/sec   Loss 3.4896   LearningRate 0.0166   Epoch: 11   Global Step: 147260   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:28:48,476-Speed 3250.42 samples/sec   Loss 3.5232   LearningRate 0.0166   Epoch: 11   Global Step: 147270   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:28:51,633-Speed 3245.34 samples/sec   Loss 3.4300   LearningRate 0.0166   Epoch: 11   Global Step: 147280   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:28:54,780-Speed 3255.40 samples/sec   Loss 3.5819   LearningRate 0.0166   Epoch: 11   Global Step: 147290   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:28:57,858-Speed 3327.24 samples/sec   Loss 3.5605   LearningRate 0.0166   Epoch: 11   Global Step: 147300   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:29:00,948-Speed 3314.77 samples/sec   Loss 3.4326   LearningRate 0.0166   Epoch: 11   Global Step: 147310   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:29:04,076-Speed 3275.47 samples/sec   Loss 3.5513   LearningRate 0.0166   Epoch: 11   Global Step: 147320   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:29:07,175-Speed 3304.41 samples/sec   Loss 3.5196   LearningRate 0.0166   Epoch: 11   Global Step: 147330   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:29:10,290-Speed 3288.42 samples/sec   Loss 3.5254   LearningRate 0.0166   Epoch: 11   Global Step: 147340   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:29:13,425-Speed 3268.48 samples/sec   Loss 3.5782   LearningRate 0.0166   Epoch: 11   Global Step: 147350   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:29:16,493-Speed 3338.36 samples/sec   Loss 3.5945   LearningRate 0.0165   Epoch: 11   Global Step: 147360   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:29:19,580-Speed 3317.65 samples/sec   Loss 3.5886   LearningRate 0.0165   Epoch: 11   Global Step: 147370   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:29:22,668-Speed 3317.45 samples/sec   Loss 3.4799   LearningRate 0.0165   Epoch: 11   Global Step: 147380   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:29:25,766-Speed 3306.16 samples/sec   Loss 3.5587   LearningRate 0.0165   Epoch: 11   Global Step: 147390   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:29:28,828-Speed 3345.54 samples/sec   Loss 3.5028   LearningRate 0.0165   Epoch: 11   Global Step: 147400   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:29:31,974-Speed 3255.43 samples/sec   Loss 3.4910   LearningRate 0.0165   Epoch: 11   Global Step: 147410   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:29:35,054-Speed 3326.37 samples/sec   Loss 3.4823   LearningRate 0.0165   Epoch: 11   Global Step: 147420   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:29:38,234-Speed 3221.25 samples/sec   Loss 3.6347   LearningRate 0.0165   Epoch: 11   Global Step: 147430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:29:41,322-Speed 3316.54 samples/sec   Loss 3.4752   LearningRate 0.0165   Epoch: 11   Global Step: 147440   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:29:44,377-Speed 3353.87 samples/sec   Loss 3.5374   LearningRate 0.0165   Epoch: 11   Global Step: 147450   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:29:47,507-Speed 3272.03 samples/sec   Loss 3.4836   LearningRate 0.0165   Epoch: 11   Global Step: 147460   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:29:50,697-Speed 3210.96 samples/sec   Loss 3.4663   LearningRate 0.0165   Epoch: 11   Global Step: 147470   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:29:53,827-Speed 3273.37 samples/sec   Loss 3.5404   LearningRate 0.0165   Epoch: 11   Global Step: 147480   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:29:56,930-Speed 3300.85 samples/sec   Loss 3.4895   LearningRate 0.0165   Epoch: 11   Global Step: 147490   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:30:00,014-Speed 3320.99 samples/sec   Loss 3.5848   LearningRate 0.0165   Epoch: 11   Global Step: 147500   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:30:03,104-Speed 3315.65 samples/sec   Loss 3.5993   LearningRate 0.0165   Epoch: 11   Global Step: 147510   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:30:06,261-Speed 3243.91 samples/sec   Loss 3.5202   LearningRate 0.0165   Epoch: 11   Global Step: 147520   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:30:09,327-Speed 3341.55 samples/sec   Loss 3.5365   LearningRate 0.0165   Epoch: 11   Global Step: 147530   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:30:12,470-Speed 3259.30 samples/sec   Loss 3.5136   LearningRate 0.0165   Epoch: 11   Global Step: 147540   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:30:15,603-Speed 3269.55 samples/sec   Loss 3.5390   LearningRate 0.0165   Epoch: 11   Global Step: 147550   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:30:18,781-Speed 3223.29 samples/sec   Loss 3.5011   LearningRate 0.0165   Epoch: 11   Global Step: 147560   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:30:21,871-Speed 3314.97 samples/sec   Loss 3.4292   LearningRate 0.0165   Epoch: 11   Global Step: 147570   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:30:25,037-Speed 3235.63 samples/sec   Loss 3.5163   LearningRate 0.0165   Epoch: 11   Global Step: 147580   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:30:28,123-Speed 3318.64 samples/sec   Loss 3.5379   LearningRate 0.0165   Epoch: 11   Global Step: 147590   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:30:31,286-Speed 3238.95 samples/sec   Loss 3.5226   LearningRate 0.0165   Epoch: 11   Global Step: 147600   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:30:34,364-Speed 3327.81 samples/sec   Loss 3.4312   LearningRate 0.0165   Epoch: 11   Global Step: 147610   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:30:37,490-Speed 3276.79 samples/sec   Loss 3.5533   LearningRate 0.0165   Epoch: 11   Global Step: 147620   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:30:40,639-Speed 3253.09 samples/sec   Loss 3.5883   LearningRate 0.0165   Epoch: 11   Global Step: 147630   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:30:43,749-Speed 3293.48 samples/sec   Loss 3.5904   LearningRate 0.0165   Epoch: 11   Global Step: 147640   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:30:46,882-Speed 3269.59 samples/sec   Loss 3.4545   LearningRate 0.0165   Epoch: 11   Global Step: 147650   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:30:50,065-Speed 3218.32 samples/sec   Loss 3.5568   LearningRate 0.0165   Epoch: 11   Global Step: 147660   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:30:53,293-Speed 3172.72 samples/sec   Loss 3.5333   LearningRate 0.0164   Epoch: 11   Global Step: 147670   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:30:56,404-Speed 3293.22 samples/sec   Loss 3.5752   LearningRate 0.0164   Epoch: 11   Global Step: 147680   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:30:59,559-Speed 3246.24 samples/sec   Loss 3.5456   LearningRate 0.0164   Epoch: 11   Global Step: 147690   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:31:02,656-Speed 3307.73 samples/sec   Loss 3.5225   LearningRate 0.0164   Epoch: 11   Global Step: 147700   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:31:05,774-Speed 3285.28 samples/sec   Loss 3.5807   LearningRate 0.0164   Epoch: 11   Global Step: 147710   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:31:08,853-Speed 3326.72 samples/sec   Loss 3.5157   LearningRate 0.0164   Epoch: 11   Global Step: 147720   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:31:11,970-Speed 3286.19 samples/sec   Loss 3.5041   LearningRate 0.0164   Epoch: 11   Global Step: 147730   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:31:15,126-Speed 3246.08 samples/sec   Loss 3.4634   LearningRate 0.0164   Epoch: 11   Global Step: 147740   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:31:18,269-Speed 3258.67 samples/sec   Loss 3.4807   LearningRate 0.0164   Epoch: 11   Global Step: 147750   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:31:21,389-Speed 3283.57 samples/sec   Loss 3.5216   LearningRate 0.0164   Epoch: 11   Global Step: 147760   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:31:24,580-Speed 3210.03 samples/sec   Loss 3.5507   LearningRate 0.0164   Epoch: 11   Global Step: 147770   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:31:27,799-Speed 3181.87 samples/sec   Loss 3.4320   LearningRate 0.0164   Epoch: 11   Global Step: 147780   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:31:30,922-Speed 3279.62 samples/sec   Loss 3.5470   LearningRate 0.0164   Epoch: 11   Global Step: 147790   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:31:33,996-Speed 3332.49 samples/sec   Loss 3.4950   LearningRate 0.0164   Epoch: 11   Global Step: 147800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:31:37,117-Speed 3282.37 samples/sec   Loss 3.4999   LearningRate 0.0164   Epoch: 11   Global Step: 147810   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:31:40,247-Speed 3271.89 samples/sec   Loss 3.4975   LearningRate 0.0164   Epoch: 11   Global Step: 147820   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:31:43,351-Speed 3300.11 samples/sec   Loss 3.5384   LearningRate 0.0164   Epoch: 11   Global Step: 147830   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:31:46,427-Speed 3330.60 samples/sec   Loss 3.5398   LearningRate 0.0164   Epoch: 11   Global Step: 147840   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:31:49,539-Speed 3291.57 samples/sec   Loss 3.4546   LearningRate 0.0164   Epoch: 11   Global Step: 147850   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 14:31:52,609-Speed 3336.86 samples/sec   Loss 3.4865   LearningRate 0.0164   Epoch: 11   Global Step: 147860   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:31:55,698-Speed 3316.51 samples/sec   Loss 3.5066   LearningRate 0.0164   Epoch: 11   Global Step: 147870   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:31:58,814-Speed 3287.09 samples/sec   Loss 3.5009   LearningRate 0.0164   Epoch: 11   Global Step: 147880   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:32:01,943-Speed 3273.09 samples/sec   Loss 3.5200   LearningRate 0.0164   Epoch: 11   Global Step: 147890   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:32:05,130-Speed 3214.01 samples/sec   Loss 3.5138   LearningRate 0.0164   Epoch: 11   Global Step: 147900   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:32:08,208-Speed 3328.15 samples/sec   Loss 3.5505   LearningRate 0.0164   Epoch: 11   Global Step: 147910   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:32:11,338-Speed 3272.77 samples/sec   Loss 3.5289   LearningRate 0.0164   Epoch: 11   Global Step: 147920   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:32:14,445-Speed 3296.45 samples/sec   Loss 3.5601   LearningRate 0.0164   Epoch: 11   Global Step: 147930   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:32:17,530-Speed 3320.28 samples/sec   Loss 3.4179   LearningRate 0.0164   Epoch: 11   Global Step: 147940   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:32:20,600-Speed 3336.56 samples/sec   Loss 3.4891   LearningRate 0.0164   Epoch: 11   Global Step: 147950   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:32:23,760-Speed 3241.55 samples/sec   Loss 3.4607   LearningRate 0.0164   Epoch: 11   Global Step: 147960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 14:32:26,862-Speed 3302.42 samples/sec   Loss 3.5336   LearningRate 0.0164   Epoch: 11   Global Step: 147970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 14:32:29,970-Speed 3295.07 samples/sec   Loss 3.4831   LearningRate 0.0163   Epoch: 11   Global Step: 147980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 14:32:33,096-Speed 3277.16 samples/sec   Loss 3.4907   LearningRate 0.0163   Epoch: 11   Global Step: 147990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 14:32:36,237-Speed 3261.86 samples/sec   Loss 3.5622   LearningRate 0.0163   Epoch: 11   Global Step: 148000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 14:32:39,441-Speed 3196.40 samples/sec   Loss 3.5405   LearningRate 0.0163   Epoch: 11   Global Step: 148010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 14:32:42,586-Speed 3257.13 samples/sec   Loss 3.4861   LearningRate 0.0163   Epoch: 11   Global Step: 148020   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:32:45,651-Speed 3342.35 samples/sec   Loss 3.5962   LearningRate 0.0163   Epoch: 11   Global Step: 148030   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:32:48,819-Speed 3232.98 samples/sec   Loss 3.5107   LearningRate 0.0163   Epoch: 11   Global Step: 148040   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:32:51,954-Speed 3267.60 samples/sec   Loss 3.5634   LearningRate 0.0163   Epoch: 11   Global Step: 148050   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:32:55,071-Speed 3286.52 samples/sec   Loss 3.5642   LearningRate 0.0163   Epoch: 11   Global Step: 148060   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:32:58,134-Speed 3343.61 samples/sec   Loss 3.5394   LearningRate 0.0163   Epoch: 11   Global Step: 148070   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:33:01,192-Speed 3350.39 samples/sec   Loss 3.4466   LearningRate 0.0163   Epoch: 11   Global Step: 148080   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:33:04,261-Speed 3337.37 samples/sec   Loss 3.4605   LearningRate 0.0163   Epoch: 11   Global Step: 148090   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:33:07,330-Speed 3338.54 samples/sec   Loss 3.5109   LearningRate 0.0163   Epoch: 11   Global Step: 148100   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:33:10,383-Speed 3354.23 samples/sec   Loss 3.5139   LearningRate 0.0163   Epoch: 11   Global Step: 148110   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:33:13,531-Speed 3254.10 samples/sec   Loss 3.5091   LearningRate 0.0163   Epoch: 11   Global Step: 148120   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:33:16,704-Speed 3228.35 samples/sec   Loss 3.5695   LearningRate 0.0163   Epoch: 11   Global Step: 148130   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:33:19,816-Speed 3291.34 samples/sec   Loss 3.5277   LearningRate 0.0163   Epoch: 11   Global Step: 148140   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:33:22,902-Speed 3319.17 samples/sec   Loss 3.4796   LearningRate 0.0163   Epoch: 11   Global Step: 148150   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:33:26,009-Speed 3297.27 samples/sec   Loss 3.5473   LearningRate 0.0163   Epoch: 11   Global Step: 148160   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:33:29,125-Speed 3287.44 samples/sec   Loss 3.6371   LearningRate 0.0163   Epoch: 11   Global Step: 148170   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:33:32,225-Speed 3304.09 samples/sec   Loss 3.5921   LearningRate 0.0163   Epoch: 11   Global Step: 148180   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:33:35,312-Speed 3318.57 samples/sec   Loss 3.4375   LearningRate 0.0163   Epoch: 11   Global Step: 148190   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:33:38,474-Speed 3239.01 samples/sec   Loss 3.5612   LearningRate 0.0163   Epoch: 11   Global Step: 148200   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:33:41,614-Speed 3262.73 samples/sec   Loss 3.5431   LearningRate 0.0163   Epoch: 11   Global Step: 148210   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:33:44,704-Speed 3314.77 samples/sec   Loss 3.5323   LearningRate 0.0163   Epoch: 11   Global Step: 148220   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:33:47,782-Speed 3328.44 samples/sec   Loss 3.5663   LearningRate 0.0163   Epoch: 11   Global Step: 148230   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:33:50,839-Speed 3350.78 samples/sec   Loss 3.5152   LearningRate 0.0163   Epoch: 11   Global Step: 148240   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:33:53,929-Speed 3314.66 samples/sec   Loss 3.5467   LearningRate 0.0163   Epoch: 11   Global Step: 148250   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:33:57,017-Speed 3317.52 samples/sec   Loss 3.4600   LearningRate 0.0163   Epoch: 11   Global Step: 148260   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:34:00,153-Speed 3265.75 samples/sec   Loss 3.4091   LearningRate 0.0163   Epoch: 11   Global Step: 148270   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:34:03,215-Speed 3345.09 samples/sec   Loss 3.4179   LearningRate 0.0162   Epoch: 11   Global Step: 148280   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:34:06,301-Speed 3320.29 samples/sec   Loss 3.5812   LearningRate 0.0162   Epoch: 11   Global Step: 148290   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:34:09,369-Speed 3338.32 samples/sec   Loss 3.5266   LearningRate 0.0162   Epoch: 11   Global Step: 148300   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:34:12,482-Speed 3290.73 samples/sec   Loss 3.4590   LearningRate 0.0162   Epoch: 11   Global Step: 148310   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:34:15,564-Speed 3322.72 samples/sec   Loss 3.5018   LearningRate 0.0162   Epoch: 11   Global Step: 148320   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:34:18,620-Speed 3352.11 samples/sec   Loss 3.5488   LearningRate 0.0162   Epoch: 11   Global Step: 148330   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:34:21,697-Speed 3328.76 samples/sec   Loss 3.4670   LearningRate 0.0162   Epoch: 11   Global Step: 148340   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:34:24,750-Speed 3355.88 samples/sec   Loss 3.5260   LearningRate 0.0162   Epoch: 11   Global Step: 148350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:34:27,865-Speed 3288.59 samples/sec   Loss 3.5335   LearningRate 0.0162   Epoch: 11   Global Step: 148360   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:34:30,951-Speed 3319.57 samples/sec   Loss 3.5382   LearningRate 0.0162   Epoch: 11   Global Step: 148370   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:34:34,033-Speed 3322.99 samples/sec   Loss 3.4568   LearningRate 0.0162   Epoch: 11   Global Step: 148380   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:34:37,108-Speed 3331.79 samples/sec   Loss 3.4956   LearningRate 0.0162   Epoch: 11   Global Step: 148390   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:34:40,243-Speed 3267.05 samples/sec   Loss 3.4900   LearningRate 0.0162   Epoch: 11   Global Step: 148400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 14:34:43,324-Speed 3325.05 samples/sec   Loss 3.5198   LearningRate 0.0162   Epoch: 11   Global Step: 148410   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:34:46,383-Speed 3348.42 samples/sec   Loss 3.4800   LearningRate 0.0162   Epoch: 11   Global Step: 148420   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:34:49,490-Speed 3296.61 samples/sec   Loss 3.4942   LearningRate 0.0162   Epoch: 11   Global Step: 148430   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:34:52,594-Speed 3300.24 samples/sec   Loss 3.5352   LearningRate 0.0162   Epoch: 11   Global Step: 148440   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:34:55,671-Speed 3328.76 samples/sec   Loss 3.5756   LearningRate 0.0162   Epoch: 11   Global Step: 148450   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:34:58,738-Speed 3339.86 samples/sec   Loss 3.5241   LearningRate 0.0162   Epoch: 11   Global Step: 148460   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:35:01,803-Speed 3341.75 samples/sec   Loss 3.5836   LearningRate 0.0162   Epoch: 11   Global Step: 148470   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:35:04,858-Speed 3353.02 samples/sec   Loss 3.4852   LearningRate 0.0162   Epoch: 11   Global Step: 148480   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:35:07,912-Speed 3353.76 samples/sec   Loss 3.5666   LearningRate 0.0162   Epoch: 11   Global Step: 148490   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:35:10,997-Speed 3320.37 samples/sec   Loss 3.5352   LearningRate 0.0162   Epoch: 11   Global Step: 148500   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:35:14,081-Speed 3321.54 samples/sec   Loss 3.5298   LearningRate 0.0162   Epoch: 11   Global Step: 148510   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:35:17,195-Speed 3289.46 samples/sec   Loss 3.5920   LearningRate 0.0162   Epoch: 11   Global Step: 148520   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:35:20,302-Speed 3296.37 samples/sec   Loss 3.5269   LearningRate 0.0162   Epoch: 11   Global Step: 148530   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:35:23,374-Speed 3335.01 samples/sec   Loss 3.4807   LearningRate 0.0162   Epoch: 11   Global Step: 148540   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:35:26,477-Speed 3301.03 samples/sec   Loss 3.6043   LearningRate 0.0162   Epoch: 11   Global Step: 148550   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:35:29,517-Speed 3369.84 samples/sec   Loss 3.5718   LearningRate 0.0162   Epoch: 11   Global Step: 148560   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:35:32,608-Speed 3314.44 samples/sec   Loss 3.5346   LearningRate 0.0162   Epoch: 11   Global Step: 148570   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:35:35,708-Speed 3304.34 samples/sec   Loss 3.5892   LearningRate 0.0162   Epoch: 11   Global Step: 148580   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:35:38,791-Speed 3321.62 samples/sec   Loss 3.5743   LearningRate 0.0161   Epoch: 11   Global Step: 148590   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:35:41,887-Speed 3309.61 samples/sec   Loss 3.4775   LearningRate 0.0161   Epoch: 11   Global Step: 148600   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:35:44,967-Speed 3325.48 samples/sec   Loss 3.5615   LearningRate 0.0161   Epoch: 11   Global Step: 148610   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:35:48,069-Speed 3302.70 samples/sec   Loss 3.5200   LearningRate 0.0161   Epoch: 11   Global Step: 148620   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:35:51,252-Speed 3217.34 samples/sec   Loss 3.4718   LearningRate 0.0161   Epoch: 11   Global Step: 148630   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:35:54,403-Speed 3250.92 samples/sec   Loss 3.5299   LearningRate 0.0161   Epoch: 11   Global Step: 148640   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:35:57,443-Speed 3369.88 samples/sec   Loss 3.4283   LearningRate 0.0161   Epoch: 11   Global Step: 148650   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:36:00,531-Speed 3317.55 samples/sec   Loss 3.4633   LearningRate 0.0161   Epoch: 11   Global Step: 148660   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:36:03,666-Speed 3267.67 samples/sec   Loss 3.4773   LearningRate 0.0161   Epoch: 11   Global Step: 148670   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:36:06,778-Speed 3290.93 samples/sec   Loss 3.5744   LearningRate 0.0161   Epoch: 11   Global Step: 148680   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:36:09,855-Speed 3328.90 samples/sec   Loss 3.5567   LearningRate 0.0161   Epoch: 11   Global Step: 148690   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:36:12,975-Speed 3283.05 samples/sec   Loss 3.4370   LearningRate 0.0161   Epoch: 11   Global Step: 148700   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:36:16,164-Speed 3212.20 samples/sec   Loss 3.5564   LearningRate 0.0161   Epoch: 11   Global Step: 148710   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:36:19,248-Speed 3321.24 samples/sec   Loss 3.4553   LearningRate 0.0161   Epoch: 11   Global Step: 148720   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:36:22,300-Speed 3357.07 samples/sec   Loss 3.6637   LearningRate 0.0161   Epoch: 11   Global Step: 148730   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:36:25,386-Speed 3318.84 samples/sec   Loss 3.5442   LearningRate 0.0161   Epoch: 11   Global Step: 148740   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-04-27 14:36:28,522-Speed 3266.89 samples/sec   Loss 3.4594   LearningRate 0.0161   Epoch: 11   Global Step: 148750   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:36:31,705-Speed 3217.85 samples/sec   Loss 3.4364   LearningRate 0.0161   Epoch: 11   Global Step: 148760   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:36:34,768-Speed 3343.83 samples/sec   Loss 3.5199   LearningRate 0.0161   Epoch: 11   Global Step: 148770   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:36:37,912-Speed 3258.06 samples/sec   Loss 3.5884   LearningRate 0.0161   Epoch: 11   Global Step: 148780   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:36:41,047-Speed 3267.60 samples/sec   Loss 3.5324   LearningRate 0.0161   Epoch: 11   Global Step: 148790   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:36:44,131-Speed 3321.98 samples/sec   Loss 3.5000   LearningRate 0.0161   Epoch: 11   Global Step: 148800   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:36:47,218-Speed 3318.14 samples/sec   Loss 3.5630   LearningRate 0.0161   Epoch: 11   Global Step: 148810   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:36:50,345-Speed 3275.20 samples/sec   Loss 3.5756   LearningRate 0.0161   Epoch: 11   Global Step: 148820   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:36:53,423-Speed 3328.11 samples/sec   Loss 3.5002   LearningRate 0.0161   Epoch: 11   Global Step: 148830   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:36:56,499-Speed 3330.56 samples/sec   Loss 3.4609   LearningRate 0.0161   Epoch: 11   Global Step: 148840   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:36:59,571-Speed 3333.80 samples/sec   Loss 3.5010   LearningRate 0.0161   Epoch: 11   Global Step: 148850   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:37:02,690-Speed 3285.08 samples/sec   Loss 3.5420   LearningRate 0.0161   Epoch: 11   Global Step: 148860   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:37:05,770-Speed 3325.74 samples/sec   Loss 3.4581   LearningRate 0.0161   Epoch: 11   Global Step: 148870   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:37:08,852-Speed 3323.00 samples/sec   Loss 3.5421   LearningRate 0.0161   Epoch: 11   Global Step: 148880   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:37:11,985-Speed 3269.56 samples/sec   Loss 3.5222   LearningRate 0.0161   Epoch: 11   Global Step: 148890   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:37:15,110-Speed 3278.59 samples/sec   Loss 3.6086   LearningRate 0.0160   Epoch: 11   Global Step: 148900   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:37:18,213-Speed 3300.27 samples/sec   Loss 3.4869   LearningRate 0.0160   Epoch: 11   Global Step: 148910   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:37:21,279-Speed 3341.82 samples/sec   Loss 3.5041   LearningRate 0.0160   Epoch: 11   Global Step: 148920   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:37:24,366-Speed 3317.49 samples/sec   Loss 3.4508   LearningRate 0.0160   Epoch: 11   Global Step: 148930   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:37:27,546-Speed 3221.96 samples/sec   Loss 3.5883   LearningRate 0.0160   Epoch: 11   Global Step: 148940   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:37:30,693-Speed 3254.53 samples/sec   Loss 3.6199   LearningRate 0.0160   Epoch: 11   Global Step: 148950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 14:37:33,753-Speed 3347.70 samples/sec   Loss 3.5370   LearningRate 0.0160   Epoch: 11   Global Step: 148960   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:37:36,865-Speed 3291.24 samples/sec   Loss 3.4136   LearningRate 0.0160   Epoch: 11   Global Step: 148970   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:37:39,968-Speed 3301.50 samples/sec   Loss 3.5619   LearningRate 0.0160   Epoch: 11   Global Step: 148980   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:37:43,104-Speed 3266.57 samples/sec   Loss 3.5477   LearningRate 0.0160   Epoch: 11   Global Step: 148990   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:37:46,184-Speed 3325.91 samples/sec   Loss 3.5640   LearningRate 0.0160   Epoch: 11   Global Step: 149000   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:37:49,249-Speed 3341.20 samples/sec   Loss 3.5829   LearningRate 0.0160   Epoch: 11   Global Step: 149010   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:37:52,363-Speed 3289.98 samples/sec   Loss 3.5982   LearningRate 0.0160   Epoch: 11   Global Step: 149020   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:37:55,510-Speed 3255.04 samples/sec   Loss 3.6021   LearningRate 0.0160   Epoch: 11   Global Step: 149030   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:37:58,596-Speed 3319.68 samples/sec   Loss 3.4916   LearningRate 0.0160   Epoch: 11   Global Step: 149040   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:38:01,871-Speed 3127.61 samples/sec   Loss 3.4720   LearningRate 0.0160   Epoch: 11   Global Step: 149050   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:38:33,273-Speed 326.11 samples/sec   Loss 2.6755   LearningRate 0.0160   Epoch: 12   Global Step: 149060   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:38:36,731-Speed 2962.98 samples/sec   Loss 2.5341   LearningRate 0.0160   Epoch: 12   Global Step: 149070   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:38:39,810-Speed 3326.22 samples/sec   Loss 2.4833   LearningRate 0.0160   Epoch: 12   Global Step: 149080   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:38:42,863-Speed 3355.59 samples/sec   Loss 2.4804   LearningRate 0.0160   Epoch: 12   Global Step: 149090   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:38:45,919-Speed 3351.44 samples/sec   Loss 2.4738   LearningRate 0.0160   Epoch: 12   Global Step: 149100   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:38:49,056-Speed 3265.14 samples/sec   Loss 2.5349   LearningRate 0.0160   Epoch: 12   Global Step: 149110   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:38:52,217-Speed 3240.65 samples/sec   Loss 2.5276   LearningRate 0.0160   Epoch: 12   Global Step: 149120   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:38:55,324-Speed 3296.72 samples/sec   Loss 2.5013   LearningRate 0.0160   Epoch: 12   Global Step: 149130   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:38:58,403-Speed 3327.63 samples/sec   Loss 2.5715   LearningRate 0.0160   Epoch: 12   Global Step: 149140   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:39:01,485-Speed 3323.12 samples/sec   Loss 2.5366   LearningRate 0.0160   Epoch: 12   Global Step: 149150   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:39:04,618-Speed 3270.05 samples/sec   Loss 2.5781   LearningRate 0.0160   Epoch: 12   Global Step: 149160   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:39:07,934-Speed 3089.70 samples/sec   Loss 2.4740   LearningRate 0.0160   Epoch: 12   Global Step: 149170   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:39:11,019-Speed 3319.79 samples/sec   Loss 2.5162   LearningRate 0.0160   Epoch: 12   Global Step: 149180   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:39:14,113-Speed 3310.90 samples/sec   Loss 2.5539   LearningRate 0.0160   Epoch: 12   Global Step: 149190   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:39:17,287-Speed 3227.05 samples/sec   Loss 2.5289   LearningRate 0.0160   Epoch: 12   Global Step: 149200   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:39:20,363-Speed 3329.45 samples/sec   Loss 2.6049   LearningRate 0.0159   Epoch: 12   Global Step: 149210   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:39:23,492-Speed 3274.01 samples/sec   Loss 2.5852   LearningRate 0.0159   Epoch: 12   Global Step: 149220   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:39:26,540-Speed 3360.83 samples/sec   Loss 2.5555   LearningRate 0.0159   Epoch: 12   Global Step: 149230   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:39:29,625-Speed 3320.63 samples/sec   Loss 2.4744   LearningRate 0.0159   Epoch: 12   Global Step: 149240   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:39:32,683-Speed 3349.53 samples/sec   Loss 2.6112   LearningRate 0.0159   Epoch: 12   Global Step: 149250   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:39:35,845-Speed 3239.36 samples/sec   Loss 2.5254   LearningRate 0.0159   Epoch: 12   Global Step: 149260   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:39:38,979-Speed 3268.30 samples/sec   Loss 2.5550   LearningRate 0.0159   Epoch: 12   Global Step: 149270   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:39:42,097-Speed 3285.59 samples/sec   Loss 2.5683   LearningRate 0.0159   Epoch: 12   Global Step: 149280   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:39:45,296-Speed 3201.34 samples/sec   Loss 2.6110   LearningRate 0.0159   Epoch: 12   Global Step: 149290   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:39:48,495-Speed 3201.91 samples/sec   Loss 2.5136   LearningRate 0.0159   Epoch: 12   Global Step: 149300   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:39:51,641-Speed 3256.16 samples/sec   Loss 2.5668   LearningRate 0.0159   Epoch: 12   Global Step: 149310   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:39:54,956-Speed 3090.40 samples/sec   Loss 2.5980   LearningRate 0.0159   Epoch: 12   Global Step: 149320   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:39:58,041-Speed 3320.72 samples/sec   Loss 2.5242   LearningRate 0.0159   Epoch: 12   Global Step: 149330   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:40:01,220-Speed 3221.94 samples/sec   Loss 2.5081   LearningRate 0.0159   Epoch: 12   Global Step: 149340   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:40:04,352-Speed 3270.25 samples/sec   Loss 2.5137   LearningRate 0.0159   Epoch: 12   Global Step: 149350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:40:07,474-Speed 3280.87 samples/sec   Loss 2.6445   LearningRate 0.0159   Epoch: 12   Global Step: 149360   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:40:10,566-Speed 3313.36 samples/sec   Loss 2.5727   LearningRate 0.0159   Epoch: 12   Global Step: 149370   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:40:13,669-Speed 3300.46 samples/sec   Loss 2.5159   LearningRate 0.0159   Epoch: 12   Global Step: 149380   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:40:16,836-Speed 3234.65 samples/sec   Loss 2.5346   LearningRate 0.0159   Epoch: 12   Global Step: 149390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-04-27 14:40:19,973-Speed 3265.54 samples/sec   Loss 2.5757   LearningRate 0.0159   Epoch: 12   Global Step: 149400   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:40:23,060-Speed 3317.38 samples/sec   Loss 2.5638   LearningRate 0.0159   Epoch: 12   Global Step: 149410   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:40:26,164-Speed 3300.58 samples/sec   Loss 2.6100   LearningRate 0.0159   Epoch: 12   Global Step: 149420   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:40:29,331-Speed 3234.70 samples/sec   Loss 2.6836   LearningRate 0.0159   Epoch: 12   Global Step: 149430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:40:32,461-Speed 3272.88 samples/sec   Loss 2.5942   LearningRate 0.0159   Epoch: 12   Global Step: 149440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:40:35,563-Speed 3302.08 samples/sec   Loss 2.5539   LearningRate 0.0159   Epoch: 12   Global Step: 149450   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:40:38,717-Speed 3248.02 samples/sec   Loss 2.6428   LearningRate 0.0159   Epoch: 12   Global Step: 149460   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:40:41,802-Speed 3320.36 samples/sec   Loss 2.5981   LearningRate 0.0159   Epoch: 12   Global Step: 149470   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:40:44,914-Speed 3291.43 samples/sec   Loss 2.4691   LearningRate 0.0159   Epoch: 12   Global Step: 149480   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:40:47,997-Speed 3322.93 samples/sec   Loss 2.6118   LearningRate 0.0159   Epoch: 12   Global Step: 149490   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:40:51,101-Speed 3300.52 samples/sec   Loss 2.5127   LearningRate 0.0159   Epoch: 12   Global Step: 149500   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:40:54,270-Speed 3232.07 samples/sec   Loss 2.5873   LearningRate 0.0159   Epoch: 12   Global Step: 149510   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:40:57,361-Speed 3314.38 samples/sec   Loss 2.5743   LearningRate 0.0158   Epoch: 12   Global Step: 149520   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:41:00,447-Speed 3318.68 samples/sec   Loss 2.5986   LearningRate 0.0158   Epoch: 12   Global Step: 149530   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:41:03,579-Speed 3270.73 samples/sec   Loss 2.6285   LearningRate 0.0158   Epoch: 12   Global Step: 149540   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:41:06,721-Speed 3260.06 samples/sec   Loss 2.6537   LearningRate 0.0158   Epoch: 12   Global Step: 149550   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:41:09,793-Speed 3334.52 samples/sec   Loss 2.5708   LearningRate 0.0158   Epoch: 12   Global Step: 149560   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:41:12,909-Speed 3287.28 samples/sec   Loss 2.6036   LearningRate 0.0158   Epoch: 12   Global Step: 149570   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:41:16,042-Speed 3269.62 samples/sec   Loss 2.6144   LearningRate 0.0158   Epoch: 12   Global Step: 149580   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:41:19,113-Speed 3336.16 samples/sec   Loss 2.4931   LearningRate 0.0158   Epoch: 12   Global Step: 149590   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:41:22,212-Speed 3304.85 samples/sec   Loss 2.6585   LearningRate 0.0158   Epoch: 12   Global Step: 149600   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:41:25,385-Speed 3228.91 samples/sec   Loss 2.5799   LearningRate 0.0158   Epoch: 12   Global Step: 149610   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:41:28,487-Speed 3301.21 samples/sec   Loss 2.6443   LearningRate 0.0158   Epoch: 12   Global Step: 149620   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:41:31,633-Speed 3255.72 samples/sec   Loss 2.5573   LearningRate 0.0158   Epoch: 12   Global Step: 149630   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:41:34,764-Speed 3272.52 samples/sec   Loss 2.6034   LearningRate 0.0158   Epoch: 12   Global Step: 149640   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:41:37,932-Speed 3233.08 samples/sec   Loss 2.6587   LearningRate 0.0158   Epoch: 12   Global Step: 149650   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:41:41,106-Speed 3226.95 samples/sec   Loss 2.6124   LearningRate 0.0158   Epoch: 12   Global Step: 149660   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:41:44,187-Speed 3324.55 samples/sec   Loss 2.5559   LearningRate 0.0158   Epoch: 12   Global Step: 149670   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:41:47,273-Speed 3319.84 samples/sec   Loss 2.6015   LearningRate 0.0158   Epoch: 12   Global Step: 149680   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:41:50,334-Speed 3346.45 samples/sec   Loss 2.6087   LearningRate 0.0158   Epoch: 12   Global Step: 149690   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:41:53,419-Speed 3320.39 samples/sec   Loss 2.5735   LearningRate 0.0158   Epoch: 12   Global Step: 149700   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:41:56,527-Speed 3295.48 samples/sec   Loss 2.5861   LearningRate 0.0158   Epoch: 12   Global Step: 149710   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:41:59,647-Speed 3283.19 samples/sec   Loss 2.6347   LearningRate 0.0158   Epoch: 12   Global Step: 149720   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:42:02,759-Speed 3291.04 samples/sec   Loss 2.6256   LearningRate 0.0158   Epoch: 12   Global Step: 149730   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:42:05,876-Speed 3286.44 samples/sec   Loss 2.7044   LearningRate 0.0158   Epoch: 12   Global Step: 149740   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:42:08,963-Speed 3318.08 samples/sec   Loss 2.4721   LearningRate 0.0158   Epoch: 12   Global Step: 149750   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:42:12,036-Speed 3333.91 samples/sec   Loss 2.5965   LearningRate 0.0158   Epoch: 12   Global Step: 149760   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:42:15,190-Speed 3247.42 samples/sec   Loss 2.6915   LearningRate 0.0158   Epoch: 12   Global Step: 149770   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:42:18,337-Speed 3254.38 samples/sec   Loss 2.7093   LearningRate 0.0158   Epoch: 12   Global Step: 149780   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:42:21,427-Speed 3315.53 samples/sec   Loss 2.6494   LearningRate 0.0158   Epoch: 12   Global Step: 149790   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:42:24,543-Speed 3287.23 samples/sec   Loss 2.5196   LearningRate 0.0158   Epoch: 12   Global Step: 149800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:42:27,689-Speed 3255.32 samples/sec   Loss 2.6350   LearningRate 0.0158   Epoch: 12   Global Step: 149810   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:42:30,817-Speed 3275.20 samples/sec   Loss 2.6363   LearningRate 0.0158   Epoch: 12   Global Step: 149820   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-04-27 14:42:33,891-Speed 3331.97 samples/sec   Loss 2.6200   LearningRate 0.0158   Epoch: 12   Global Step: 149830   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:42:36,983-Speed 3313.10 samples/sec   Loss 2.5637   LearningRate 0.0157   Epoch: 12   Global Step: 149840   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:42:40,059-Speed 3330.12 samples/sec   Loss 2.6433   LearningRate 0.0157   Epoch: 12   Global Step: 149850   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:42:43,159-Speed 3304.34 samples/sec   Loss 2.6481   LearningRate 0.0157   Epoch: 12   Global Step: 149860   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:42:46,274-Speed 3288.40 samples/sec   Loss 2.6866   LearningRate 0.0157   Epoch: 12   Global Step: 149870   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:42:49,365-Speed 3313.23 samples/sec   Loss 2.6151   LearningRate 0.0157   Epoch: 12   Global Step: 149880   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:42:52,490-Speed 3278.33 samples/sec   Loss 2.6177   LearningRate 0.0157   Epoch: 12   Global Step: 149890   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:42:55,570-Speed 3325.53 samples/sec   Loss 2.6366   LearningRate 0.0157   Epoch: 12   Global Step: 149900   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-04-27 14:42:58,682-Speed 3291.38 samples/sec   Loss 2.7103   LearningRate 0.0157   Epoch: 12   Global Step: 149910   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:43:01,787-Speed 3298.65 samples/sec   Loss 2.6489   LearningRate 0.0157   Epoch: 12   Global Step: 149920   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:43:04,911-Speed 3279.68 samples/sec   Loss 2.6033   LearningRate 0.0157   Epoch: 12   Global Step: 149930   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:43:08,014-Speed 3301.18 samples/sec   Loss 2.6772   LearningRate 0.0157   Epoch: 12   Global Step: 149940   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:43:11,109-Speed 3309.38 samples/sec   Loss 2.6661   LearningRate 0.0157   Epoch: 12   Global Step: 149950   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:43:14,252-Speed 3258.30 samples/sec   Loss 2.6317   LearningRate 0.0157   Epoch: 12   Global Step: 149960   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:43:17,381-Speed 3273.46 samples/sec   Loss 2.6774   LearningRate 0.0157   Epoch: 12   Global Step: 149970   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:43:20,463-Speed 3323.60 samples/sec   Loss 2.6760   LearningRate 0.0157   Epoch: 12   Global Step: 149980   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:43:23,513-Speed 3359.23 samples/sec   Loss 2.6869   LearningRate 0.0157   Epoch: 12   Global Step: 149990   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:43:26,665-Speed 3249.55 samples/sec   Loss 2.5819   LearningRate 0.0157   Epoch: 12   Global Step: 150000   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:43:29,808-Speed 3258.79 samples/sec   Loss 2.6161   LearningRate 0.0157   Epoch: 12   Global Step: 150010   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:43:32,883-Speed 3330.62 samples/sec   Loss 2.7070   LearningRate 0.0157   Epoch: 12   Global Step: 150020   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:43:36,014-Speed 3271.59 samples/sec   Loss 2.6663   LearningRate 0.0157   Epoch: 12   Global Step: 150030   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 14:43:39,047-Speed 3377.31 samples/sec   Loss 2.6590   LearningRate 0.0157   Epoch: 12   Global Step: 150040   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:43:42,119-Speed 3334.56 samples/sec   Loss 2.6953   LearningRate 0.0157   Epoch: 12   Global Step: 150050   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:43:45,202-Speed 3323.43 samples/sec   Loss 2.6492   LearningRate 0.0157   Epoch: 12   Global Step: 150060   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:43:48,344-Speed 3259.66 samples/sec   Loss 2.6914   LearningRate 0.0157   Epoch: 12   Global Step: 150070   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:43:51,416-Speed 3334.12 samples/sec   Loss 2.6900   LearningRate 0.0157   Epoch: 12   Global Step: 150080   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:43:54,518-Speed 3302.25 samples/sec   Loss 2.6367   LearningRate 0.0157   Epoch: 12   Global Step: 150090   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:43:57,574-Speed 3351.17 samples/sec   Loss 2.6168   LearningRate 0.0157   Epoch: 12   Global Step: 150100   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:44:00,685-Speed 3292.80 samples/sec   Loss 2.5910   LearningRate 0.0157   Epoch: 12   Global Step: 150110   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:44:03,867-Speed 3219.24 samples/sec   Loss 2.6644   LearningRate 0.0157   Epoch: 12   Global Step: 150120   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:44:06,971-Speed 3299.49 samples/sec   Loss 2.6166   LearningRate 0.0157   Epoch: 12   Global Step: 150130   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:44:10,067-Speed 3309.42 samples/sec   Loss 2.6781   LearningRate 0.0157   Epoch: 12   Global Step: 150140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:44:13,141-Speed 3331.92 samples/sec   Loss 2.6399   LearningRate 0.0156   Epoch: 12   Global Step: 150150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:44:16,207-Speed 3341.17 samples/sec   Loss 2.6609   LearningRate 0.0156   Epoch: 12   Global Step: 150160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:44:19,313-Speed 3297.53 samples/sec   Loss 2.6798   LearningRate 0.0156   Epoch: 12   Global Step: 150170   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:44:22,380-Speed 3339.74 samples/sec   Loss 2.6897   LearningRate 0.0156   Epoch: 12   Global Step: 150180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:44:25,488-Speed 3295.93 samples/sec   Loss 2.6415   LearningRate 0.0156   Epoch: 12   Global Step: 150190   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:44:28,647-Speed 3242.55 samples/sec   Loss 2.7288   LearningRate 0.0156   Epoch: 12   Global Step: 150200   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:44:31,754-Speed 3296.87 samples/sec   Loss 2.7309   LearningRate 0.0156   Epoch: 12   Global Step: 150210   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:44:34,817-Speed 3343.99 samples/sec   Loss 2.6415   LearningRate 0.0156   Epoch: 12   Global Step: 150220   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:44:37,897-Speed 3325.34 samples/sec   Loss 2.6819   LearningRate 0.0156   Epoch: 12   Global Step: 150230   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:44:40,952-Speed 3353.73 samples/sec   Loss 2.6713   LearningRate 0.0156   Epoch: 12   Global Step: 150240   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 14:44:44,000-Speed 3360.38 samples/sec   Loss 2.6607   LearningRate 0.0156   Epoch: 12   Global Step: 150250   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:44:47,059-Speed 3348.37 samples/sec   Loss 2.6771   LearningRate 0.0156   Epoch: 12   Global Step: 150260   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:44:50,124-Speed 3342.06 samples/sec   Loss 2.6976   LearningRate 0.0156   Epoch: 12   Global Step: 150270   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:44:53,248-Speed 3278.70 samples/sec   Loss 2.6186   LearningRate 0.0156   Epoch: 12   Global Step: 150280   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:44:56,347-Speed 3305.70 samples/sec   Loss 2.6836   LearningRate 0.0156   Epoch: 12   Global Step: 150290   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:44:59,421-Speed 3332.01 samples/sec   Loss 2.7303   LearningRate 0.0156   Epoch: 12   Global Step: 150300   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:45:02,491-Speed 3337.15 samples/sec   Loss 2.6757   LearningRate 0.0156   Epoch: 12   Global Step: 150310   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:45:05,630-Speed 3262.85 samples/sec   Loss 2.7367   LearningRate 0.0156   Epoch: 12   Global Step: 150320   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:45:08,697-Speed 3340.41 samples/sec   Loss 2.6820   LearningRate 0.0156   Epoch: 12   Global Step: 150330   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:45:11,750-Speed 3354.06 samples/sec   Loss 2.6855   LearningRate 0.0156   Epoch: 12   Global Step: 150340   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:45:14,877-Speed 3276.15 samples/sec   Loss 2.6796   LearningRate 0.0156   Epoch: 12   Global Step: 150350   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:45:17,940-Speed 3344.82 samples/sec   Loss 2.7146   LearningRate 0.0156   Epoch: 12   Global Step: 150360   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:45:21,036-Speed 3307.47 samples/sec   Loss 2.5886   LearningRate 0.0156   Epoch: 12   Global Step: 150370   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:45:24,157-Speed 3282.27 samples/sec   Loss 2.7368   LearningRate 0.0156   Epoch: 12   Global Step: 150380   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:45:27,310-Speed 3249.08 samples/sec   Loss 2.7463   LearningRate 0.0156   Epoch: 12   Global Step: 150390   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:45:30,430-Speed 3282.98 samples/sec   Loss 2.6857   LearningRate 0.0156   Epoch: 12   Global Step: 150400   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:45:33,516-Speed 3318.91 samples/sec   Loss 2.6745   LearningRate 0.0156   Epoch: 12   Global Step: 150410   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:45:36,703-Speed 3214.30 samples/sec   Loss 2.6376   LearningRate 0.0156   Epoch: 12   Global Step: 150420   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:45:39,845-Speed 3260.00 samples/sec   Loss 2.6830   LearningRate 0.0156   Epoch: 12   Global Step: 150430   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:45:42,973-Speed 3275.05 samples/sec   Loss 2.7450   LearningRate 0.0156   Epoch: 12   Global Step: 150440   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:45:46,023-Speed 3357.89 samples/sec   Loss 2.6900   LearningRate 0.0156   Epoch: 12   Global Step: 150450   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:45:49,163-Speed 3261.97 samples/sec   Loss 2.7346   LearningRate 0.0155   Epoch: 12   Global Step: 150460   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:45:52,329-Speed 3236.30 samples/sec   Loss 2.7585   LearningRate 0.0155   Epoch: 12   Global Step: 150470   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:45:55,406-Speed 3328.50 samples/sec   Loss 2.7572   LearningRate 0.0155   Epoch: 12   Global Step: 150480   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:45:58,495-Speed 3315.90 samples/sec   Loss 2.7672   LearningRate 0.0155   Epoch: 12   Global Step: 150490   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:46:01,595-Speed 3304.07 samples/sec   Loss 2.6643   LearningRate 0.0155   Epoch: 12   Global Step: 150500   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:46:04,695-Speed 3305.28 samples/sec   Loss 2.6892   LearningRate 0.0155   Epoch: 12   Global Step: 150510   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:46:07,783-Speed 3316.19 samples/sec   Loss 2.7306   LearningRate 0.0155   Epoch: 12   Global Step: 150520   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:46:10,844-Speed 3346.61 samples/sec   Loss 2.7456   LearningRate 0.0155   Epoch: 12   Global Step: 150530   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:46:13,958-Speed 3289.47 samples/sec   Loss 2.7137   LearningRate 0.0155   Epoch: 12   Global Step: 150540   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:46:17,045-Speed 3318.87 samples/sec   Loss 2.7146   LearningRate 0.0155   Epoch: 12   Global Step: 150550   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:46:20,140-Speed 3309.16 samples/sec   Loss 2.7906   LearningRate 0.0155   Epoch: 12   Global Step: 150560   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:46:23,250-Speed 3294.24 samples/sec   Loss 2.8062   LearningRate 0.0155   Epoch: 12   Global Step: 150570   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:46:26,352-Speed 3301.68 samples/sec   Loss 2.7162   LearningRate 0.0155   Epoch: 12   Global Step: 150580   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:46:29,475-Speed 3279.46 samples/sec   Loss 2.7647   LearningRate 0.0155   Epoch: 12   Global Step: 150590   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:46:32,556-Speed 3325.22 samples/sec   Loss 2.7347   LearningRate 0.0155   Epoch: 12   Global Step: 150600   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:46:35,632-Speed 3329.81 samples/sec   Loss 2.6943   LearningRate 0.0155   Epoch: 12   Global Step: 150610   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:46:38,809-Speed 3223.71 samples/sec   Loss 2.6821   LearningRate 0.0155   Epoch: 12   Global Step: 150620   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:46:41,907-Speed 3307.23 samples/sec   Loss 2.7812   LearningRate 0.0155   Epoch: 12   Global Step: 150630   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:46:44,977-Speed 3336.43 samples/sec   Loss 2.7545   LearningRate 0.0155   Epoch: 12   Global Step: 150640   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:46:48,046-Speed 3337.74 samples/sec   Loss 2.7288   LearningRate 0.0155   Epoch: 12   Global Step: 150650   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:46:51,121-Speed 3331.71 samples/sec   Loss 2.6410   LearningRate 0.0155   Epoch: 12   Global Step: 150660   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:46:54,189-Speed 3338.03 samples/sec   Loss 2.7580   LearningRate 0.0155   Epoch: 12   Global Step: 150670   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:46:57,323-Speed 3268.75 samples/sec   Loss 2.7448   LearningRate 0.0155   Epoch: 12   Global Step: 150680   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:47:00,442-Speed 3283.79 samples/sec   Loss 2.7118   LearningRate 0.0155   Epoch: 12   Global Step: 150690   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:47:03,498-Speed 3351.84 samples/sec   Loss 2.8109   LearningRate 0.0155   Epoch: 12   Global Step: 150700   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:47:06,645-Speed 3255.19 samples/sec   Loss 2.7326   LearningRate 0.0155   Epoch: 12   Global Step: 150710   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:47:09,717-Speed 3333.89 samples/sec   Loss 2.7203   LearningRate 0.0155   Epoch: 12   Global Step: 150720   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:47:12,786-Speed 3337.62 samples/sec   Loss 2.7439   LearningRate 0.0155   Epoch: 12   Global Step: 150730   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:47:15,936-Speed 3251.21 samples/sec   Loss 2.7696   LearningRate 0.0155   Epoch: 12   Global Step: 150740   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:47:19,069-Speed 3270.04 samples/sec   Loss 2.7102   LearningRate 0.0155   Epoch: 12   Global Step: 150750   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:47:22,133-Speed 3342.69 samples/sec   Loss 2.6517   LearningRate 0.0155   Epoch: 12   Global Step: 150760   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:47:25,202-Speed 3337.54 samples/sec   Loss 2.7920   LearningRate 0.0155   Epoch: 12   Global Step: 150770   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:47:28,267-Speed 3342.35 samples/sec   Loss 2.7834   LearningRate 0.0154   Epoch: 12   Global Step: 150780   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:47:31,399-Speed 3270.02 samples/sec   Loss 2.7909   LearningRate 0.0154   Epoch: 12   Global Step: 150790   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:47:34,483-Speed 3321.49 samples/sec   Loss 2.7882   LearningRate 0.0154   Epoch: 12   Global Step: 150800   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:47:37,690-Speed 3194.65 samples/sec   Loss 2.7365   LearningRate 0.0154   Epoch: 12   Global Step: 150810   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:47:40,763-Speed 3332.10 samples/sec   Loss 2.8269   LearningRate 0.0154   Epoch: 12   Global Step: 150820   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:47:43,859-Speed 3309.41 samples/sec   Loss 2.7007   LearningRate 0.0154   Epoch: 12   Global Step: 150830   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:47:46,958-Speed 3304.98 samples/sec   Loss 2.7947   LearningRate 0.0154   Epoch: 12   Global Step: 150840   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:47:50,031-Speed 3333.66 samples/sec   Loss 2.8343   LearningRate 0.0154   Epoch: 12   Global Step: 150850   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:47:53,139-Speed 3295.95 samples/sec   Loss 2.7608   LearningRate 0.0154   Epoch: 12   Global Step: 150860   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:47:56,196-Speed 3350.15 samples/sec   Loss 2.7407   LearningRate 0.0154   Epoch: 12   Global Step: 150870   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:47:59,265-Speed 3337.79 samples/sec   Loss 2.6811   LearningRate 0.0154   Epoch: 12   Global Step: 150880   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:48:02,458-Speed 3208.01 samples/sec   Loss 2.7063   LearningRate 0.0154   Epoch: 12   Global Step: 150890   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:48:05,594-Speed 3266.43 samples/sec   Loss 2.7254   LearningRate 0.0154   Epoch: 12   Global Step: 150900   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:48:08,685-Speed 3313.64 samples/sec   Loss 2.8105   LearningRate 0.0154   Epoch: 12   Global Step: 150910   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:48:11,760-Speed 3332.25 samples/sec   Loss 2.8219   LearningRate 0.0154   Epoch: 12   Global Step: 150920   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:48:14,844-Speed 3320.87 samples/sec   Loss 2.8186   LearningRate 0.0154   Epoch: 12   Global Step: 150930   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:48:17,960-Speed 3288.17 samples/sec   Loss 2.7296   LearningRate 0.0154   Epoch: 12   Global Step: 150940   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:48:21,032-Speed 3334.53 samples/sec   Loss 2.7844   LearningRate 0.0154   Epoch: 12   Global Step: 150950   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:48:24,152-Speed 3282.45 samples/sec   Loss 2.7369   LearningRate 0.0154   Epoch: 12   Global Step: 150960   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:48:27,272-Speed 3282.88 samples/sec   Loss 2.7471   LearningRate 0.0154   Epoch: 12   Global Step: 150970   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:48:30,352-Speed 3325.43 samples/sec   Loss 2.7592   LearningRate 0.0154   Epoch: 12   Global Step: 150980   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:48:33,424-Speed 3335.77 samples/sec   Loss 2.7644   LearningRate 0.0154   Epoch: 12   Global Step: 150990   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:48:36,572-Speed 3253.52 samples/sec   Loss 2.7028   LearningRate 0.0154   Epoch: 12   Global Step: 151000   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:48:39,820-Speed 3153.62 samples/sec   Loss 2.7388   LearningRate 0.0154   Epoch: 12   Global Step: 151010   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:48:43,030-Speed 3191.12 samples/sec   Loss 2.7671   LearningRate 0.0154   Epoch: 12   Global Step: 151020   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:48:46,117-Speed 3317.72 samples/sec   Loss 2.7714   LearningRate 0.0154   Epoch: 12   Global Step: 151030   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:48:49,261-Speed 3258.83 samples/sec   Loss 2.7298   LearningRate 0.0154   Epoch: 12   Global Step: 151040   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:48:52,314-Speed 3355.06 samples/sec   Loss 2.7434   LearningRate 0.0154   Epoch: 12   Global Step: 151050   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:48:55,399-Speed 3319.89 samples/sec   Loss 2.7747   LearningRate 0.0154   Epoch: 12   Global Step: 151060   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:48:58,455-Speed 3351.67 samples/sec   Loss 2.7736   LearningRate 0.0154   Epoch: 12   Global Step: 151070   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:49:01,522-Speed 3339.46 samples/sec   Loss 2.8147   LearningRate 0.0154   Epoch: 12   Global Step: 151080   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:49:04,626-Speed 3300.84 samples/sec   Loss 2.7160   LearningRate 0.0154   Epoch: 12   Global Step: 151090   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:49:07,724-Speed 3306.60 samples/sec   Loss 2.8317   LearningRate 0.0153   Epoch: 12   Global Step: 151100   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:49:10,787-Speed 3343.97 samples/sec   Loss 2.7945   LearningRate 0.0153   Epoch: 12   Global Step: 151110   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:49:13,870-Speed 3323.11 samples/sec   Loss 2.8330   LearningRate 0.0153   Epoch: 12   Global Step: 151120   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:49:16,987-Speed 3286.06 samples/sec   Loss 2.7947   LearningRate 0.0153   Epoch: 12   Global Step: 151130   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:49:20,116-Speed 3272.88 samples/sec   Loss 2.7584   LearningRate 0.0153   Epoch: 12   Global Step: 151140   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:49:23,171-Speed 3353.46 samples/sec   Loss 2.8014   LearningRate 0.0153   Epoch: 12   Global Step: 151150   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:49:26,257-Speed 3318.78 samples/sec   Loss 2.7524   LearningRate 0.0153   Epoch: 12   Global Step: 151160   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:49:29,368-Speed 3292.81 samples/sec   Loss 2.8136   LearningRate 0.0153   Epoch: 12   Global Step: 151170   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:49:32,448-Speed 3325.78 samples/sec   Loss 2.7045   LearningRate 0.0153   Epoch: 12   Global Step: 151180   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:49:35,529-Speed 3325.45 samples/sec   Loss 2.7350   LearningRate 0.0153   Epoch: 12   Global Step: 151190   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:49:38,590-Speed 3345.47 samples/sec   Loss 2.8659   LearningRate 0.0153   Epoch: 12   Global Step: 151200   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:49:41,680-Speed 3315.03 samples/sec   Loss 2.8041   LearningRate 0.0153   Epoch: 12   Global Step: 151210   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:49:44,779-Speed 3306.28 samples/sec   Loss 2.7221   LearningRate 0.0153   Epoch: 12   Global Step: 151220   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:49:47,952-Speed 3228.32 samples/sec   Loss 2.7872   LearningRate 0.0153   Epoch: 12   Global Step: 151230   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:49:51,076-Speed 3278.89 samples/sec   Loss 2.7516   LearningRate 0.0153   Epoch: 12   Global Step: 151240   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:49:54,178-Speed 3301.33 samples/sec   Loss 2.8309   LearningRate 0.0153   Epoch: 12   Global Step: 151250   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:49:57,288-Speed 3294.00 samples/sec   Loss 2.7767   LearningRate 0.0153   Epoch: 12   Global Step: 151260   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:50:00,355-Speed 3339.79 samples/sec   Loss 2.8046   LearningRate 0.0153   Epoch: 12   Global Step: 151270   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:50:03,445-Speed 3315.13 samples/sec   Loss 2.7404   LearningRate 0.0153   Epoch: 12   Global Step: 151280   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:50:06,660-Speed 3186.26 samples/sec   Loss 2.8527   LearningRate 0.0153   Epoch: 12   Global Step: 151290   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:50:09,758-Speed 3306.59 samples/sec   Loss 2.7181   LearningRate 0.0153   Epoch: 12   Global Step: 151300   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:50:12,970-Speed 3188.73 samples/sec   Loss 2.7001   LearningRate 0.0153   Epoch: 12   Global Step: 151310   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:50:16,074-Speed 3300.29 samples/sec   Loss 2.8185   LearningRate 0.0153   Epoch: 12   Global Step: 151320   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:50:19,201-Speed 3275.93 samples/sec   Loss 2.7857   LearningRate 0.0153   Epoch: 12   Global Step: 151330   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:50:22,270-Speed 3337.37 samples/sec   Loss 2.7680   LearningRate 0.0153   Epoch: 12   Global Step: 151340   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:50:25,398-Speed 3275.68 samples/sec   Loss 2.6848   LearningRate 0.0153   Epoch: 12   Global Step: 151350   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:50:28,493-Speed 3309.13 samples/sec   Loss 2.8073   LearningRate 0.0153   Epoch: 12   Global Step: 151360   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:50:31,659-Speed 3235.86 samples/sec   Loss 2.8033   LearningRate 0.0153   Epoch: 12   Global Step: 151370   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:50:34,740-Speed 3324.83 samples/sec   Loss 2.7544   LearningRate 0.0153   Epoch: 12   Global Step: 151380   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:50:37,908-Speed 3233.12 samples/sec   Loss 2.8288   LearningRate 0.0153   Epoch: 12   Global Step: 151390   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:50:40,971-Speed 3344.16 samples/sec   Loss 2.8278   LearningRate 0.0153   Epoch: 12   Global Step: 151400   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:50:44,056-Speed 3320.29 samples/sec   Loss 2.8177   LearningRate 0.0152   Epoch: 12   Global Step: 151410   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:50:47,153-Speed 3307.71 samples/sec   Loss 2.7446   LearningRate 0.0152   Epoch: 12   Global Step: 151420   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:50:50,244-Speed 3313.16 samples/sec   Loss 2.7975   LearningRate 0.0152   Epoch: 12   Global Step: 151430   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:50:53,346-Speed 3302.61 samples/sec   Loss 2.8657   LearningRate 0.0152   Epoch: 12   Global Step: 151440   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:50:56,492-Speed 3256.05 samples/sec   Loss 2.8971   LearningRate 0.0152   Epoch: 12   Global Step: 151450   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:50:59,654-Speed 3239.28 samples/sec   Loss 2.7105   LearningRate 0.0152   Epoch: 12   Global Step: 151460   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:51:02,782-Speed 3274.89 samples/sec   Loss 2.7323   LearningRate 0.0152   Epoch: 12   Global Step: 151470   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:51:05,902-Speed 3286.75 samples/sec   Loss 2.7840   LearningRate 0.0152   Epoch: 12   Global Step: 151480   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:51:09,028-Speed 3276.69 samples/sec   Loss 2.8493   LearningRate 0.0152   Epoch: 12   Global Step: 151490   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:51:12,231-Speed 3198.41 samples/sec   Loss 2.7735   LearningRate 0.0152   Epoch: 12   Global Step: 151500   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:51:15,473-Speed 3159.62 samples/sec   Loss 2.8021   LearningRate 0.0152   Epoch: 12   Global Step: 151510   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:51:18,584-Speed 3292.02 samples/sec   Loss 2.7553   LearningRate 0.0152   Epoch: 12   Global Step: 151520   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:51:21,702-Speed 3285.25 samples/sec   Loss 2.7916   LearningRate 0.0152   Epoch: 12   Global Step: 151530   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:51:24,810-Speed 3296.34 samples/sec   Loss 2.8522   LearningRate 0.0152   Epoch: 12   Global Step: 151540   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:51:27,895-Speed 3319.72 samples/sec   Loss 2.8100   LearningRate 0.0152   Epoch: 12   Global Step: 151550   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:51:31,042-Speed 3255.79 samples/sec   Loss 2.8394   LearningRate 0.0152   Epoch: 12   Global Step: 151560   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:51:34,146-Speed 3299.21 samples/sec   Loss 2.8748   LearningRate 0.0152   Epoch: 12   Global Step: 151570   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:51:37,267-Speed 3282.97 samples/sec   Loss 2.8152   LearningRate 0.0152   Epoch: 12   Global Step: 151580   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:51:40,376-Speed 3294.04 samples/sec   Loss 2.8151   LearningRate 0.0152   Epoch: 12   Global Step: 151590   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:51:43,459-Speed 3322.95 samples/sec   Loss 2.8647   LearningRate 0.0152   Epoch: 12   Global Step: 151600   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:51:46,542-Speed 3322.01 samples/sec   Loss 2.8133   LearningRate 0.0152   Epoch: 12   Global Step: 151610   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:51:49,635-Speed 3312.43 samples/sec   Loss 2.7827   LearningRate 0.0152   Epoch: 12   Global Step: 151620   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:51:52,764-Speed 3273.96 samples/sec   Loss 2.7472   LearningRate 0.0152   Epoch: 12   Global Step: 151630   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:51:55,907-Speed 3258.29 samples/sec   Loss 2.8479   LearningRate 0.0152   Epoch: 12   Global Step: 151640   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:51:59,045-Speed 3264.73 samples/sec   Loss 2.8554   LearningRate 0.0152   Epoch: 12   Global Step: 151650   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:52:02,120-Speed 3331.76 samples/sec   Loss 2.8107   LearningRate 0.0152   Epoch: 12   Global Step: 151660   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:52:05,208-Speed 3317.39 samples/sec   Loss 2.8768   LearningRate 0.0152   Epoch: 12   Global Step: 151670   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:52:08,275-Speed 3339.45 samples/sec   Loss 2.8795   LearningRate 0.0152   Epoch: 12   Global Step: 151680   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:52:11,363-Speed 3317.51 samples/sec   Loss 2.8168   LearningRate 0.0152   Epoch: 12   Global Step: 151690   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:52:14,548-Speed 3215.65 samples/sec   Loss 2.8336   LearningRate 0.0152   Epoch: 12   Global Step: 151700   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:52:17,646-Speed 3306.94 samples/sec   Loss 2.7873   LearningRate 0.0152   Epoch: 12   Global Step: 151710   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:52:20,703-Speed 3350.16 samples/sec   Loss 2.7423   LearningRate 0.0152   Epoch: 12   Global Step: 151720   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:52:23,862-Speed 3242.58 samples/sec   Loss 2.8730   LearningRate 0.0151   Epoch: 12   Global Step: 151730   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:52:26,993-Speed 3271.69 samples/sec   Loss 2.8236   LearningRate 0.0151   Epoch: 12   Global Step: 151740   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:52:30,096-Speed 3301.17 samples/sec   Loss 2.8298   LearningRate 0.0151   Epoch: 12   Global Step: 151750   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:52:33,181-Speed 3320.53 samples/sec   Loss 2.7870   LearningRate 0.0151   Epoch: 12   Global Step: 151760   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:52:36,356-Speed 3226.35 samples/sec   Loss 2.8319   LearningRate 0.0151   Epoch: 12   Global Step: 151770   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:52:39,469-Speed 3290.03 samples/sec   Loss 2.8168   LearningRate 0.0151   Epoch: 12   Global Step: 151780   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:52:42,560-Speed 3313.55 samples/sec   Loss 2.7873   LearningRate 0.0151   Epoch: 12   Global Step: 151790   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:52:45,638-Speed 3328.49 samples/sec   Loss 2.7881   LearningRate 0.0151   Epoch: 12   Global Step: 151800   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:52:48,738-Speed 3303.89 samples/sec   Loss 2.8609   LearningRate 0.0151   Epoch: 12   Global Step: 151810   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:52:51,916-Speed 3223.55 samples/sec   Loss 2.8688   LearningRate 0.0151   Epoch: 12   Global Step: 151820   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:52:55,002-Speed 3319.49 samples/sec   Loss 2.7685   LearningRate 0.0151   Epoch: 12   Global Step: 151830   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:52:58,060-Speed 3349.50 samples/sec   Loss 2.8403   LearningRate 0.0151   Epoch: 12   Global Step: 151840   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:53:01,230-Speed 3230.73 samples/sec   Loss 2.8537   LearningRate 0.0151   Epoch: 12   Global Step: 151850   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:53:04,351-Speed 3282.13 samples/sec   Loss 2.7659   LearningRate 0.0151   Epoch: 12   Global Step: 151860   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:53:07,435-Speed 3322.46 samples/sec   Loss 2.8397   LearningRate 0.0151   Epoch: 12   Global Step: 151870   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:53:10,547-Speed 3291.06 samples/sec   Loss 2.8588   LearningRate 0.0151   Epoch: 12   Global Step: 151880   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:53:13,627-Speed 3326.02 samples/sec   Loss 2.8311   LearningRate 0.0151   Epoch: 12   Global Step: 151890   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:53:16,700-Speed 3332.64 samples/sec   Loss 2.8452   LearningRate 0.0151   Epoch: 12   Global Step: 151900   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:53:19,775-Speed 3331.57 samples/sec   Loss 2.8033   LearningRate 0.0151   Epoch: 12   Global Step: 151910   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:53:22,842-Speed 3339.49 samples/sec   Loss 2.8459   LearningRate 0.0151   Epoch: 12   Global Step: 151920   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:53:25,964-Speed 3280.89 samples/sec   Loss 2.8802   LearningRate 0.0151   Epoch: 12   Global Step: 151930   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:53:29,066-Speed 3302.86 samples/sec   Loss 2.8591   LearningRate 0.0151   Epoch: 12   Global Step: 151940   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:53:32,163-Speed 3307.45 samples/sec   Loss 2.8708   LearningRate 0.0151   Epoch: 12   Global Step: 151950   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:53:35,311-Speed 3253.61 samples/sec   Loss 2.8916   LearningRate 0.0151   Epoch: 12   Global Step: 151960   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:53:38,405-Speed 3310.36 samples/sec   Loss 2.8598   LearningRate 0.0151   Epoch: 12   Global Step: 151970   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:53:41,483-Speed 3327.65 samples/sec   Loss 2.8656   LearningRate 0.0151   Epoch: 12   Global Step: 151980   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:53:44,545-Speed 3344.74 samples/sec   Loss 2.8375   LearningRate 0.0151   Epoch: 12   Global Step: 151990   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:53:47,626-Speed 3325.20 samples/sec   Loss 2.8375   LearningRate 0.0151   Epoch: 12   Global Step: 152000   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:53:50,733-Speed 3297.40 samples/sec   Loss 2.8227   LearningRate 0.0151   Epoch: 12   Global Step: 152010   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:53:53,885-Speed 3248.67 samples/sec   Loss 2.7921   LearningRate 0.0151   Epoch: 12   Global Step: 152020   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:53:56,965-Speed 3325.76 samples/sec   Loss 2.8187   LearningRate 0.0151   Epoch: 12   Global Step: 152030   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:54:00,092-Speed 3276.26 samples/sec   Loss 2.8135   LearningRate 0.0151   Epoch: 12   Global Step: 152040   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:54:03,139-Speed 3361.30 samples/sec   Loss 2.8861   LearningRate 0.0150   Epoch: 12   Global Step: 152050   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:54:06,306-Speed 3234.78 samples/sec   Loss 2.7861   LearningRate 0.0150   Epoch: 12   Global Step: 152060   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:54:09,369-Speed 3344.50 samples/sec   Loss 2.8452   LearningRate 0.0150   Epoch: 12   Global Step: 152070   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:54:12,526-Speed 3244.24 samples/sec   Loss 2.8664   LearningRate 0.0150   Epoch: 12   Global Step: 152080   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:54:15,702-Speed 3224.41 samples/sec   Loss 2.8529   LearningRate 0.0150   Epoch: 12   Global Step: 152090   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:54:18,816-Speed 3289.74 samples/sec   Loss 2.8739   LearningRate 0.0150   Epoch: 12   Global Step: 152100   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:54:21,914-Speed 3306.10 samples/sec   Loss 2.9168   LearningRate 0.0150   Epoch: 12   Global Step: 152110   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:54:24,984-Speed 3336.28 samples/sec   Loss 2.8630   LearningRate 0.0150   Epoch: 12   Global Step: 152120   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:54:28,093-Speed 3295.48 samples/sec   Loss 2.8410   LearningRate 0.0150   Epoch: 12   Global Step: 152130   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:54:31,265-Speed 3229.19 samples/sec   Loss 2.8154   LearningRate 0.0150   Epoch: 12   Global Step: 152140   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:54:34,364-Speed 3304.75 samples/sec   Loss 2.8646   LearningRate 0.0150   Epoch: 12   Global Step: 152150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:54:37,514-Speed 3252.26 samples/sec   Loss 2.8917   LearningRate 0.0150   Epoch: 12   Global Step: 152160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:54:40,653-Speed 3262.94 samples/sec   Loss 2.8929   LearningRate 0.0150   Epoch: 12   Global Step: 152170   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:54:43,724-Speed 3336.05 samples/sec   Loss 2.8105   LearningRate 0.0150   Epoch: 12   Global Step: 152180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:54:46,831-Speed 3296.42 samples/sec   Loss 2.8633   LearningRate 0.0150   Epoch: 12   Global Step: 152190   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:54:49,897-Speed 3341.45 samples/sec   Loss 2.8343   LearningRate 0.0150   Epoch: 12   Global Step: 152200   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:54:53,008-Speed 3292.42 samples/sec   Loss 2.8224   LearningRate 0.0150   Epoch: 12   Global Step: 152210   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:54:56,083-Speed 3330.61 samples/sec   Loss 2.8445   LearningRate 0.0150   Epoch: 12   Global Step: 152220   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:54:59,168-Speed 3320.87 samples/sec   Loss 2.9045   LearningRate 0.0150   Epoch: 12   Global Step: 152230   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:55:02,285-Speed 3285.59 samples/sec   Loss 2.8623   LearningRate 0.0150   Epoch: 12   Global Step: 152240   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:55:05,374-Speed 3316.49 samples/sec   Loss 2.9289   LearningRate 0.0150   Epoch: 12   Global Step: 152250   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:55:08,491-Speed 3286.33 samples/sec   Loss 2.8978   LearningRate 0.0150   Epoch: 12   Global Step: 152260   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:55:11,575-Speed 3321.37 samples/sec   Loss 2.8032   LearningRate 0.0150   Epoch: 12   Global Step: 152270   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:55:14,675-Speed 3304.62 samples/sec   Loss 2.8512   LearningRate 0.0150   Epoch: 12   Global Step: 152280   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:55:17,845-Speed 3231.14 samples/sec   Loss 2.9156   LearningRate 0.0150   Epoch: 12   Global Step: 152290   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:55:20,964-Speed 3283.54 samples/sec   Loss 2.8940   LearningRate 0.0150   Epoch: 12   Global Step: 152300   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:55:24,111-Speed 3255.93 samples/sec   Loss 2.8199   LearningRate 0.0150   Epoch: 12   Global Step: 152310   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:55:27,225-Speed 3288.86 samples/sec   Loss 2.8588   LearningRate 0.0150   Epoch: 12   Global Step: 152320   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:55:30,300-Speed 3330.86 samples/sec   Loss 2.8145   LearningRate 0.0150   Epoch: 12   Global Step: 152330   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:55:33,394-Speed 3310.81 samples/sec   Loss 2.9065   LearningRate 0.0150   Epoch: 12   Global Step: 152340   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:55:36,480-Speed 3318.67 samples/sec   Loss 2.9017   LearningRate 0.0150   Epoch: 12   Global Step: 152350   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:55:39,589-Speed 3295.04 samples/sec   Loss 2.9460   LearningRate 0.0150   Epoch: 12   Global Step: 152360   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:55:42,749-Speed 3242.25 samples/sec   Loss 2.9261   LearningRate 0.0149   Epoch: 12   Global Step: 152370   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:55:45,872-Speed 3279.05 samples/sec   Loss 2.8994   LearningRate 0.0149   Epoch: 12   Global Step: 152380   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:55:49,062-Speed 3210.92 samples/sec   Loss 2.9056   LearningRate 0.0149   Epoch: 12   Global Step: 152390   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:55:52,202-Speed 3262.09 samples/sec   Loss 2.8751   LearningRate 0.0149   Epoch: 12   Global Step: 152400   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:55:55,287-Speed 3320.21 samples/sec   Loss 2.8892   LearningRate 0.0149   Epoch: 12   Global Step: 152410   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:55:58,362-Speed 3331.50 samples/sec   Loss 2.8440   LearningRate 0.0149   Epoch: 12   Global Step: 152420   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:56:01,508-Speed 3255.92 samples/sec   Loss 2.8760   LearningRate 0.0149   Epoch: 12   Global Step: 152430   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:56:04,695-Speed 3213.94 samples/sec   Loss 2.8338   LearningRate 0.0149   Epoch: 12   Global Step: 152440   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:56:08,375-Speed 2783.67 samples/sec   Loss 2.8552   LearningRate 0.0149   Epoch: 12   Global Step: 152450   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:56:11,483-Speed 3294.84 samples/sec   Loss 2.8585   LearningRate 0.0149   Epoch: 12   Global Step: 152460   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:56:14,654-Speed 3230.81 samples/sec   Loss 2.8308   LearningRate 0.0149   Epoch: 12   Global Step: 152470   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:56:17,715-Speed 3346.49 samples/sec   Loss 2.8852   LearningRate 0.0149   Epoch: 12   Global Step: 152480   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:56:20,836-Speed 3281.36 samples/sec   Loss 2.8870   LearningRate 0.0149   Epoch: 12   Global Step: 152490   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:56:23,964-Speed 3274.73 samples/sec   Loss 2.9026   LearningRate 0.0149   Epoch: 12   Global Step: 152500   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:56:27,163-Speed 3202.08 samples/sec   Loss 2.9009   LearningRate 0.0149   Epoch: 12   Global Step: 152510   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:56:30,265-Speed 3302.48 samples/sec   Loss 2.9002   LearningRate 0.0149   Epoch: 12   Global Step: 152520   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:56:33,334-Speed 3336.60 samples/sec   Loss 2.8216   LearningRate 0.0149   Epoch: 12   Global Step: 152530   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:56:36,484-Speed 3251.69 samples/sec   Loss 2.9389   LearningRate 0.0149   Epoch: 12   Global Step: 152540   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:56:39,588-Speed 3300.20 samples/sec   Loss 2.9181   LearningRate 0.0149   Epoch: 12   Global Step: 152550   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:56:42,742-Speed 3247.75 samples/sec   Loss 2.9402   LearningRate 0.0149   Epoch: 12   Global Step: 152560   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:56:45,850-Speed 3296.23 samples/sec   Loss 2.9247   LearningRate 0.0149   Epoch: 12   Global Step: 152570   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:56:48,966-Speed 3287.17 samples/sec   Loss 2.9352   LearningRate 0.0149   Epoch: 12   Global Step: 152580   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:56:52,135-Speed 3231.99 samples/sec   Loss 2.8381   LearningRate 0.0149   Epoch: 12   Global Step: 152590   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:56:55,241-Speed 3297.36 samples/sec   Loss 2.8303   LearningRate 0.0149   Epoch: 12   Global Step: 152600   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:56:58,384-Speed 3259.42 samples/sec   Loss 2.9340   LearningRate 0.0149   Epoch: 12   Global Step: 152610   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:57:01,500-Speed 3287.35 samples/sec   Loss 2.9464   LearningRate 0.0149   Epoch: 12   Global Step: 152620   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:57:04,676-Speed 3225.18 samples/sec   Loss 2.7991   LearningRate 0.0149   Epoch: 12   Global Step: 152630   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:57:07,777-Speed 3302.95 samples/sec   Loss 2.8849   LearningRate 0.0149   Epoch: 12   Global Step: 152640   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:57:10,874-Speed 3307.72 samples/sec   Loss 2.9127   LearningRate 0.0149   Epoch: 12   Global Step: 152650   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 14:57:14,001-Speed 3275.06 samples/sec   Loss 2.9224   LearningRate 0.0149   Epoch: 12   Global Step: 152660   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 14:57:17,143-Speed 3261.16 samples/sec   Loss 2.8330   LearningRate 0.0149   Epoch: 12   Global Step: 152670   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 14:57:20,219-Speed 3329.36 samples/sec   Loss 2.8876   LearningRate 0.0149   Epoch: 12   Global Step: 152680   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 14:57:23,300-Speed 3325.17 samples/sec   Loss 2.8786   LearningRate 0.0148   Epoch: 12   Global Step: 152690   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 14:57:26,454-Speed 3247.47 samples/sec   Loss 2.9300   LearningRate 0.0148   Epoch: 12   Global Step: 152700   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 14:57:29,633-Speed 3222.08 samples/sec   Loss 2.9057   LearningRate 0.0148   Epoch: 12   Global Step: 152710   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 14:57:32,781-Speed 3254.09 samples/sec   Loss 2.8804   LearningRate 0.0148   Epoch: 12   Global Step: 152720   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 14:57:35,955-Speed 3227.46 samples/sec   Loss 2.9216   LearningRate 0.0148   Epoch: 12   Global Step: 152730   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 14:57:39,071-Speed 3286.60 samples/sec   Loss 2.8949   LearningRate 0.0148   Epoch: 12   Global Step: 152740   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 14:57:42,830-Speed 2725.14 samples/sec   Loss 2.8981   LearningRate 0.0148   Epoch: 12   Global Step: 152750   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:57:45,921-Speed 3314.54 samples/sec   Loss 2.9200   LearningRate 0.0148   Epoch: 12   Global Step: 152760   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:57:49,152-Speed 3170.04 samples/sec   Loss 2.8974   LearningRate 0.0148   Epoch: 12   Global Step: 152770   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:57:53,583-Speed 2311.31 samples/sec   Loss 2.8456   LearningRate 0.0148   Epoch: 12   Global Step: 152780   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:57:56,661-Speed 3327.94 samples/sec   Loss 2.9114   LearningRate 0.0148   Epoch: 12   Global Step: 152790   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 14:57:59,766-Speed 3299.02 samples/sec   Loss 2.9025   LearningRate 0.0148   Epoch: 12   Global Step: 152800   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 14:58:02,903-Speed 3265.28 samples/sec   Loss 2.9040   LearningRate 0.0148   Epoch: 12   Global Step: 152810   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 14:58:06,020-Speed 3286.63 samples/sec   Loss 2.8727   LearningRate 0.0148   Epoch: 12   Global Step: 152820   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 14:58:09,132-Speed 3291.02 samples/sec   Loss 2.8479   LearningRate 0.0148   Epoch: 12   Global Step: 152830   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 14:58:12,293-Speed 3240.89 samples/sec   Loss 2.9218   LearningRate 0.0148   Epoch: 12   Global Step: 152840   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 14:58:15,437-Speed 3258.45 samples/sec   Loss 2.8570   LearningRate 0.0148   Epoch: 12   Global Step: 152850   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 14:58:18,528-Speed 3313.17 samples/sec   Loss 2.9177   LearningRate 0.0148   Epoch: 12   Global Step: 152860   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 14:58:21,634-Speed 3297.34 samples/sec   Loss 2.9247   LearningRate 0.0148   Epoch: 12   Global Step: 152870   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 14:58:24,722-Speed 3318.14 samples/sec   Loss 2.9624   LearningRate 0.0148   Epoch: 12   Global Step: 152880   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 14:58:27,845-Speed 3279.14 samples/sec   Loss 2.9230   LearningRate 0.0148   Epoch: 12   Global Step: 152890   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:58:31,065-Speed 3181.75 samples/sec   Loss 2.9267   LearningRate 0.0148   Epoch: 12   Global Step: 152900   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:58:34,140-Speed 3331.45 samples/sec   Loss 2.9326   LearningRate 0.0148   Epoch: 12   Global Step: 152910   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:58:37,361-Speed 3180.15 samples/sec   Loss 2.8416   LearningRate 0.0148   Epoch: 12   Global Step: 152920   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:58:40,480-Speed 3283.34 samples/sec   Loss 2.8491   LearningRate 0.0148   Epoch: 12   Global Step: 152930   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:58:43,649-Speed 3232.21 samples/sec   Loss 2.9422   LearningRate 0.0148   Epoch: 12   Global Step: 152940   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:58:46,741-Speed 3313.08 samples/sec   Loss 2.9659   LearningRate 0.0148   Epoch: 12   Global Step: 152950   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:58:49,843-Speed 3302.55 samples/sec   Loss 2.8722   LearningRate 0.0148   Epoch: 12   Global Step: 152960   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:58:52,956-Speed 3290.60 samples/sec   Loss 2.9684   LearningRate 0.0148   Epoch: 12   Global Step: 152970   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:58:56,063-Speed 3296.92 samples/sec   Loss 2.9615   LearningRate 0.0148   Epoch: 12   Global Step: 152980   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:58:59,192-Speed 3273.81 samples/sec   Loss 2.9545   LearningRate 0.0148   Epoch: 12   Global Step: 152990   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:59:02,330-Speed 3263.86 samples/sec   Loss 2.9204   LearningRate 0.0148   Epoch: 12   Global Step: 153000   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:59:05,463-Speed 3269.43 samples/sec   Loss 2.9271   LearningRate 0.0148   Epoch: 12   Global Step: 153010   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:59:08,537-Speed 3331.73 samples/sec   Loss 2.9418   LearningRate 0.0147   Epoch: 12   Global Step: 153020   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:59:11,626-Speed 3316.19 samples/sec   Loss 2.9584   LearningRate 0.0147   Epoch: 12   Global Step: 153030   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:59:14,776-Speed 3251.90 samples/sec   Loss 2.8915   LearningRate 0.0147   Epoch: 12   Global Step: 153040   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:59:17,878-Speed 3302.47 samples/sec   Loss 2.9275   LearningRate 0.0147   Epoch: 12   Global Step: 153050   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:59:20,965-Speed 3318.37 samples/sec   Loss 2.9173   LearningRate 0.0147   Epoch: 12   Global Step: 153060   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:59:24,049-Speed 3321.40 samples/sec   Loss 2.9011   LearningRate 0.0147   Epoch: 12   Global Step: 153070   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:59:27,195-Speed 3255.74 samples/sec   Loss 2.9432   LearningRate 0.0147   Epoch: 12   Global Step: 153080   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:59:30,326-Speed 3270.96 samples/sec   Loss 2.8278   LearningRate 0.0147   Epoch: 12   Global Step: 153090   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:59:33,403-Speed 3329.34 samples/sec   Loss 2.8420   LearningRate 0.0147   Epoch: 12   Global Step: 153100   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:59:36,566-Speed 3238.96 samples/sec   Loss 2.9706   LearningRate 0.0147   Epoch: 12   Global Step: 153110   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:59:39,737-Speed 3229.54 samples/sec   Loss 2.9533   LearningRate 0.0147   Epoch: 12   Global Step: 153120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:59:42,837-Speed 3304.35 samples/sec   Loss 2.9831   LearningRate 0.0147   Epoch: 12   Global Step: 153130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:59:45,938-Speed 3304.21 samples/sec   Loss 2.9093   LearningRate 0.0147   Epoch: 12   Global Step: 153140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:59:49,138-Speed 3200.28 samples/sec   Loss 2.9607   LearningRate 0.0147   Epoch: 12   Global Step: 153150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:59:52,259-Speed 3282.59 samples/sec   Loss 2.9433   LearningRate 0.0147   Epoch: 12   Global Step: 153160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 14:59:55,326-Speed 3339.33 samples/sec   Loss 2.9568   LearningRate 0.0147   Epoch: 12   Global Step: 153170   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 14:59:58,422-Speed 3308.78 samples/sec   Loss 2.9759   LearningRate 0.0147   Epoch: 12   Global Step: 153180   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:00:01,589-Speed 3234.94 samples/sec   Loss 2.9355   LearningRate 0.0147   Epoch: 12   Global Step: 153190   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:00:04,821-Speed 3169.76 samples/sec   Loss 2.9075   LearningRate 0.0147   Epoch: 12   Global Step: 153200   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:00:07,927-Speed 3298.02 samples/sec   Loss 2.9691   LearningRate 0.0147   Epoch: 12   Global Step: 153210   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:00:10,987-Speed 3347.40 samples/sec   Loss 2.8259   LearningRate 0.0147   Epoch: 12   Global Step: 153220   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:00:14,163-Speed 3225.51 samples/sec   Loss 2.9224   LearningRate 0.0147   Epoch: 12   Global Step: 153230   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:00:17,316-Speed 3248.57 samples/sec   Loss 2.8922   LearningRate 0.0147   Epoch: 12   Global Step: 153240   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:00:20,470-Speed 3247.42 samples/sec   Loss 2.9800   LearningRate 0.0147   Epoch: 12   Global Step: 153250   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:00:23,657-Speed 3214.20 samples/sec   Loss 2.9169   LearningRate 0.0147   Epoch: 12   Global Step: 153260   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:00:26,780-Speed 3280.97 samples/sec   Loss 2.9580   LearningRate 0.0147   Epoch: 12   Global Step: 153270   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:00:29,920-Speed 3261.73 samples/sec   Loss 2.9266   LearningRate 0.0147   Epoch: 12   Global Step: 153280   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:00:33,034-Speed 3289.31 samples/sec   Loss 2.9285   LearningRate 0.0147   Epoch: 12   Global Step: 153290   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:00:36,199-Speed 3235.61 samples/sec   Loss 2.9152   LearningRate 0.0147   Epoch: 12   Global Step: 153300   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:00:39,327-Speed 3275.42 samples/sec   Loss 2.8791   LearningRate 0.0147   Epoch: 12   Global Step: 153310   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:00:42,424-Speed 3307.10 samples/sec   Loss 2.9532   LearningRate 0.0147   Epoch: 12   Global Step: 153320   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:00:45,492-Speed 3338.96 samples/sec   Loss 2.8396   LearningRate 0.0147   Epoch: 12   Global Step: 153330   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:00:48,641-Speed 3252.75 samples/sec   Loss 2.9231   LearningRate 0.0146   Epoch: 12   Global Step: 153340   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:00:51,764-Speed 3280.01 samples/sec   Loss 2.9754   LearningRate 0.0146   Epoch: 12   Global Step: 153350   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:00:54,890-Speed 3276.67 samples/sec   Loss 2.9531   LearningRate 0.0146   Epoch: 12   Global Step: 153360   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:00:57,943-Speed 3355.74 samples/sec   Loss 2.9948   LearningRate 0.0146   Epoch: 12   Global Step: 153370   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:01:01,095-Speed 3248.79 samples/sec   Loss 2.8603   LearningRate 0.0146   Epoch: 12   Global Step: 153380   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:01:04,302-Speed 3194.03 samples/sec   Loss 2.9355   LearningRate 0.0146   Epoch: 12   Global Step: 153390   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:01:07,436-Speed 3268.26 samples/sec   Loss 3.0195   LearningRate 0.0146   Epoch: 12   Global Step: 153400   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:01:10,549-Speed 3290.73 samples/sec   Loss 2.9797   LearningRate 0.0146   Epoch: 12   Global Step: 153410   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:01:13,685-Speed 3266.79 samples/sec   Loss 2.8722   LearningRate 0.0146   Epoch: 12   Global Step: 153420   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:01:16,783-Speed 3305.66 samples/sec   Loss 2.9318   LearningRate 0.0146   Epoch: 12   Global Step: 153430   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:01:19,926-Speed 3259.80 samples/sec   Loss 2.9932   LearningRate 0.0146   Epoch: 12   Global Step: 153440   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:01:23,033-Speed 3296.87 samples/sec   Loss 2.9609   LearningRate 0.0146   Epoch: 12   Global Step: 153450   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:01:26,142-Speed 3294.67 samples/sec   Loss 2.9183   LearningRate 0.0146   Epoch: 12   Global Step: 153460   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:01:29,270-Speed 3273.89 samples/sec   Loss 2.8915   LearningRate 0.0146   Epoch: 12   Global Step: 153470   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:01:32,411-Speed 3262.07 samples/sec   Loss 3.0308   LearningRate 0.0146   Epoch: 12   Global Step: 153480   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:01:35,612-Speed 3199.16 samples/sec   Loss 2.9090   LearningRate 0.0146   Epoch: 12   Global Step: 153490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 15:01:38,733-Speed 3282.14 samples/sec   Loss 2.9843   LearningRate 0.0146   Epoch: 12   Global Step: 153500   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:01:41,855-Speed 3281.51 samples/sec   Loss 2.9186   LearningRate 0.0146   Epoch: 12   Global Step: 153510   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:01:44,997-Speed 3260.00 samples/sec   Loss 3.0105   LearningRate 0.0146   Epoch: 12   Global Step: 153520   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:01:48,072-Speed 3330.84 samples/sec   Loss 2.9865   LearningRate 0.0146   Epoch: 12   Global Step: 153530   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:01:51,210-Speed 3263.82 samples/sec   Loss 2.9201   LearningRate 0.0146   Epoch: 12   Global Step: 153540   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:01:54,319-Speed 3295.17 samples/sec   Loss 2.9301   LearningRate 0.0146   Epoch: 12   Global Step: 153550   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:01:57,375-Speed 3351.81 samples/sec   Loss 2.8869   LearningRate 0.0146   Epoch: 12   Global Step: 153560   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:02:00,469-Speed 3311.62 samples/sec   Loss 2.9805   LearningRate 0.0146   Epoch: 12   Global Step: 153570   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:02:03,528-Speed 3348.20 samples/sec   Loss 2.8514   LearningRate 0.0146   Epoch: 12   Global Step: 153580   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:02:06,637-Speed 3294.37 samples/sec   Loss 2.9823   LearningRate 0.0146   Epoch: 12   Global Step: 153590   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:02:09,716-Speed 3326.95 samples/sec   Loss 2.9294   LearningRate 0.0146   Epoch: 12   Global Step: 153600   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:02:12,846-Speed 3272.43 samples/sec   Loss 2.9485   LearningRate 0.0146   Epoch: 12   Global Step: 153610   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:02:15,925-Speed 3326.78 samples/sec   Loss 2.8968   LearningRate 0.0146   Epoch: 12   Global Step: 153620   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:02:19,056-Speed 3272.13 samples/sec   Loss 2.9100   LearningRate 0.0146   Epoch: 12   Global Step: 153630   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:02:22,130-Speed 3331.69 samples/sec   Loss 2.9403   LearningRate 0.0146   Epoch: 12   Global Step: 153640   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:02:25,292-Speed 3239.25 samples/sec   Loss 2.9585   LearningRate 0.0146   Epoch: 12   Global Step: 153650   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:02:28,375-Speed 3323.16 samples/sec   Loss 2.9125   LearningRate 0.0146   Epoch: 12   Global Step: 153660   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:02:31,471-Speed 3307.65 samples/sec   Loss 3.0047   LearningRate 0.0145   Epoch: 12   Global Step: 153670   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:02:34,550-Speed 3326.94 samples/sec   Loss 2.9273   LearningRate 0.0145   Epoch: 12   Global Step: 153680   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:02:37,701-Speed 3250.90 samples/sec   Loss 2.9971   LearningRate 0.0145   Epoch: 12   Global Step: 153690   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:02:40,889-Speed 3212.89 samples/sec   Loss 2.9366   LearningRate 0.0145   Epoch: 12   Global Step: 153700   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:02:43,968-Speed 3326.73 samples/sec   Loss 2.9213   LearningRate 0.0145   Epoch: 12   Global Step: 153710   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:02:47,066-Speed 3306.55 samples/sec   Loss 2.9635   LearningRate 0.0145   Epoch: 12   Global Step: 153720   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:02:50,176-Speed 3294.11 samples/sec   Loss 3.0042   LearningRate 0.0145   Epoch: 12   Global Step: 153730   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:02:53,331-Speed 3246.42 samples/sec   Loss 2.9675   LearningRate 0.0145   Epoch: 12   Global Step: 153740   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:02:56,457-Speed 3276.77 samples/sec   Loss 2.9444   LearningRate 0.0145   Epoch: 12   Global Step: 153750   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:02:59,534-Speed 3329.44 samples/sec   Loss 2.9039   LearningRate 0.0145   Epoch: 12   Global Step: 153760   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:03:02,639-Speed 3298.28 samples/sec   Loss 2.9209   LearningRate 0.0145   Epoch: 12   Global Step: 153770   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:03:05,705-Speed 3340.82 samples/sec   Loss 2.8491   LearningRate 0.0145   Epoch: 12   Global Step: 153780   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:03:08,808-Speed 3301.88 samples/sec   Loss 2.9485   LearningRate 0.0145   Epoch: 12   Global Step: 153790   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:03:11,932-Speed 3278.30 samples/sec   Loss 3.0031   LearningRate 0.0145   Epoch: 12   Global Step: 153800   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:03:15,075-Speed 3259.46 samples/sec   Loss 2.9722   LearningRate 0.0145   Epoch: 12   Global Step: 153810   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:03:18,130-Speed 3353.01 samples/sec   Loss 2.9764   LearningRate 0.0145   Epoch: 12   Global Step: 153820   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:03:21,177-Speed 3361.30 samples/sec   Loss 2.9410   LearningRate 0.0145   Epoch: 12   Global Step: 153830   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:03:24,242-Speed 3342.14 samples/sec   Loss 2.9326   LearningRate 0.0145   Epoch: 12   Global Step: 153840   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:03:27,348-Speed 3297.84 samples/sec   Loss 3.0026   LearningRate 0.0145   Epoch: 12   Global Step: 153850   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:03:30,496-Speed 3254.15 samples/sec   Loss 3.0166   LearningRate 0.0145   Epoch: 12   Global Step: 153860   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:03:33,663-Speed 3234.59 samples/sec   Loss 2.9555   LearningRate 0.0145   Epoch: 12   Global Step: 153870   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:03:36,832-Speed 3232.58 samples/sec   Loss 2.9790   LearningRate 0.0145   Epoch: 12   Global Step: 153880   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:03:39,948-Speed 3287.14 samples/sec   Loss 2.8587   LearningRate 0.0145   Epoch: 12   Global Step: 153890   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:03:43,084-Speed 3265.62 samples/sec   Loss 2.9901   LearningRate 0.0145   Epoch: 12   Global Step: 153900   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:03:46,157-Speed 3333.69 samples/sec   Loss 2.9952   LearningRate 0.0145   Epoch: 12   Global Step: 153910   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:03:49,256-Speed 3305.74 samples/sec   Loss 3.0375   LearningRate 0.0145   Epoch: 12   Global Step: 153920   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:03:52,385-Speed 3273.42 samples/sec   Loss 2.9296   LearningRate 0.0145   Epoch: 12   Global Step: 153930   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:03:55,562-Speed 3223.92 samples/sec   Loss 2.9149   LearningRate 0.0145   Epoch: 12   Global Step: 153940   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:03:58,652-Speed 3315.62 samples/sec   Loss 2.9767   LearningRate 0.0145   Epoch: 12   Global Step: 153950   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:04:01,823-Speed 3229.43 samples/sec   Loss 2.9495   LearningRate 0.0145   Epoch: 12   Global Step: 153960   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:04:04,911-Speed 3317.23 samples/sec   Loss 2.9632   LearningRate 0.0145   Epoch: 12   Global Step: 153970   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:04:07,996-Speed 3320.58 samples/sec   Loss 2.9969   LearningRate 0.0145   Epoch: 12   Global Step: 153980   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:04:11,145-Speed 3253.27 samples/sec   Loss 2.9558   LearningRate 0.0144   Epoch: 12   Global Step: 153990   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:04:14,220-Speed 3330.43 samples/sec   Loss 2.9930   LearningRate 0.0144   Epoch: 12   Global Step: 154000   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:04:17,390-Speed 3231.40 samples/sec   Loss 2.9950   LearningRate 0.0144   Epoch: 12   Global Step: 154010   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:04:20,463-Speed 3333.76 samples/sec   Loss 3.0445   LearningRate 0.0144   Epoch: 12   Global Step: 154020   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:04:23,596-Speed 3269.01 samples/sec   Loss 2.9589   LearningRate 0.0144   Epoch: 12   Global Step: 154030   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:04:26,719-Speed 3280.39 samples/sec   Loss 2.9654   LearningRate 0.0144   Epoch: 12   Global Step: 154040   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:04:29,850-Speed 3271.54 samples/sec   Loss 2.9649   LearningRate 0.0144   Epoch: 12   Global Step: 154050   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:04:32,928-Speed 3327.90 samples/sec   Loss 3.0015   LearningRate 0.0144   Epoch: 12   Global Step: 154060   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:04:36,010-Speed 3323.97 samples/sec   Loss 3.0082   LearningRate 0.0144   Epoch: 12   Global Step: 154070   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:04:39,124-Speed 3288.90 samples/sec   Loss 2.9804   LearningRate 0.0144   Epoch: 12   Global Step: 154080   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:04:42,291-Speed 3234.10 samples/sec   Loss 2.9858   LearningRate 0.0144   Epoch: 12   Global Step: 154090   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:04:45,393-Speed 3302.83 samples/sec   Loss 2.9917   LearningRate 0.0144   Epoch: 12   Global Step: 154100   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:04:48,464-Speed 3334.79 samples/sec   Loss 2.9992   LearningRate 0.0144   Epoch: 12   Global Step: 154110   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:04:51,536-Speed 3335.22 samples/sec   Loss 2.9562   LearningRate 0.0144   Epoch: 12   Global Step: 154120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:04:54,695-Speed 3242.40 samples/sec   Loss 3.0040   LearningRate 0.0144   Epoch: 12   Global Step: 154130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:04:57,772-Speed 3328.52 samples/sec   Loss 2.8458   LearningRate 0.0144   Epoch: 12   Global Step: 154140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:05:00,904-Speed 3270.52 samples/sec   Loss 2.9450   LearningRate 0.0144   Epoch: 12   Global Step: 154150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:05:04,011-Speed 3296.59 samples/sec   Loss 2.9397   LearningRate 0.0144   Epoch: 12   Global Step: 154160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:05:07,167-Speed 3246.42 samples/sec   Loss 2.9556   LearningRate 0.0144   Epoch: 12   Global Step: 154170   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 15:05:10,231-Speed 3343.06 samples/sec   Loss 3.0105   LearningRate 0.0144   Epoch: 12   Global Step: 154180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:05:13,317-Speed 3318.82 samples/sec   Loss 2.9266   LearningRate 0.0144   Epoch: 12   Global Step: 154190   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:05:16,479-Speed 3239.74 samples/sec   Loss 2.9310   LearningRate 0.0144   Epoch: 12   Global Step: 154200   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:05:19,551-Speed 3334.68 samples/sec   Loss 2.9808   LearningRate 0.0144   Epoch: 12   Global Step: 154210   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:05:22,626-Speed 3330.69 samples/sec   Loss 2.9912   LearningRate 0.0144   Epoch: 12   Global Step: 154220   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:05:25,758-Speed 3271.04 samples/sec   Loss 2.9737   LearningRate 0.0144   Epoch: 12   Global Step: 154230   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:05:28,890-Speed 3269.62 samples/sec   Loss 3.0587   LearningRate 0.0144   Epoch: 12   Global Step: 154240   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:05:31,998-Speed 3296.07 samples/sec   Loss 2.9346   LearningRate 0.0144   Epoch: 12   Global Step: 154250   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:05:35,070-Speed 3333.97 samples/sec   Loss 2.9805   LearningRate 0.0144   Epoch: 12   Global Step: 154260   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:05:38,197-Speed 3276.79 samples/sec   Loss 2.9464   LearningRate 0.0144   Epoch: 12   Global Step: 154270   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:05:41,307-Speed 3293.70 samples/sec   Loss 2.9616   LearningRate 0.0144   Epoch: 12   Global Step: 154280   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:05:44,420-Speed 3289.99 samples/sec   Loss 2.9312   LearningRate 0.0144   Epoch: 12   Global Step: 154290   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:05:47,495-Speed 3331.81 samples/sec   Loss 3.0491   LearningRate 0.0144   Epoch: 12   Global Step: 154300   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:05:50,627-Speed 3269.56 samples/sec   Loss 2.9908   LearningRate 0.0144   Epoch: 12   Global Step: 154310   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:05:53,704-Speed 3330.11 samples/sec   Loss 2.9494   LearningRate 0.0143   Epoch: 12   Global Step: 154320   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:05:56,766-Speed 3344.90 samples/sec   Loss 3.0089   LearningRate 0.0143   Epoch: 12   Global Step: 154330   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:05:59,922-Speed 3244.91 samples/sec   Loss 2.9803   LearningRate 0.0143   Epoch: 12   Global Step: 154340   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:06:03,140-Speed 3183.60 samples/sec   Loss 3.0495   LearningRate 0.0143   Epoch: 12   Global Step: 154350   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:06:06,315-Speed 3225.93 samples/sec   Loss 3.0155   LearningRate 0.0143   Epoch: 12   Global Step: 154360   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:06:09,387-Speed 3334.60 samples/sec   Loss 2.8796   LearningRate 0.0143   Epoch: 12   Global Step: 154370   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:06:12,499-Speed 3291.48 samples/sec   Loss 3.0509   LearningRate 0.0143   Epoch: 12   Global Step: 154380   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:06:15,565-Speed 3341.52 samples/sec   Loss 2.9178   LearningRate 0.0143   Epoch: 12   Global Step: 154390   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:06:18,614-Speed 3359.20 samples/sec   Loss 2.9908   LearningRate 0.0143   Epoch: 12   Global Step: 154400   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:06:21,681-Speed 3339.05 samples/sec   Loss 2.9607   LearningRate 0.0143   Epoch: 12   Global Step: 154410   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:06:24,752-Speed 3335.76 samples/sec   Loss 3.0076   LearningRate 0.0143   Epoch: 12   Global Step: 154420   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:06:27,825-Speed 3333.16 samples/sec   Loss 2.9919   LearningRate 0.0143   Epoch: 12   Global Step: 154430   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:06:31,002-Speed 3224.49 samples/sec   Loss 2.9217   LearningRate 0.0143   Epoch: 12   Global Step: 154440   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:06:34,090-Speed 3316.87 samples/sec   Loss 2.9793   LearningRate 0.0143   Epoch: 12   Global Step: 154450   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:06:37,166-Speed 3330.58 samples/sec   Loss 2.9411   LearningRate 0.0143   Epoch: 12   Global Step: 154460   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:06:40,287-Speed 3281.78 samples/sec   Loss 2.9963   LearningRate 0.0143   Epoch: 12   Global Step: 154470   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:06:43,462-Speed 3225.97 samples/sec   Loss 2.9615   LearningRate 0.0143   Epoch: 12   Global Step: 154480   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:06:46,569-Speed 3296.82 samples/sec   Loss 2.9085   LearningRate 0.0143   Epoch: 12   Global Step: 154490   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:06:49,742-Speed 3228.29 samples/sec   Loss 2.9251   LearningRate 0.0143   Epoch: 12   Global Step: 154500   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:06:52,866-Speed 3279.11 samples/sec   Loss 2.9950   LearningRate 0.0143   Epoch: 12   Global Step: 154510   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:06:55,947-Speed 3325.04 samples/sec   Loss 3.0188   LearningRate 0.0143   Epoch: 12   Global Step: 154520   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:06:59,021-Speed 3332.05 samples/sec   Loss 2.9529   LearningRate 0.0143   Epoch: 12   Global Step: 154530   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:07:02,099-Speed 3328.15 samples/sec   Loss 2.9359   LearningRate 0.0143   Epoch: 12   Global Step: 154540   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:07:05,235-Speed 3266.28 samples/sec   Loss 2.9624   LearningRate 0.0143   Epoch: 12   Global Step: 154550   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:07:08,357-Speed 3281.04 samples/sec   Loss 3.0264   LearningRate 0.0143   Epoch: 12   Global Step: 154560   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:07:11,533-Speed 3225.39 samples/sec   Loss 2.9542   LearningRate 0.0143   Epoch: 12   Global Step: 154570   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:07:14,695-Speed 3239.53 samples/sec   Loss 3.0073   LearningRate 0.0143   Epoch: 12   Global Step: 154580   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:07:17,786-Speed 3313.16 samples/sec   Loss 2.9730   LearningRate 0.0143   Epoch: 12   Global Step: 154590   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:07:20,836-Speed 3359.32 samples/sec   Loss 3.0456   LearningRate 0.0143   Epoch: 12   Global Step: 154600   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:07:23,962-Speed 3276.42 samples/sec   Loss 2.9225   LearningRate 0.0143   Epoch: 12   Global Step: 154610   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:07:27,130-Speed 3232.88 samples/sec   Loss 3.0799   LearningRate 0.0143   Epoch: 12   Global Step: 154620   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:07:30,278-Speed 3254.01 samples/sec   Loss 2.9830   LearningRate 0.0143   Epoch: 12   Global Step: 154630   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:07:33,392-Speed 3289.23 samples/sec   Loss 2.9991   LearningRate 0.0143   Epoch: 12   Global Step: 154640   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:07:36,510-Speed 3285.56 samples/sec   Loss 2.9239   LearningRate 0.0142   Epoch: 12   Global Step: 154650   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:07:39,592-Speed 3324.15 samples/sec   Loss 2.9790   LearningRate 0.0142   Epoch: 12   Global Step: 154660   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:07:42,656-Speed 3342.55 samples/sec   Loss 2.9773   LearningRate 0.0142   Epoch: 12   Global Step: 154670   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:07:45,719-Speed 3344.32 samples/sec   Loss 2.9500   LearningRate 0.0142   Epoch: 12   Global Step: 154680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 15:07:48,848-Speed 3274.31 samples/sec   Loss 3.0198   LearningRate 0.0142   Epoch: 12   Global Step: 154690   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:07:52,028-Speed 3221.52 samples/sec   Loss 3.0230   LearningRate 0.0142   Epoch: 12   Global Step: 154700   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:07:55,142-Speed 3290.38 samples/sec   Loss 2.9745   LearningRate 0.0142   Epoch: 12   Global Step: 154710   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:07:58,243-Speed 3302.77 samples/sec   Loss 2.9453   LearningRate 0.0142   Epoch: 12   Global Step: 154720   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:08:01,327-Speed 3321.50 samples/sec   Loss 2.9840   LearningRate 0.0142   Epoch: 12   Global Step: 154730   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:08:04,430-Speed 3300.85 samples/sec   Loss 2.9743   LearningRate 0.0142   Epoch: 12   Global Step: 154740   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:08:07,567-Speed 3265.90 samples/sec   Loss 2.9667   LearningRate 0.0142   Epoch: 12   Global Step: 154750   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:08:10,671-Speed 3299.42 samples/sec   Loss 3.1305   LearningRate 0.0142   Epoch: 12   Global Step: 154760   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:08:13,790-Speed 3284.26 samples/sec   Loss 2.9557   LearningRate 0.0142   Epoch: 12   Global Step: 154770   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:08:16,931-Speed 3261.72 samples/sec   Loss 3.0122   LearningRate 0.0142   Epoch: 12   Global Step: 154780   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:08:19,957-Speed 3384.51 samples/sec   Loss 2.9701   LearningRate 0.0142   Epoch: 12   Global Step: 154790   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:08:23,043-Speed 3319.39 samples/sec   Loss 2.9412   LearningRate 0.0142   Epoch: 12   Global Step: 154800   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:08:26,108-Speed 3342.01 samples/sec   Loss 2.9527   LearningRate 0.0142   Epoch: 12   Global Step: 154810   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:08:29,254-Speed 3255.53 samples/sec   Loss 2.9865   LearningRate 0.0142   Epoch: 12   Global Step: 154820   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:08:32,327-Speed 3333.18 samples/sec   Loss 2.9437   LearningRate 0.0142   Epoch: 12   Global Step: 154830   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:08:35,439-Speed 3291.66 samples/sec   Loss 2.9572   LearningRate 0.0142   Epoch: 12   Global Step: 154840   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:08:38,576-Speed 3265.83 samples/sec   Loss 3.0052   LearningRate 0.0142   Epoch: 12   Global Step: 154850   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:08:41,775-Speed 3201.89 samples/sec   Loss 2.9817   LearningRate 0.0142   Epoch: 12   Global Step: 154860   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:08:44,856-Speed 3324.10 samples/sec   Loss 3.0127   LearningRate 0.0142   Epoch: 12   Global Step: 154870   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:08:47,952-Speed 3309.04 samples/sec   Loss 2.9882   LearningRate 0.0142   Epoch: 12   Global Step: 154880   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:08:51,138-Speed 3214.49 samples/sec   Loss 2.9687   LearningRate 0.0142   Epoch: 12   Global Step: 154890   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:08:54,337-Speed 3202.02 samples/sec   Loss 3.0027   LearningRate 0.0142   Epoch: 12   Global Step: 154900   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:08:57,441-Speed 3300.11 samples/sec   Loss 2.9884   LearningRate 0.0142   Epoch: 12   Global Step: 154910   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:09:00,554-Speed 3290.62 samples/sec   Loss 3.0310   LearningRate 0.0142   Epoch: 12   Global Step: 154920   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:09:03,675-Speed 3281.45 samples/sec   Loss 2.9620   LearningRate 0.0142   Epoch: 12   Global Step: 154930   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:09:06,852-Speed 3224.74 samples/sec   Loss 2.9438   LearningRate 0.0142   Epoch: 12   Global Step: 154940   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:09:09,949-Speed 3306.63 samples/sec   Loss 3.0027   LearningRate 0.0142   Epoch: 12   Global Step: 154950   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:09:13,109-Speed 3242.09 samples/sec   Loss 3.1157   LearningRate 0.0142   Epoch: 12   Global Step: 154960   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:09:16,235-Speed 3276.59 samples/sec   Loss 3.0397   LearningRate 0.0142   Epoch: 12   Global Step: 154970   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:09:19,344-Speed 3294.58 samples/sec   Loss 2.9535   LearningRate 0.0141   Epoch: 12   Global Step: 154980   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:09:22,431-Speed 3318.28 samples/sec   Loss 3.0820   LearningRate 0.0141   Epoch: 12   Global Step: 154990   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:09:25,615-Speed 3217.29 samples/sec   Loss 3.0154   LearningRate 0.0141   Epoch: 12   Global Step: 155000   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:09:28,753-Speed 3264.09 samples/sec   Loss 2.9807   LearningRate 0.0141   Epoch: 12   Global Step: 155010   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:09:31,849-Speed 3309.36 samples/sec   Loss 3.0629   LearningRate 0.0141   Epoch: 12   Global Step: 155020   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:09:34,977-Speed 3273.90 samples/sec   Loss 2.9895   LearningRate 0.0141   Epoch: 12   Global Step: 155030   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:09:38,119-Speed 3260.28 samples/sec   Loss 2.9804   LearningRate 0.0141   Epoch: 12   Global Step: 155040   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:09:41,248-Speed 3273.38 samples/sec   Loss 2.9143   LearningRate 0.0141   Epoch: 12   Global Step: 155050   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:09:44,354-Speed 3298.22 samples/sec   Loss 2.9577   LearningRate 0.0141   Epoch: 12   Global Step: 155060   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:09:47,482-Speed 3274.52 samples/sec   Loss 3.0744   LearningRate 0.0141   Epoch: 12   Global Step: 155070   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:09:50,619-Speed 3265.51 samples/sec   Loss 2.9752   LearningRate 0.0141   Epoch: 12   Global Step: 155080   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:09:53,725-Speed 3297.77 samples/sec   Loss 3.0156   LearningRate 0.0141   Epoch: 12   Global Step: 155090   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 15:09:56,838-Speed 3290.23 samples/sec   Loss 2.9799   LearningRate 0.0141   Epoch: 12   Global Step: 155100   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:09:59,971-Speed 3270.04 samples/sec   Loss 3.0071   LearningRate 0.0141   Epoch: 12   Global Step: 155110   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:10:03,134-Speed 3238.35 samples/sec   Loss 2.9687   LearningRate 0.0141   Epoch: 12   Global Step: 155120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:10:06,245-Speed 3291.83 samples/sec   Loss 3.0389   LearningRate 0.0141   Epoch: 12   Global Step: 155130   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:10:09,316-Speed 3335.48 samples/sec   Loss 3.1038   LearningRate 0.0141   Epoch: 12   Global Step: 155140   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:10:12,427-Speed 3292.52 samples/sec   Loss 3.0266   LearningRate 0.0141   Epoch: 12   Global Step: 155150   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:10:15,638-Speed 3190.46 samples/sec   Loss 2.9384   LearningRate 0.0141   Epoch: 12   Global Step: 155160   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:10:18,775-Speed 3264.82 samples/sec   Loss 3.0572   LearningRate 0.0141   Epoch: 12   Global Step: 155170   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:10:21,842-Speed 3340.21 samples/sec   Loss 3.0619   LearningRate 0.0141   Epoch: 12   Global Step: 155180   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:10:24,990-Speed 3253.76 samples/sec   Loss 2.9355   LearningRate 0.0141   Epoch: 12   Global Step: 155190   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:10:28,164-Speed 3227.27 samples/sec   Loss 3.0206   LearningRate 0.0141   Epoch: 12   Global Step: 155200   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:10:31,265-Speed 3303.03 samples/sec   Loss 2.9834   LearningRate 0.0141   Epoch: 12   Global Step: 155210   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:10:34,375-Speed 3293.61 samples/sec   Loss 3.0570   LearningRate 0.0141   Epoch: 12   Global Step: 155220   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:10:37,563-Speed 3213.65 samples/sec   Loss 2.9449   LearningRate 0.0141   Epoch: 12   Global Step: 155230   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:10:40,677-Speed 3289.86 samples/sec   Loss 3.0177   LearningRate 0.0141   Epoch: 12   Global Step: 155240   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:10:43,770-Speed 3311.05 samples/sec   Loss 3.0328   LearningRate 0.0141   Epoch: 12   Global Step: 155250   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:10:46,889-Speed 3284.68 samples/sec   Loss 2.9087   LearningRate 0.0141   Epoch: 12   Global Step: 155260   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:10:49,951-Speed 3345.02 samples/sec   Loss 3.0505   LearningRate 0.0141   Epoch: 12   Global Step: 155270   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:10:53,096-Speed 3257.10 samples/sec   Loss 3.0360   LearningRate 0.0141   Epoch: 12   Global Step: 155280   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:10:56,178-Speed 3323.23 samples/sec   Loss 3.0540   LearningRate 0.0141   Epoch: 12   Global Step: 155290   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:10:59,269-Speed 3313.93 samples/sec   Loss 2.9892   LearningRate 0.0141   Epoch: 12   Global Step: 155300   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:11:02,353-Speed 3322.05 samples/sec   Loss 3.1368   LearningRate 0.0140   Epoch: 12   Global Step: 155310   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:11:05,468-Speed 3288.06 samples/sec   Loss 2.9602   LearningRate 0.0140   Epoch: 12   Global Step: 155320   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:11:08,538-Speed 3336.25 samples/sec   Loss 3.0486   LearningRate 0.0140   Epoch: 12   Global Step: 155330   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:11:11,628-Speed 3315.07 samples/sec   Loss 2.9756   LearningRate 0.0140   Epoch: 12   Global Step: 155340   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:11:14,764-Speed 3265.98 samples/sec   Loss 3.0776   LearningRate 0.0140   Epoch: 12   Global Step: 155350   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:11:17,889-Speed 3278.33 samples/sec   Loss 2.9830   LearningRate 0.0140   Epoch: 12   Global Step: 155360   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:11:20,999-Speed 3294.08 samples/sec   Loss 2.9919   LearningRate 0.0140   Epoch: 12   Global Step: 155370   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:11:24,183-Speed 3217.44 samples/sec   Loss 3.0058   LearningRate 0.0140   Epoch: 12   Global Step: 155380   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:11:27,363-Speed 3220.89 samples/sec   Loss 2.9751   LearningRate 0.0140   Epoch: 12   Global Step: 155390   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:11:30,520-Speed 3243.64 samples/sec   Loss 2.9949   LearningRate 0.0140   Epoch: 12   Global Step: 155400   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:11:33,658-Speed 3264.50 samples/sec   Loss 3.0445   LearningRate 0.0140   Epoch: 12   Global Step: 155410   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:11:36,827-Speed 3232.19 samples/sec   Loss 3.0299   LearningRate 0.0140   Epoch: 12   Global Step: 155420   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:11:39,980-Speed 3249.49 samples/sec   Loss 2.9255   LearningRate 0.0140   Epoch: 12   Global Step: 155430   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:11:43,146-Speed 3235.26 samples/sec   Loss 2.9540   LearningRate 0.0140   Epoch: 12   Global Step: 155440   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:11:46,236-Speed 3314.51 samples/sec   Loss 3.0368   LearningRate 0.0140   Epoch: 12   Global Step: 155450   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:11:49,407-Speed 3230.61 samples/sec   Loss 3.0044   LearningRate 0.0140   Epoch: 12   Global Step: 155460   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:11:52,492-Speed 3320.05 samples/sec   Loss 3.0476   LearningRate 0.0140   Epoch: 12   Global Step: 155470   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:11:55,558-Speed 3341.59 samples/sec   Loss 2.9460   LearningRate 0.0140   Epoch: 12   Global Step: 155480   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:11:58,661-Speed 3301.07 samples/sec   Loss 3.0816   LearningRate 0.0140   Epoch: 12   Global Step: 155490   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:12:01,885-Speed 3176.78 samples/sec   Loss 3.0370   LearningRate 0.0140   Epoch: 12   Global Step: 155500   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:12:05,073-Speed 3212.94 samples/sec   Loss 3.0198   LearningRate 0.0140   Epoch: 12   Global Step: 155510   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:12:08,175-Speed 3302.39 samples/sec   Loss 3.0126   LearningRate 0.0140   Epoch: 12   Global Step: 155520   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:12:11,279-Speed 3299.88 samples/sec   Loss 2.9875   LearningRate 0.0140   Epoch: 12   Global Step: 155530   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:12:14,414-Speed 3266.86 samples/sec   Loss 3.0696   LearningRate 0.0140   Epoch: 12   Global Step: 155540   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:12:17,510-Speed 3308.35 samples/sec   Loss 3.0831   LearningRate 0.0140   Epoch: 12   Global Step: 155550   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:12:20,584-Speed 3332.16 samples/sec   Loss 2.9327   LearningRate 0.0140   Epoch: 12   Global Step: 155560   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:12:23,741-Speed 3244.68 samples/sec   Loss 2.9931   LearningRate 0.0140   Epoch: 12   Global Step: 155570   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:12:26,835-Speed 3311.11 samples/sec   Loss 3.0234   LearningRate 0.0140   Epoch: 12   Global Step: 155580   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:12:29,974-Speed 3263.04 samples/sec   Loss 2.9754   LearningRate 0.0140   Epoch: 12   Global Step: 155590   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:12:33,093-Speed 3284.50 samples/sec   Loss 2.9972   LearningRate 0.0140   Epoch: 12   Global Step: 155600   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:12:36,218-Speed 3277.21 samples/sec   Loss 3.1027   LearningRate 0.0140   Epoch: 12   Global Step: 155610   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:12:39,349-Speed 3271.55 samples/sec   Loss 2.9652   LearningRate 0.0140   Epoch: 12   Global Step: 155620   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:12:42,473-Speed 3278.75 samples/sec   Loss 2.9522   LearningRate 0.0140   Epoch: 12   Global Step: 155630   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:12:45,564-Speed 3314.07 samples/sec   Loss 3.0080   LearningRate 0.0139   Epoch: 12   Global Step: 155640   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:12:48,661-Speed 3308.40 samples/sec   Loss 3.1039   LearningRate 0.0139   Epoch: 12   Global Step: 155650   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:12:51,804-Speed 3258.67 samples/sec   Loss 3.0353   LearningRate 0.0139   Epoch: 12   Global Step: 155660   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:12:54,924-Speed 3283.35 samples/sec   Loss 3.1765   LearningRate 0.0139   Epoch: 12   Global Step: 155670   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:12:58,025-Speed 3303.15 samples/sec   Loss 2.9898   LearningRate 0.0139   Epoch: 12   Global Step: 155680   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:13:01,172-Speed 3254.83 samples/sec   Loss 3.0516   LearningRate 0.0139   Epoch: 12   Global Step: 155690   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:13:04,250-Speed 3327.59 samples/sec   Loss 3.0214   LearningRate 0.0139   Epoch: 12   Global Step: 155700   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:13:07,397-Speed 3255.38 samples/sec   Loss 3.0638   LearningRate 0.0139   Epoch: 12   Global Step: 155710   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:13:10,485-Speed 3316.45 samples/sec   Loss 2.9524   LearningRate 0.0139   Epoch: 12   Global Step: 155720   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:13:13,640-Speed 3246.52 samples/sec   Loss 3.0508   LearningRate 0.0139   Epoch: 12   Global Step: 155730   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:13:16,857-Speed 3184.38 samples/sec   Loss 2.9394   LearningRate 0.0139   Epoch: 12   Global Step: 155740   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:13:19,955-Speed 3306.90 samples/sec   Loss 2.9718   LearningRate 0.0139   Epoch: 12   Global Step: 155750   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:13:23,061-Speed 3297.43 samples/sec   Loss 2.9414   LearningRate 0.0139   Epoch: 12   Global Step: 155760   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:13:26,221-Speed 3241.09 samples/sec   Loss 3.0032   LearningRate 0.0139   Epoch: 12   Global Step: 155770   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:13:29,306-Speed 3320.50 samples/sec   Loss 2.9883   LearningRate 0.0139   Epoch: 12   Global Step: 155780   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:13:32,450-Speed 3258.67 samples/sec   Loss 3.1159   LearningRate 0.0139   Epoch: 12   Global Step: 155790   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:13:35,525-Speed 3331.26 samples/sec   Loss 3.0164   LearningRate 0.0139   Epoch: 12   Global Step: 155800   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:13:38,624-Speed 3305.03 samples/sec   Loss 3.0745   LearningRate 0.0139   Epoch: 12   Global Step: 155810   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:13:41,757-Speed 3269.47 samples/sec   Loss 2.9917   LearningRate 0.0139   Epoch: 12   Global Step: 155820   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:13:44,831-Speed 3331.75 samples/sec   Loss 2.9894   LearningRate 0.0139   Epoch: 12   Global Step: 155830   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:13:47,940-Speed 3295.45 samples/sec   Loss 3.0470   LearningRate 0.0139   Epoch: 12   Global Step: 155840   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:13:51,029-Speed 3315.88 samples/sec   Loss 3.0736   LearningRate 0.0139   Epoch: 12   Global Step: 155850   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:13:54,122-Speed 3311.97 samples/sec   Loss 2.9790   LearningRate 0.0139   Epoch: 12   Global Step: 155860   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:13:57,261-Speed 3263.36 samples/sec   Loss 3.0256   LearningRate 0.0139   Epoch: 12   Global Step: 155870   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:14:00,364-Speed 3300.46 samples/sec   Loss 3.0579   LearningRate 0.0139   Epoch: 12   Global Step: 155880   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:14:03,512-Speed 3254.32 samples/sec   Loss 3.1105   LearningRate 0.0139   Epoch: 12   Global Step: 155890   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:14:06,637-Speed 3277.79 samples/sec   Loss 2.9544   LearningRate 0.0139   Epoch: 12   Global Step: 155900   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:14:09,713-Speed 3329.76 samples/sec   Loss 3.0173   LearningRate 0.0139   Epoch: 12   Global Step: 155910   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:14:12,780-Speed 3339.80 samples/sec   Loss 2.9949   LearningRate 0.0139   Epoch: 12   Global Step: 155920   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:14:15,944-Speed 3237.76 samples/sec   Loss 2.9933   LearningRate 0.0139   Epoch: 12   Global Step: 155930   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:14:19,014-Speed 3336.10 samples/sec   Loss 3.0529   LearningRate 0.0139   Epoch: 12   Global Step: 155940   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:14:22,075-Speed 3346.80 samples/sec   Loss 2.9836   LearningRate 0.0139   Epoch: 12   Global Step: 155950   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:14:25,150-Speed 3330.93 samples/sec   Loss 3.0319   LearningRate 0.0139   Epoch: 12   Global Step: 155960   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:14:28,282-Speed 3269.94 samples/sec   Loss 3.0947   LearningRate 0.0138   Epoch: 12   Global Step: 155970   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:14:31,401-Speed 3284.59 samples/sec   Loss 3.0409   LearningRate 0.0138   Epoch: 12   Global Step: 155980   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:14:34,499-Speed 3306.52 samples/sec   Loss 3.0108   LearningRate 0.0138   Epoch: 12   Global Step: 155990   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:14:37,605-Speed 3297.09 samples/sec   Loss 2.9926   LearningRate 0.0138   Epoch: 12   Global Step: 156000   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:14:40,714-Speed 3294.61 samples/sec   Loss 3.0158   LearningRate 0.0138   Epoch: 12   Global Step: 156010   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:14:43,784-Speed 3337.25 samples/sec   Loss 3.0366   LearningRate 0.0138   Epoch: 12   Global Step: 156020   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:14:46,909-Speed 3278.03 samples/sec   Loss 3.0815   LearningRate 0.0138   Epoch: 12   Global Step: 156030   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:14:50,077-Speed 3233.44 samples/sec   Loss 3.0727   LearningRate 0.0138   Epoch: 12   Global Step: 156040   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:14:53,243-Speed 3234.90 samples/sec   Loss 3.1230   LearningRate 0.0138   Epoch: 12   Global Step: 156050   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:14:56,384-Speed 3261.53 samples/sec   Loss 3.0106   LearningRate 0.0138   Epoch: 12   Global Step: 156060   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:14:59,534-Speed 3251.56 samples/sec   Loss 3.0074   LearningRate 0.0138   Epoch: 12   Global Step: 156070   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:15:02,690-Speed 3245.50 samples/sec   Loss 3.0389   LearningRate 0.0138   Epoch: 12   Global Step: 156080   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:15:05,841-Speed 3251.29 samples/sec   Loss 2.9783   LearningRate 0.0138   Epoch: 12   Global Step: 156090   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:15:08,964-Speed 3279.67 samples/sec   Loss 3.0048   LearningRate 0.0138   Epoch: 12   Global Step: 156100   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:15:12,121-Speed 3243.92 samples/sec   Loss 2.9793   LearningRate 0.0138   Epoch: 12   Global Step: 156110   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:15:15,253-Speed 3271.11 samples/sec   Loss 3.0197   LearningRate 0.0138   Epoch: 12   Global Step: 156120   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:15:18,431-Speed 3222.96 samples/sec   Loss 2.9239   LearningRate 0.0138   Epoch: 12   Global Step: 156130   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:15:21,533-Speed 3302.22 samples/sec   Loss 3.0136   LearningRate 0.0138   Epoch: 12   Global Step: 156140   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:15:24,666-Speed 3269.72 samples/sec   Loss 3.1071   LearningRate 0.0138   Epoch: 12   Global Step: 156150   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:15:27,767-Speed 3302.79 samples/sec   Loss 3.0525   LearningRate 0.0138   Epoch: 12   Global Step: 156160   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:15:30,875-Speed 3296.14 samples/sec   Loss 2.9809   LearningRate 0.0138   Epoch: 12   Global Step: 156170   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:15:34,003-Speed 3274.05 samples/sec   Loss 2.9904   LearningRate 0.0138   Epoch: 12   Global Step: 156180   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:15:37,141-Speed 3263.71 samples/sec   Loss 3.0556   LearningRate 0.0138   Epoch: 12   Global Step: 156190   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:15:40,249-Speed 3296.04 samples/sec   Loss 3.0596   LearningRate 0.0138   Epoch: 12   Global Step: 156200   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:15:43,358-Speed 3294.54 samples/sec   Loss 3.0367   LearningRate 0.0138   Epoch: 12   Global Step: 156210   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:15:46,429-Speed 3335.87 samples/sec   Loss 3.0639   LearningRate 0.0138   Epoch: 12   Global Step: 156220   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:15:49,533-Speed 3300.00 samples/sec   Loss 3.1133   LearningRate 0.0138   Epoch: 12   Global Step: 156230   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:15:52,649-Speed 3287.40 samples/sec   Loss 3.1074   LearningRate 0.0138   Epoch: 12   Global Step: 156240   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:15:55,780-Speed 3271.49 samples/sec   Loss 2.9592   LearningRate 0.0138   Epoch: 12   Global Step: 156250   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:15:58,868-Speed 3317.21 samples/sec   Loss 2.9617   LearningRate 0.0138   Epoch: 12   Global Step: 156260   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:16:02,016-Speed 3254.05 samples/sec   Loss 3.0204   LearningRate 0.0138   Epoch: 12   Global Step: 156270   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:16:05,179-Speed 3238.89 samples/sec   Loss 3.0850   LearningRate 0.0138   Epoch: 12   Global Step: 156280   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:16:08,297-Speed 3284.69 samples/sec   Loss 3.0174   LearningRate 0.0138   Epoch: 12   Global Step: 156290   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:16:11,381-Speed 3321.24 samples/sec   Loss 3.0416   LearningRate 0.0138   Epoch: 12   Global Step: 156300   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:16:14,515-Speed 3268.38 samples/sec   Loss 2.9782   LearningRate 0.0137   Epoch: 12   Global Step: 156310   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:16:17,674-Speed 3243.08 samples/sec   Loss 3.0462   LearningRate 0.0137   Epoch: 12   Global Step: 156320   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:16:20,787-Speed 3290.03 samples/sec   Loss 3.0390   LearningRate 0.0137   Epoch: 12   Global Step: 156330   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:16:23,852-Speed 3341.97 samples/sec   Loss 3.0555   LearningRate 0.0137   Epoch: 12   Global Step: 156340   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:16:27,001-Speed 3253.44 samples/sec   Loss 3.0166   LearningRate 0.0137   Epoch: 12   Global Step: 156350   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:16:30,089-Speed 3316.97 samples/sec   Loss 3.1086   LearningRate 0.0137   Epoch: 12   Global Step: 156360   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:16:33,183-Speed 3310.32 samples/sec   Loss 2.9964   LearningRate 0.0137   Epoch: 12   Global Step: 156370   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:16:36,272-Speed 3315.99 samples/sec   Loss 3.1179   LearningRate 0.0137   Epoch: 12   Global Step: 156380   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:16:39,334-Speed 3345.59 samples/sec   Loss 3.0861   LearningRate 0.0137   Epoch: 12   Global Step: 156390   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:16:42,456-Speed 3280.86 samples/sec   Loss 3.1008   LearningRate 0.0137   Epoch: 12   Global Step: 156400   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:16:45,544-Speed 3317.66 samples/sec   Loss 2.9850   LearningRate 0.0137   Epoch: 12   Global Step: 156410   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:16:48,631-Speed 3317.50 samples/sec   Loss 3.0914   LearningRate 0.0137   Epoch: 12   Global Step: 156420   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:16:51,807-Speed 3225.55 samples/sec   Loss 3.0758   LearningRate 0.0137   Epoch: 12   Global Step: 156430   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:16:54,929-Speed 3280.45 samples/sec   Loss 3.0043   LearningRate 0.0137   Epoch: 12   Global Step: 156440   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:16:58,004-Speed 3331.98 samples/sec   Loss 3.0750   LearningRate 0.0137   Epoch: 12   Global Step: 156450   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:17:01,088-Speed 3320.96 samples/sec   Loss 3.0232   LearningRate 0.0137   Epoch: 12   Global Step: 156460   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:17:04,196-Speed 3295.01 samples/sec   Loss 2.9875   LearningRate 0.0137   Epoch: 12   Global Step: 156470   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:17:07,308-Speed 3291.26 samples/sec   Loss 2.9923   LearningRate 0.0137   Epoch: 12   Global Step: 156480   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:17:10,395-Speed 3319.11 samples/sec   Loss 2.9541   LearningRate 0.0137   Epoch: 12   Global Step: 156490   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:17:13,540-Speed 3257.03 samples/sec   Loss 3.0550   LearningRate 0.0137   Epoch: 12   Global Step: 156500   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:17:16,693-Speed 3248.16 samples/sec   Loss 3.0948   LearningRate 0.0137   Epoch: 12   Global Step: 156510   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:17:19,793-Speed 3304.33 samples/sec   Loss 3.0161   LearningRate 0.0137   Epoch: 12   Global Step: 156520   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:17:22,890-Speed 3307.79 samples/sec   Loss 3.0308   LearningRate 0.0137   Epoch: 12   Global Step: 156530   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:17:25,966-Speed 3330.54 samples/sec   Loss 3.0111   LearningRate 0.0137   Epoch: 12   Global Step: 156540   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:17:29,053-Speed 3317.48 samples/sec   Loss 3.0694   LearningRate 0.0137   Epoch: 12   Global Step: 156550   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:17:32,107-Speed 3353.91 samples/sec   Loss 3.0321   LearningRate 0.0137   Epoch: 12   Global Step: 156560   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:17:35,289-Speed 3219.26 samples/sec   Loss 3.0019   LearningRate 0.0137   Epoch: 12   Global Step: 156570   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:17:38,362-Speed 3332.93 samples/sec   Loss 3.0204   LearningRate 0.0137   Epoch: 12   Global Step: 156580   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:17:41,448-Speed 3319.63 samples/sec   Loss 2.9600   LearningRate 0.0137   Epoch: 12   Global Step: 156590   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:17:44,510-Speed 3345.46 samples/sec   Loss 3.0990   LearningRate 0.0137   Epoch: 12   Global Step: 156600   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:17:47,570-Speed 3347.85 samples/sec   Loss 3.0815   LearningRate 0.0137   Epoch: 12   Global Step: 156610   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:17:50,705-Speed 3267.73 samples/sec   Loss 3.0321   LearningRate 0.0137   Epoch: 12   Global Step: 156620   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:17:53,822-Speed 3286.43 samples/sec   Loss 3.1224   LearningRate 0.0137   Epoch: 12   Global Step: 156630   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:17:56,879-Speed 3350.10 samples/sec   Loss 3.0425   LearningRate 0.0136   Epoch: 12   Global Step: 156640   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:17:59,998-Speed 3284.14 samples/sec   Loss 3.0271   LearningRate 0.0136   Epoch: 12   Global Step: 156650   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:18:03,108-Speed 3293.85 samples/sec   Loss 3.0983   LearningRate 0.0136   Epoch: 12   Global Step: 156660   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:18:06,242-Speed 3268.36 samples/sec   Loss 3.1179   LearningRate 0.0136   Epoch: 12   Global Step: 156670   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:18:09,300-Speed 3349.84 samples/sec   Loss 3.0086   LearningRate 0.0136   Epoch: 12   Global Step: 156680   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:18:12,376-Speed 3330.74 samples/sec   Loss 3.1217   LearningRate 0.0136   Epoch: 12   Global Step: 156690   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:18:15,513-Speed 3264.82 samples/sec   Loss 3.0470   LearningRate 0.0136   Epoch: 12   Global Step: 156700   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:18:18,639-Speed 3276.78 samples/sec   Loss 3.0169   LearningRate 0.0136   Epoch: 12   Global Step: 156710   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:18:21,714-Speed 3330.66 samples/sec   Loss 3.0547   LearningRate 0.0136   Epoch: 12   Global Step: 156720   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:18:24,787-Speed 3333.83 samples/sec   Loss 2.9604   LearningRate 0.0136   Epoch: 12   Global Step: 156730   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:18:27,916-Speed 3273.53 samples/sec   Loss 3.0396   LearningRate 0.0136   Epoch: 12   Global Step: 156740   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:18:31,041-Speed 3277.78 samples/sec   Loss 3.1348   LearningRate 0.0136   Epoch: 12   Global Step: 156750   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:18:34,114-Speed 3333.31 samples/sec   Loss 3.0327   LearningRate 0.0136   Epoch: 12   Global Step: 156760   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:18:37,259-Speed 3256.65 samples/sec   Loss 3.1734   LearningRate 0.0136   Epoch: 12   Global Step: 156770   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:18:40,390-Speed 3271.63 samples/sec   Loss 3.0836   LearningRate 0.0136   Epoch: 12   Global Step: 156780   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:18:43,489-Speed 3305.02 samples/sec   Loss 3.0474   LearningRate 0.0136   Epoch: 12   Global Step: 156790   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:18:46,603-Speed 3289.43 samples/sec   Loss 3.0080   LearningRate 0.0136   Epoch: 12   Global Step: 156800   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:18:49,757-Speed 3248.09 samples/sec   Loss 2.9369   LearningRate 0.0136   Epoch: 12   Global Step: 156810   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:18:52,841-Speed 3321.87 samples/sec   Loss 3.0290   LearningRate 0.0136   Epoch: 12   Global Step: 156820   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:18:55,916-Speed 3330.65 samples/sec   Loss 2.9952   LearningRate 0.0136   Epoch: 12   Global Step: 156830   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:18:59,026-Speed 3294.17 samples/sec   Loss 3.0471   LearningRate 0.0136   Epoch: 12   Global Step: 156840   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:19:02,122-Speed 3309.18 samples/sec   Loss 3.0708   LearningRate 0.0136   Epoch: 12   Global Step: 156850   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:19:05,192-Speed 3336.03 samples/sec   Loss 3.0637   LearningRate 0.0136   Epoch: 12   Global Step: 156860   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:19:08,266-Speed 3331.66 samples/sec   Loss 3.0573   LearningRate 0.0136   Epoch: 12   Global Step: 156870   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:19:11,343-Speed 3329.03 samples/sec   Loss 3.0881   LearningRate 0.0136   Epoch: 12   Global Step: 156880   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:19:14,447-Speed 3300.58 samples/sec   Loss 3.1183   LearningRate 0.0136   Epoch: 12   Global Step: 156890   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:19:17,551-Speed 3299.53 samples/sec   Loss 2.9889   LearningRate 0.0136   Epoch: 12   Global Step: 156900   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:19:20,624-Speed 3333.02 samples/sec   Loss 3.0393   LearningRate 0.0136   Epoch: 12   Global Step: 156910   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:19:23,736-Speed 3291.99 samples/sec   Loss 2.9783   LearningRate 0.0136   Epoch: 12   Global Step: 156920   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:19:26,780-Speed 3365.04 samples/sec   Loss 3.0399   LearningRate 0.0136   Epoch: 12   Global Step: 156930   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:19:29,860-Speed 3325.93 samples/sec   Loss 3.0230   LearningRate 0.0136   Epoch: 12   Global Step: 156940   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:19:33,017-Speed 3244.79 samples/sec   Loss 3.0467   LearningRate 0.0136   Epoch: 12   Global Step: 156950   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:19:36,104-Speed 3318.21 samples/sec   Loss 2.9907   LearningRate 0.0136   Epoch: 12   Global Step: 156960   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:19:39,185-Speed 3325.03 samples/sec   Loss 3.0765   LearningRate 0.0136   Epoch: 12   Global Step: 156970   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:19:42,277-Speed 3312.88 samples/sec   Loss 3.0148   LearningRate 0.0135   Epoch: 12   Global Step: 156980   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:19:45,359-Speed 3323.52 samples/sec   Loss 3.0451   LearningRate 0.0135   Epoch: 12   Global Step: 156990   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:19:48,450-Speed 3313.69 samples/sec   Loss 3.1173   LearningRate 0.0135   Epoch: 12   Global Step: 157000   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:19:51,525-Speed 3331.12 samples/sec   Loss 3.0603   LearningRate 0.0135   Epoch: 12   Global Step: 157010   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:19:54,575-Speed 3358.64 samples/sec   Loss 3.0734   LearningRate 0.0135   Epoch: 12   Global Step: 157020   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:19:57,630-Speed 3352.49 samples/sec   Loss 3.0077   LearningRate 0.0135   Epoch: 12   Global Step: 157030   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:20:00,701-Speed 3335.91 samples/sec   Loss 3.1230   LearningRate 0.0135   Epoch: 12   Global Step: 157040   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:20:03,781-Speed 3326.09 samples/sec   Loss 3.0343   LearningRate 0.0135   Epoch: 12   Global Step: 157050   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:20:06,847-Speed 3340.21 samples/sec   Loss 3.0236   LearningRate 0.0135   Epoch: 12   Global Step: 157060   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:20:09,894-Speed 3362.37 samples/sec   Loss 3.1439   LearningRate 0.0135   Epoch: 12   Global Step: 157070   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:20:12,989-Speed 3309.75 samples/sec   Loss 3.0832   LearningRate 0.0135   Epoch: 12   Global Step: 157080   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:20:16,155-Speed 3235.27 samples/sec   Loss 3.1089   LearningRate 0.0135   Epoch: 12   Global Step: 157090   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:20:19,200-Speed 3363.53 samples/sec   Loss 3.0366   LearningRate 0.0135   Epoch: 12   Global Step: 157100   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:20:22,255-Speed 3353.90 samples/sec   Loss 3.1142   LearningRate 0.0135   Epoch: 12   Global Step: 157110   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:20:25,330-Speed 3330.55 samples/sec   Loss 3.0786   LearningRate 0.0135   Epoch: 12   Global Step: 157120   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:20:28,428-Speed 3306.84 samples/sec   Loss 3.0495   LearningRate 0.0135   Epoch: 12   Global Step: 157130   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:20:31,512-Speed 3321.17 samples/sec   Loss 3.0886   LearningRate 0.0135   Epoch: 12   Global Step: 157140   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:20:34,585-Speed 3332.42 samples/sec   Loss 3.0399   LearningRate 0.0135   Epoch: 12   Global Step: 157150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:20:37,706-Speed 3281.90 samples/sec   Loss 3.0365   LearningRate 0.0135   Epoch: 12   Global Step: 157160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:20:40,814-Speed 3296.30 samples/sec   Loss 3.0486   LearningRate 0.0135   Epoch: 12   Global Step: 157170   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:20:43,879-Speed 3342.34 samples/sec   Loss 2.9994   LearningRate 0.0135   Epoch: 12   Global Step: 157180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:20:46,962-Speed 3322.36 samples/sec   Loss 3.0450   LearningRate 0.0135   Epoch: 12   Global Step: 157190   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:20:50,038-Speed 3329.28 samples/sec   Loss 2.9689   LearningRate 0.0135   Epoch: 12   Global Step: 157200   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:20:53,101-Speed 3344.42 samples/sec   Loss 3.0918   LearningRate 0.0135   Epoch: 12   Global Step: 157210   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:20:56,157-Speed 3352.15 samples/sec   Loss 3.0842   LearningRate 0.0135   Epoch: 12   Global Step: 157220   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:20:59,321-Speed 3237.15 samples/sec   Loss 3.0684   LearningRate 0.0135   Epoch: 12   Global Step: 157230   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:21:02,529-Speed 3193.13 samples/sec   Loss 3.1161   LearningRate 0.0135   Epoch: 12   Global Step: 157240   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:21:05,724-Speed 3205.67 samples/sec   Loss 3.0904   LearningRate 0.0135   Epoch: 12   Global Step: 157250   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:21:08,802-Speed 3328.30 samples/sec   Loss 3.0434   LearningRate 0.0135   Epoch: 12   Global Step: 157260   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:21:11,898-Speed 3308.56 samples/sec   Loss 3.0412   LearningRate 0.0135   Epoch: 12   Global Step: 157270   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:21:15,038-Speed 3261.98 samples/sec   Loss 3.0973   LearningRate 0.0135   Epoch: 12   Global Step: 157280   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:21:18,155-Speed 3286.88 samples/sec   Loss 3.0271   LearningRate 0.0135   Epoch: 12   Global Step: 157290   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:21:21,234-Speed 3326.15 samples/sec   Loss 3.1199   LearningRate 0.0135   Epoch: 12   Global Step: 157300   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:21:24,296-Speed 3345.52 samples/sec   Loss 3.0806   LearningRate 0.0135   Epoch: 12   Global Step: 157310   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:21:27,363-Speed 3339.93 samples/sec   Loss 3.0358   LearningRate 0.0134   Epoch: 12   Global Step: 157320   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:21:30,452-Speed 3315.58 samples/sec   Loss 3.0670   LearningRate 0.0134   Epoch: 12   Global Step: 157330   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:21:33,551-Speed 3306.03 samples/sec   Loss 2.9738   LearningRate 0.0134   Epoch: 12   Global Step: 157340   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:21:36,651-Speed 3303.70 samples/sec   Loss 3.0705   LearningRate 0.0134   Epoch: 12   Global Step: 157350   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:21:39,861-Speed 3191.33 samples/sec   Loss 3.0922   LearningRate 0.0134   Epoch: 12   Global Step: 157360   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:21:43,017-Speed 3245.74 samples/sec   Loss 3.0749   LearningRate 0.0134   Epoch: 12   Global Step: 157370   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:21:46,091-Speed 3331.59 samples/sec   Loss 3.0469   LearningRate 0.0134   Epoch: 12   Global Step: 157380   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:21:49,252-Speed 3240.60 samples/sec   Loss 3.1306   LearningRate 0.0134   Epoch: 12   Global Step: 157390   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:21:52,356-Speed 3301.08 samples/sec   Loss 3.0503   LearningRate 0.0134   Epoch: 12   Global Step: 157400   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:21:55,470-Speed 3288.86 samples/sec   Loss 3.0552   LearningRate 0.0134   Epoch: 12   Global Step: 157410   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:21:58,533-Speed 3343.83 samples/sec   Loss 3.0309   LearningRate 0.0134   Epoch: 12   Global Step: 157420   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:22:01,685-Speed 3249.98 samples/sec   Loss 3.0977   LearningRate 0.0134   Epoch: 12   Global Step: 157430   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:22:04,846-Speed 3240.27 samples/sec   Loss 3.0164   LearningRate 0.0134   Epoch: 12   Global Step: 157440   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:22:08,015-Speed 3232.07 samples/sec   Loss 3.0615   LearningRate 0.0134   Epoch: 12   Global Step: 157450   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:22:11,109-Speed 3311.58 samples/sec   Loss 3.1751   LearningRate 0.0134   Epoch: 12   Global Step: 157460   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:22:14,269-Speed 3242.10 samples/sec   Loss 3.0733   LearningRate 0.0134   Epoch: 12   Global Step: 157470   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:22:17,413-Speed 3257.38 samples/sec   Loss 3.1098   LearningRate 0.0134   Epoch: 12   Global Step: 157480   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:22:20,493-Speed 3325.90 samples/sec   Loss 3.0679   LearningRate 0.0134   Epoch: 12   Global Step: 157490   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:22:23,590-Speed 3306.91 samples/sec   Loss 3.0061   LearningRate 0.0134   Epoch: 12   Global Step: 157500   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:22:26,695-Speed 3300.31 samples/sec   Loss 3.0358   LearningRate 0.0134   Epoch: 12   Global Step: 157510   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:22:29,791-Speed 3307.82 samples/sec   Loss 3.0689   LearningRate 0.0134   Epoch: 12   Global Step: 157520   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:22:32,903-Speed 3292.24 samples/sec   Loss 3.0501   LearningRate 0.0134   Epoch: 12   Global Step: 157530   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:22:36,011-Speed 3295.85 samples/sec   Loss 3.0514   LearningRate 0.0134   Epoch: 12   Global Step: 157540   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:22:39,114-Speed 3301.29 samples/sec   Loss 3.0518   LearningRate 0.0134   Epoch: 12   Global Step: 157550   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:22:42,255-Speed 3260.38 samples/sec   Loss 3.0400   LearningRate 0.0134   Epoch: 12   Global Step: 157560   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:22:45,347-Speed 3313.50 samples/sec   Loss 3.0773   LearningRate 0.0134   Epoch: 12   Global Step: 157570   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:22:48,475-Speed 3274.54 samples/sec   Loss 3.0767   LearningRate 0.0134   Epoch: 12   Global Step: 157580   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:22:51,600-Speed 3277.34 samples/sec   Loss 3.1095   LearningRate 0.0134   Epoch: 12   Global Step: 157590   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:22:54,754-Speed 3247.59 samples/sec   Loss 3.0195   LearningRate 0.0134   Epoch: 12   Global Step: 157600   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:22:57,871-Speed 3286.42 samples/sec   Loss 3.0442   LearningRate 0.0134   Epoch: 12   Global Step: 157610   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:23:00,971-Speed 3304.26 samples/sec   Loss 3.1208   LearningRate 0.0134   Epoch: 12   Global Step: 157620   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:23:04,113-Speed 3260.20 samples/sec   Loss 3.0133   LearningRate 0.0134   Epoch: 12   Global Step: 157630   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:23:07,250-Speed 3264.94 samples/sec   Loss 3.0718   LearningRate 0.0134   Epoch: 12   Global Step: 157640   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:23:10,366-Speed 3287.84 samples/sec   Loss 3.0383   LearningRate 0.0134   Epoch: 12   Global Step: 157650   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:23:13,518-Speed 3249.12 samples/sec   Loss 3.0617   LearningRate 0.0133   Epoch: 12   Global Step: 157660   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:23:16,611-Speed 3311.53 samples/sec   Loss 3.0263   LearningRate 0.0133   Epoch: 12   Global Step: 157670   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:23:19,683-Speed 3334.35 samples/sec   Loss 3.0616   LearningRate 0.0133   Epoch: 12   Global Step: 157680   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:23:22,781-Speed 3306.85 samples/sec   Loss 3.0181   LearningRate 0.0133   Epoch: 12   Global Step: 157690   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:23:25,913-Speed 3270.22 samples/sec   Loss 3.1066   LearningRate 0.0133   Epoch: 12   Global Step: 157700   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:23:29,054-Speed 3261.84 samples/sec   Loss 3.0397   LearningRate 0.0133   Epoch: 12   Global Step: 157710   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:23:32,164-Speed 3293.41 samples/sec   Loss 3.0069   LearningRate 0.0133   Epoch: 12   Global Step: 157720   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:23:35,236-Speed 3335.15 samples/sec   Loss 3.0902   LearningRate 0.0133   Epoch: 12   Global Step: 157730   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:23:38,354-Speed 3284.69 samples/sec   Loss 3.0666   LearningRate 0.0133   Epoch: 12   Global Step: 157740   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:23:41,526-Speed 3229.17 samples/sec   Loss 3.0925   LearningRate 0.0133   Epoch: 12   Global Step: 157750   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:23:44,633-Speed 3296.51 samples/sec   Loss 3.1663   LearningRate 0.0133   Epoch: 12   Global Step: 157760   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:23:47,705-Speed 3334.13 samples/sec   Loss 3.0504   LearningRate 0.0133   Epoch: 12   Global Step: 157770   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:23:50,822-Speed 3286.90 samples/sec   Loss 3.0698   LearningRate 0.0133   Epoch: 12   Global Step: 157780   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:23:53,975-Speed 3248.50 samples/sec   Loss 3.1245   LearningRate 0.0133   Epoch: 12   Global Step: 157790   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:23:57,053-Speed 3328.44 samples/sec   Loss 3.0572   LearningRate 0.0133   Epoch: 12   Global Step: 157800   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:24:00,208-Speed 3246.15 samples/sec   Loss 2.9724   LearningRate 0.0133   Epoch: 12   Global Step: 157810   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:24:03,313-Speed 3299.91 samples/sec   Loss 3.1145   LearningRate 0.0133   Epoch: 12   Global Step: 157820   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:24:06,458-Speed 3256.65 samples/sec   Loss 3.0618   LearningRate 0.0133   Epoch: 12   Global Step: 157830   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:24:09,570-Speed 3291.73 samples/sec   Loss 3.0453   LearningRate 0.0133   Epoch: 12   Global Step: 157840   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:24:12,680-Speed 3293.28 samples/sec   Loss 3.0277   LearningRate 0.0133   Epoch: 12   Global Step: 157850   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:24:15,790-Speed 3293.74 samples/sec   Loss 3.0491   LearningRate 0.0133   Epoch: 12   Global Step: 157860   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:24:18,966-Speed 3224.77 samples/sec   Loss 3.0285   LearningRate 0.0133   Epoch: 12   Global Step: 157870   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:24:22,040-Speed 3333.17 samples/sec   Loss 3.1030   LearningRate 0.0133   Epoch: 12   Global Step: 157880   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:24:25,193-Speed 3248.67 samples/sec   Loss 3.0890   LearningRate 0.0133   Epoch: 12   Global Step: 157890   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:24:28,380-Speed 3213.80 samples/sec   Loss 3.0635   LearningRate 0.0133   Epoch: 12   Global Step: 157900   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:24:31,484-Speed 3299.68 samples/sec   Loss 3.1203   LearningRate 0.0133   Epoch: 12   Global Step: 157910   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:24:34,545-Speed 3346.68 samples/sec   Loss 3.0488   LearningRate 0.0133   Epoch: 12   Global Step: 157920   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:24:37,700-Speed 3246.31 samples/sec   Loss 3.0281   LearningRate 0.0133   Epoch: 12   Global Step: 157930   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:24:40,864-Speed 3238.07 samples/sec   Loss 3.0359   LearningRate 0.0133   Epoch: 12   Global Step: 157940   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:24:43,980-Speed 3286.71 samples/sec   Loss 3.0657   LearningRate 0.0133   Epoch: 12   Global Step: 157950   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:24:47,074-Speed 3310.49 samples/sec   Loss 3.0634   LearningRate 0.0133   Epoch: 12   Global Step: 157960   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:24:50,268-Speed 3207.19 samples/sec   Loss 3.0750   LearningRate 0.0133   Epoch: 12   Global Step: 157970   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:24:53,446-Speed 3222.89 samples/sec   Loss 3.0049   LearningRate 0.0133   Epoch: 12   Global Step: 157980   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:24:56,531-Speed 3320.00 samples/sec   Loss 3.0798   LearningRate 0.0133   Epoch: 12   Global Step: 157990   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:24:59,620-Speed 3316.26 samples/sec   Loss 3.0672   LearningRate 0.0132   Epoch: 12   Global Step: 158000   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:25:02,705-Speed 3320.34 samples/sec   Loss 3.1423   LearningRate 0.0132   Epoch: 12   Global Step: 158010   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:25:05,874-Speed 3232.73 samples/sec   Loss 3.1028   LearningRate 0.0132   Epoch: 12   Global Step: 158020   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:25:08,960-Speed 3319.28 samples/sec   Loss 3.0761   LearningRate 0.0132   Epoch: 12   Global Step: 158030   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:25:12,017-Speed 3349.97 samples/sec   Loss 3.0459   LearningRate 0.0132   Epoch: 12   Global Step: 158040   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:25:15,172-Speed 3247.46 samples/sec   Loss 3.1288   LearningRate 0.0132   Epoch: 12   Global Step: 158050   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:25:18,297-Speed 3277.81 samples/sec   Loss 3.0614   LearningRate 0.0132   Epoch: 12   Global Step: 158060   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:25:21,395-Speed 3306.04 samples/sec   Loss 3.0620   LearningRate 0.0132   Epoch: 12   Global Step: 158070   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:25:24,540-Speed 3256.99 samples/sec   Loss 3.0471   LearningRate 0.0132   Epoch: 12   Global Step: 158080   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:25:27,644-Speed 3300.12 samples/sec   Loss 3.0373   LearningRate 0.0132   Epoch: 12   Global Step: 158090   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:25:30,726-Speed 3324.10 samples/sec   Loss 3.0736   LearningRate 0.0132   Epoch: 12   Global Step: 158100   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:25:33,790-Speed 3342.54 samples/sec   Loss 3.0983   LearningRate 0.0132   Epoch: 12   Global Step: 158110   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:25:36,873-Speed 3322.54 samples/sec   Loss 3.0493   LearningRate 0.0132   Epoch: 12   Global Step: 158120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:25:39,948-Speed 3331.49 samples/sec   Loss 3.0719   LearningRate 0.0132   Epoch: 12   Global Step: 158130   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:25:43,033-Speed 3320.33 samples/sec   Loss 3.0357   LearningRate 0.0132   Epoch: 12   Global Step: 158140   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:25:46,093-Speed 3347.43 samples/sec   Loss 3.0832   LearningRate 0.0132   Epoch: 12   Global Step: 158150   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:25:49,174-Speed 3324.66 samples/sec   Loss 3.1353   LearningRate 0.0132   Epoch: 12   Global Step: 158160   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:25:52,291-Speed 3286.11 samples/sec   Loss 3.0957   LearningRate 0.0132   Epoch: 12   Global Step: 158170   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:25:55,360-Speed 3337.52 samples/sec   Loss 3.0843   LearningRate 0.0132   Epoch: 12   Global Step: 158180   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:25:58,470-Speed 3294.25 samples/sec   Loss 3.1924   LearningRate 0.0132   Epoch: 12   Global Step: 158190   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:26:01,523-Speed 3355.48 samples/sec   Loss 3.1154   LearningRate 0.0132   Epoch: 12   Global Step: 158200   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:26:04,630-Speed 3296.13 samples/sec   Loss 3.0175   LearningRate 0.0132   Epoch: 12   Global Step: 158210   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:26:07,743-Speed 3291.40 samples/sec   Loss 3.0880   LearningRate 0.0132   Epoch: 12   Global Step: 158220   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:26:10,820-Speed 3328.15 samples/sec   Loss 3.1119   LearningRate 0.0132   Epoch: 12   Global Step: 158230   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:26:13,978-Speed 3243.79 samples/sec   Loss 3.1649   LearningRate 0.0132   Epoch: 12   Global Step: 158240   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:26:17,049-Speed 3335.36 samples/sec   Loss 3.0741   LearningRate 0.0132   Epoch: 12   Global Step: 158250   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:26:20,082-Speed 3376.96 samples/sec   Loss 2.9872   LearningRate 0.0132   Epoch: 12   Global Step: 158260   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:26:23,179-Speed 3307.72 samples/sec   Loss 2.9503   LearningRate 0.0132   Epoch: 12   Global Step: 158270   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:26:26,261-Speed 3324.42 samples/sec   Loss 3.0628   LearningRate 0.0132   Epoch: 12   Global Step: 158280   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:26:29,333-Speed 3334.26 samples/sec   Loss 3.0546   LearningRate 0.0132   Epoch: 12   Global Step: 158290   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:26:32,425-Speed 3312.90 samples/sec   Loss 3.0914   LearningRate 0.0132   Epoch: 12   Global Step: 158300   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:26:35,534-Speed 3293.96 samples/sec   Loss 3.1202   LearningRate 0.0132   Epoch: 12   Global Step: 158310   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:26:38,611-Speed 3329.47 samples/sec   Loss 3.0818   LearningRate 0.0132   Epoch: 12   Global Step: 158320   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:26:41,668-Speed 3350.74 samples/sec   Loss 3.0294   LearningRate 0.0132   Epoch: 12   Global Step: 158330   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:26:44,749-Speed 3323.91 samples/sec   Loss 2.9938   LearningRate 0.0131   Epoch: 12   Global Step: 158340   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:26:47,817-Speed 3338.97 samples/sec   Loss 3.1119   LearningRate 0.0131   Epoch: 12   Global Step: 158350   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:26:50,910-Speed 3312.60 samples/sec   Loss 3.0914   LearningRate 0.0131   Epoch: 12   Global Step: 158360   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:26:54,053-Speed 3258.79 samples/sec   Loss 3.0793   LearningRate 0.0131   Epoch: 12   Global Step: 158370   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:26:57,116-Speed 3344.41 samples/sec   Loss 3.0562   LearningRate 0.0131   Epoch: 12   Global Step: 158380   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:27:00,198-Speed 3322.51 samples/sec   Loss 3.0275   LearningRate 0.0131   Epoch: 12   Global Step: 158390   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:27:03,341-Speed 3259.79 samples/sec   Loss 3.0905   LearningRate 0.0131   Epoch: 12   Global Step: 158400   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:27:06,378-Speed 3372.67 samples/sec   Loss 3.1291   LearningRate 0.0131   Epoch: 12   Global Step: 158410   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:27:09,442-Speed 3342.80 samples/sec   Loss 3.0278   LearningRate 0.0131   Epoch: 12   Global Step: 158420   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:27:12,505-Speed 3345.02 samples/sec   Loss 3.0257   LearningRate 0.0131   Epoch: 12   Global Step: 158430   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:27:15,631-Speed 3276.65 samples/sec   Loss 3.0620   LearningRate 0.0131   Epoch: 12   Global Step: 158440   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:27:18,763-Speed 3270.58 samples/sec   Loss 3.0988   LearningRate 0.0131   Epoch: 12   Global Step: 158450   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:27:21,873-Speed 3293.49 samples/sec   Loss 3.0932   LearningRate 0.0131   Epoch: 12   Global Step: 158460   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:27:24,978-Speed 3298.34 samples/sec   Loss 3.0984   LearningRate 0.0131   Epoch: 12   Global Step: 158470   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:27:28,087-Speed 3295.23 samples/sec   Loss 3.0764   LearningRate 0.0131   Epoch: 12   Global Step: 158480   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:27:31,160-Speed 3332.88 samples/sec   Loss 3.0472   LearningRate 0.0131   Epoch: 12   Global Step: 158490   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:27:34,240-Speed 3325.63 samples/sec   Loss 3.0754   LearningRate 0.0131   Epoch: 12   Global Step: 158500   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:27:37,408-Speed 3234.20 samples/sec   Loss 3.0696   LearningRate 0.0131   Epoch: 12   Global Step: 158510   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:27:40,512-Speed 3299.54 samples/sec   Loss 3.1751   LearningRate 0.0131   Epoch: 12   Global Step: 158520   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:27:43,582-Speed 3336.60 samples/sec   Loss 3.0756   LearningRate 0.0131   Epoch: 12   Global Step: 158530   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:27:46,706-Speed 3279.42 samples/sec   Loss 3.0471   LearningRate 0.0131   Epoch: 12   Global Step: 158540   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:27:49,787-Speed 3324.31 samples/sec   Loss 3.0841   LearningRate 0.0131   Epoch: 12   Global Step: 158550   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:27:52,901-Speed 3289.55 samples/sec   Loss 3.1388   LearningRate 0.0131   Epoch: 12   Global Step: 158560   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:27:55,990-Speed 3316.56 samples/sec   Loss 3.0841   LearningRate 0.0131   Epoch: 12   Global Step: 158570   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:27:59,056-Speed 3340.45 samples/sec   Loss 3.0662   LearningRate 0.0131   Epoch: 12   Global Step: 158580   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:28:02,170-Speed 3290.05 samples/sec   Loss 3.1537   LearningRate 0.0131   Epoch: 12   Global Step: 158590   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:28:05,239-Speed 3337.85 samples/sec   Loss 3.0511   LearningRate 0.0131   Epoch: 12   Global Step: 158600   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:28:08,387-Speed 3253.90 samples/sec   Loss 3.0554   LearningRate 0.0131   Epoch: 12   Global Step: 158610   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:28:11,460-Speed 3333.74 samples/sec   Loss 3.1133   LearningRate 0.0131   Epoch: 12   Global Step: 158620   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:28:14,562-Speed 3301.50 samples/sec   Loss 3.1095   LearningRate 0.0131   Epoch: 12   Global Step: 158630   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:28:17,680-Speed 3285.07 samples/sec   Loss 3.0286   LearningRate 0.0131   Epoch: 12   Global Step: 158640   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:28:20,744-Speed 3342.75 samples/sec   Loss 3.0265   LearningRate 0.0131   Epoch: 12   Global Step: 158650   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:28:23,824-Speed 3326.51 samples/sec   Loss 3.0758   LearningRate 0.0131   Epoch: 12   Global Step: 158660   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:28:26,920-Speed 3308.58 samples/sec   Loss 3.1695   LearningRate 0.0131   Epoch: 12   Global Step: 158670   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:28:30,089-Speed 3231.72 samples/sec   Loss 3.1147   LearningRate 0.0130   Epoch: 12   Global Step: 158680   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:28:33,165-Speed 3330.38 samples/sec   Loss 3.0534   LearningRate 0.0130   Epoch: 12   Global Step: 158690   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:28:36,268-Speed 3300.92 samples/sec   Loss 3.0559   LearningRate 0.0130   Epoch: 12   Global Step: 158700   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:28:39,391-Speed 3280.00 samples/sec   Loss 3.0391   LearningRate 0.0130   Epoch: 12   Global Step: 158710   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:28:42,478-Speed 3318.15 samples/sec   Loss 3.0592   LearningRate 0.0130   Epoch: 12   Global Step: 158720   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:28:45,536-Speed 3349.75 samples/sec   Loss 2.9887   LearningRate 0.0130   Epoch: 12   Global Step: 158730   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:28:48,660-Speed 3279.17 samples/sec   Loss 3.0380   LearningRate 0.0130   Epoch: 12   Global Step: 158740   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:28:51,773-Speed 3289.92 samples/sec   Loss 3.0926   LearningRate 0.0130   Epoch: 12   Global Step: 158750   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:28:54,844-Speed 3335.14 samples/sec   Loss 3.1240   LearningRate 0.0130   Epoch: 12   Global Step: 158760   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:28:57,903-Speed 3348.71 samples/sec   Loss 3.0965   LearningRate 0.0130   Epoch: 12   Global Step: 158770   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:29:00,982-Speed 3326.48 samples/sec   Loss 3.0804   LearningRate 0.0130   Epoch: 12   Global Step: 158780   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:29:04,047-Speed 3342.25 samples/sec   Loss 2.9991   LearningRate 0.0130   Epoch: 12   Global Step: 158790   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:29:07,122-Speed 3331.30 samples/sec   Loss 3.0460   LearningRate 0.0130   Epoch: 12   Global Step: 158800   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:29:10,193-Speed 3335.74 samples/sec   Loss 3.0970   LearningRate 0.0130   Epoch: 12   Global Step: 158810   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:29:13,365-Speed 3229.19 samples/sec   Loss 3.0430   LearningRate 0.0130   Epoch: 12   Global Step: 158820   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:29:16,422-Speed 3350.53 samples/sec   Loss 3.1144   LearningRate 0.0130   Epoch: 12   Global Step: 158830   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:29:19,506-Speed 3321.99 samples/sec   Loss 3.0369   LearningRate 0.0130   Epoch: 12   Global Step: 158840   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:29:22,620-Speed 3289.03 samples/sec   Loss 3.1016   LearningRate 0.0130   Epoch: 12   Global Step: 158850   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:29:25,721-Speed 3303.19 samples/sec   Loss 3.0306   LearningRate 0.0130   Epoch: 12   Global Step: 158860   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:29:28,800-Speed 3327.36 samples/sec   Loss 3.0176   LearningRate 0.0130   Epoch: 12   Global Step: 158870   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:29:31,878-Speed 3327.21 samples/sec   Loss 3.0849   LearningRate 0.0130   Epoch: 12   Global Step: 158880   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:29:34,981-Speed 3301.78 samples/sec   Loss 3.0670   LearningRate 0.0130   Epoch: 12   Global Step: 158890   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:29:38,038-Speed 3350.00 samples/sec   Loss 3.1731   LearningRate 0.0130   Epoch: 12   Global Step: 158900   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:29:41,099-Speed 3346.31 samples/sec   Loss 2.9496   LearningRate 0.0130   Epoch: 12   Global Step: 158910   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:29:44,164-Speed 3343.10 samples/sec   Loss 3.1081   LearningRate 0.0130   Epoch: 12   Global Step: 158920   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:29:47,247-Speed 3322.28 samples/sec   Loss 3.0773   LearningRate 0.0130   Epoch: 12   Global Step: 158930   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:29:50,346-Speed 3305.12 samples/sec   Loss 3.1128   LearningRate 0.0130   Epoch: 12   Global Step: 158940   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:29:53,441-Speed 3309.71 samples/sec   Loss 3.0316   LearningRate 0.0130   Epoch: 12   Global Step: 158950   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:29:56,507-Speed 3341.21 samples/sec   Loss 3.1596   LearningRate 0.0130   Epoch: 12   Global Step: 158960   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:29:59,595-Speed 3316.18 samples/sec   Loss 3.0224   LearningRate 0.0130   Epoch: 12   Global Step: 158970   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:30:02,737-Speed 3259.99 samples/sec   Loss 3.0825   LearningRate 0.0130   Epoch: 12   Global Step: 158980   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:30:05,834-Speed 3307.64 samples/sec   Loss 3.0374   LearningRate 0.0130   Epoch: 12   Global Step: 158990   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:30:08,898-Speed 3343.69 samples/sec   Loss 3.1147   LearningRate 0.0130   Epoch: 12   Global Step: 159000   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:30:11,972-Speed 3331.93 samples/sec   Loss 3.1034   LearningRate 0.0130   Epoch: 12   Global Step: 159010   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:30:15,038-Speed 3341.01 samples/sec   Loss 3.0552   LearningRate 0.0130   Epoch: 12   Global Step: 159020   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:30:18,135-Speed 3307.55 samples/sec   Loss 3.0335   LearningRate 0.0129   Epoch: 12   Global Step: 159030   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:30:21,216-Speed 3323.93 samples/sec   Loss 3.1429   LearningRate 0.0129   Epoch: 12   Global Step: 159040   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:30:24,315-Speed 3305.82 samples/sec   Loss 3.1022   LearningRate 0.0129   Epoch: 12   Global Step: 159050   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:30:27,369-Speed 3354.67 samples/sec   Loss 3.1371   LearningRate 0.0129   Epoch: 12   Global Step: 159060   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:30:30,422-Speed 3354.64 samples/sec   Loss 3.0705   LearningRate 0.0129   Epoch: 12   Global Step: 159070   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:30:33,498-Speed 3330.27 samples/sec   Loss 3.1354   LearningRate 0.0129   Epoch: 12   Global Step: 159080   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:30:36,570-Speed 3334.06 samples/sec   Loss 3.0738   LearningRate 0.0129   Epoch: 12   Global Step: 159090   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:30:39,676-Speed 3298.31 samples/sec   Loss 3.1703   LearningRate 0.0129   Epoch: 12   Global Step: 159100   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:30:42,725-Speed 3359.10 samples/sec   Loss 3.0672   LearningRate 0.0129   Epoch: 12   Global Step: 159110   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:30:45,799-Speed 3332.46 samples/sec   Loss 3.1511   LearningRate 0.0129   Epoch: 12   Global Step: 159120   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:30:48,910-Speed 3292.92 samples/sec   Loss 3.1934   LearningRate 0.0129   Epoch: 12   Global Step: 159130   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:30:52,050-Speed 3261.76 samples/sec   Loss 3.0873   LearningRate 0.0129   Epoch: 12   Global Step: 159140   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:30:55,176-Speed 3276.99 samples/sec   Loss 3.0326   LearningRate 0.0129   Epoch: 12   Global Step: 159150   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:30:58,336-Speed 3241.47 samples/sec   Loss 3.0893   LearningRate 0.0129   Epoch: 12   Global Step: 159160   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:31:01,450-Speed 3290.37 samples/sec   Loss 3.0933   LearningRate 0.0129   Epoch: 12   Global Step: 159170   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:31:04,582-Speed 3270.66 samples/sec   Loss 3.0400   LearningRate 0.0129   Epoch: 12   Global Step: 159180   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:31:07,658-Speed 3329.47 samples/sec   Loss 3.0958   LearningRate 0.0129   Epoch: 12   Global Step: 159190   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:31:10,770-Speed 3291.64 samples/sec   Loss 3.0008   LearningRate 0.0129   Epoch: 12   Global Step: 159200   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:31:13,856-Speed 3319.86 samples/sec   Loss 3.0294   LearningRate 0.0129   Epoch: 12   Global Step: 159210   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:31:16,966-Speed 3293.73 samples/sec   Loss 3.0991   LearningRate 0.0129   Epoch: 12   Global Step: 159220   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:31:20,089-Speed 3279.80 samples/sec   Loss 3.0614   LearningRate 0.0129   Epoch: 12   Global Step: 159230   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:31:23,183-Speed 3310.84 samples/sec   Loss 3.0318   LearningRate 0.0129   Epoch: 12   Global Step: 159240   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:31:26,275-Speed 3312.09 samples/sec   Loss 3.0898   LearningRate 0.0129   Epoch: 12   Global Step: 159250   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:31:29,392-Speed 3286.84 samples/sec   Loss 3.1011   LearningRate 0.0129   Epoch: 12   Global Step: 159260   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:31:32,531-Speed 3262.99 samples/sec   Loss 3.0861   LearningRate 0.0129   Epoch: 12   Global Step: 159270   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:31:35,614-Speed 3322.60 samples/sec   Loss 3.0388   LearningRate 0.0129   Epoch: 12   Global Step: 159280   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:31:38,669-Speed 3353.47 samples/sec   Loss 3.0838   LearningRate 0.0129   Epoch: 12   Global Step: 159290   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:31:41,764-Speed 3308.91 samples/sec   Loss 3.0936   LearningRate 0.0129   Epoch: 12   Global Step: 159300   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:31:44,890-Speed 3276.56 samples/sec   Loss 3.0658   LearningRate 0.0129   Epoch: 12   Global Step: 159310   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:31:48,028-Speed 3264.42 samples/sec   Loss 3.0170   LearningRate 0.0129   Epoch: 12   Global Step: 159320   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:31:51,193-Speed 3236.80 samples/sec   Loss 3.1224   LearningRate 0.0129   Epoch: 12   Global Step: 159330   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:31:54,283-Speed 3315.60 samples/sec   Loss 3.0886   LearningRate 0.0129   Epoch: 12   Global Step: 159340   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:31:57,372-Speed 3315.58 samples/sec   Loss 3.1238   LearningRate 0.0129   Epoch: 12   Global Step: 159350   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:32:00,458-Speed 3318.75 samples/sec   Loss 3.1079   LearningRate 0.0129   Epoch: 12   Global Step: 159360   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:32:03,590-Speed 3270.48 samples/sec   Loss 3.1814   LearningRate 0.0128   Epoch: 12   Global Step: 159370   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:32:06,760-Speed 3231.72 samples/sec   Loss 3.0700   LearningRate 0.0128   Epoch: 12   Global Step: 159380   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:32:09,850-Speed 3314.57 samples/sec   Loss 3.1149   LearningRate 0.0128   Epoch: 12   Global Step: 159390   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:32:12,970-Speed 3283.47 samples/sec   Loss 3.0846   LearningRate 0.0128   Epoch: 12   Global Step: 159400   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:32:16,115-Speed 3256.71 samples/sec   Loss 3.0875   LearningRate 0.0128   Epoch: 12   Global Step: 159410   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:32:19,283-Speed 3234.18 samples/sec   Loss 3.0800   LearningRate 0.0128   Epoch: 12   Global Step: 159420   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:32:22,341-Speed 3349.07 samples/sec   Loss 3.0719   LearningRate 0.0128   Epoch: 12   Global Step: 159430   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:32:25,460-Speed 3284.56 samples/sec   Loss 3.0380   LearningRate 0.0128   Epoch: 12   Global Step: 159440   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:32:28,555-Speed 3310.03 samples/sec   Loss 3.0450   LearningRate 0.0128   Epoch: 12   Global Step: 159450   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:32:31,712-Speed 3243.98 samples/sec   Loss 3.0523   LearningRate 0.0128   Epoch: 12   Global Step: 159460   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:32:34,834-Speed 3281.24 samples/sec   Loss 3.0974   LearningRate 0.0128   Epoch: 12   Global Step: 159470   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:32:37,899-Speed 3341.69 samples/sec   Loss 3.0322   LearningRate 0.0128   Epoch: 12   Global Step: 159480   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:32:41,007-Speed 3296.67 samples/sec   Loss 3.0956   LearningRate 0.0128   Epoch: 12   Global Step: 159490   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:32:44,137-Speed 3272.76 samples/sec   Loss 2.9946   LearningRate 0.0128   Epoch: 12   Global Step: 159500   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:32:47,220-Speed 3321.70 samples/sec   Loss 3.0755   LearningRate 0.0128   Epoch: 12   Global Step: 159510   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:32:50,299-Speed 3327.71 samples/sec   Loss 3.0423   LearningRate 0.0128   Epoch: 12   Global Step: 159520   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:32:53,436-Speed 3264.99 samples/sec   Loss 3.0563   LearningRate 0.0128   Epoch: 12   Global Step: 159530   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 15:32:56,492-Speed 3352.19 samples/sec   Loss 3.0476   LearningRate 0.0128   Epoch: 12   Global Step: 159540   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 15:32:59,637-Speed 3256.22 samples/sec   Loss 2.9970   LearningRate 0.0128   Epoch: 12   Global Step: 159550   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 15:33:02,756-Speed 3284.29 samples/sec   Loss 3.1747   LearningRate 0.0128   Epoch: 12   Global Step: 159560   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:33:05,804-Speed 3360.64 samples/sec   Loss 3.0420   LearningRate 0.0128   Epoch: 12   Global Step: 159570   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:33:08,846-Speed 3367.41 samples/sec   Loss 3.0729   LearningRate 0.0128   Epoch: 12   Global Step: 159580   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:33:11,910-Speed 3342.67 samples/sec   Loss 3.0862   LearningRate 0.0128   Epoch: 12   Global Step: 159590   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:33:15,019-Speed 3295.05 samples/sec   Loss 3.1653   LearningRate 0.0128   Epoch: 12   Global Step: 159600   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:33:18,123-Speed 3299.92 samples/sec   Loss 3.0497   LearningRate 0.0128   Epoch: 12   Global Step: 159610   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:33:21,181-Speed 3349.96 samples/sec   Loss 3.1297   LearningRate 0.0128   Epoch: 12   Global Step: 159620   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:33:24,299-Speed 3284.84 samples/sec   Loss 3.1048   LearningRate 0.0128   Epoch: 12   Global Step: 159630   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:33:27,472-Speed 3228.67 samples/sec   Loss 3.0446   LearningRate 0.0128   Epoch: 12   Global Step: 159640   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:33:30,570-Speed 3306.58 samples/sec   Loss 3.0099   LearningRate 0.0128   Epoch: 12   Global Step: 159650   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:33:33,639-Speed 3337.23 samples/sec   Loss 3.1631   LearningRate 0.0128   Epoch: 12   Global Step: 159660   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:33:36,739-Speed 3304.32 samples/sec   Loss 3.0700   LearningRate 0.0128   Epoch: 12   Global Step: 159670   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:33:39,854-Speed 3288.81 samples/sec   Loss 3.1651   LearningRate 0.0128   Epoch: 12   Global Step: 159680   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:33:43,004-Speed 3251.55 samples/sec   Loss 3.0552   LearningRate 0.0128   Epoch: 12   Global Step: 159690   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:33:46,068-Speed 3343.29 samples/sec   Loss 3.0794   LearningRate 0.0128   Epoch: 12   Global Step: 159700   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:33:49,224-Speed 3245.32 samples/sec   Loss 3.2013   LearningRate 0.0128   Epoch: 12   Global Step: 159710   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:33:52,347-Speed 3279.87 samples/sec   Loss 3.0908   LearningRate 0.0127   Epoch: 12   Global Step: 159720   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:33:55,511-Speed 3237.58 samples/sec   Loss 3.1248   LearningRate 0.0127   Epoch: 12   Global Step: 159730   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:33:58,636-Speed 3278.04 samples/sec   Loss 3.0698   LearningRate 0.0127   Epoch: 12   Global Step: 159740   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:34:01,781-Speed 3256.85 samples/sec   Loss 3.1156   LearningRate 0.0127   Epoch: 12   Global Step: 159750   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:34:04,886-Speed 3299.06 samples/sec   Loss 3.0641   LearningRate 0.0127   Epoch: 12   Global Step: 159760   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:34:07,985-Speed 3304.87 samples/sec   Loss 3.1298   LearningRate 0.0127   Epoch: 12   Global Step: 159770   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:34:11,105-Speed 3283.71 samples/sec   Loss 2.9962   LearningRate 0.0127   Epoch: 12   Global Step: 159780   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-04-27 15:34:14,252-Speed 3254.27 samples/sec   Loss 3.1429   LearningRate 0.0127   Epoch: 12   Global Step: 159790   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:34:17,435-Speed 3218.86 samples/sec   Loss 3.0672   LearningRate 0.0127   Epoch: 12   Global Step: 159800   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:34:20,521-Speed 3319.36 samples/sec   Loss 3.0489   LearningRate 0.0127   Epoch: 12   Global Step: 159810   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:34:23,558-Speed 3372.32 samples/sec   Loss 3.1568   LearningRate 0.0127   Epoch: 12   Global Step: 159820   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:34:26,628-Speed 3337.58 samples/sec   Loss 2.9713   LearningRate 0.0127   Epoch: 12   Global Step: 159830   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:34:29,692-Speed 3342.30 samples/sec   Loss 3.1319   LearningRate 0.0127   Epoch: 12   Global Step: 159840   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:34:32,758-Speed 3341.17 samples/sec   Loss 3.0236   LearningRate 0.0127   Epoch: 12   Global Step: 159850   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:34:35,914-Speed 3245.53 samples/sec   Loss 3.0719   LearningRate 0.0127   Epoch: 12   Global Step: 159860   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:34:39,019-Speed 3298.55 samples/sec   Loss 3.0367   LearningRate 0.0127   Epoch: 12   Global Step: 159870   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:34:42,241-Speed 3179.56 samples/sec   Loss 3.0752   LearningRate 0.0127   Epoch: 12   Global Step: 159880   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:34:45,294-Speed 3354.48 samples/sec   Loss 3.0089   LearningRate 0.0127   Epoch: 12   Global Step: 159890   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:34:48,382-Speed 3317.67 samples/sec   Loss 3.0953   LearningRate 0.0127   Epoch: 12   Global Step: 159900   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:34:51,565-Speed 3217.90 samples/sec   Loss 3.0461   LearningRate 0.0127   Epoch: 12   Global Step: 159910   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:34:54,672-Speed 3297.07 samples/sec   Loss 3.0086   LearningRate 0.0127   Epoch: 12   Global Step: 159920   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:34:57,769-Speed 3307.66 samples/sec   Loss 3.0565   LearningRate 0.0127   Epoch: 12   Global Step: 159930   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:35:00,824-Speed 3353.24 samples/sec   Loss 3.1050   LearningRate 0.0127   Epoch: 12   Global Step: 159940   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:35:04,002-Speed 3222.99 samples/sec   Loss 3.1036   LearningRate 0.0127   Epoch: 12   Global Step: 159950   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:35:07,183-Speed 3220.52 samples/sec   Loss 3.1181   LearningRate 0.0127   Epoch: 12   Global Step: 159960   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:35:10,274-Speed 3313.91 samples/sec   Loss 3.2073   LearningRate 0.0127   Epoch: 12   Global Step: 159970   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:35:13,331-Speed 3349.90 samples/sec   Loss 3.0584   LearningRate 0.0127   Epoch: 12   Global Step: 159980   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:35:16,526-Speed 3206.66 samples/sec   Loss 3.1236   LearningRate 0.0127   Epoch: 12   Global Step: 159990   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:35:19,616-Speed 3314.18 samples/sec   Loss 3.0552   LearningRate 0.0127   Epoch: 12   Global Step: 160000   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:35:22,699-Speed 3323.37 samples/sec   Loss 3.0458   LearningRate 0.0127   Epoch: 12   Global Step: 160010   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:35:25,911-Speed 3188.88 samples/sec   Loss 3.1214   LearningRate 0.0127   Epoch: 12   Global Step: 160020   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:35:29,048-Speed 3264.88 samples/sec   Loss 3.0933   LearningRate 0.0127   Epoch: 12   Global Step: 160030   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:35:32,155-Speed 3296.57 samples/sec   Loss 3.0103   LearningRate 0.0127   Epoch: 12   Global Step: 160040   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:35:35,279-Speed 3279.68 samples/sec   Loss 3.0584   LearningRate 0.0127   Epoch: 12   Global Step: 160050   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:35:38,372-Speed 3311.68 samples/sec   Loss 3.0522   LearningRate 0.0127   Epoch: 12   Global Step: 160060   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:35:41,490-Speed 3284.63 samples/sec   Loss 3.0623   LearningRate 0.0126   Epoch: 12   Global Step: 160070   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:35:44,607-Speed 3286.60 samples/sec   Loss 3.0742   LearningRate 0.0126   Epoch: 12   Global Step: 160080   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:35:47,746-Speed 3262.54 samples/sec   Loss 3.1185   LearningRate 0.0126   Epoch: 12   Global Step: 160090   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:35:50,873-Speed 3276.48 samples/sec   Loss 3.1073   LearningRate 0.0126   Epoch: 12   Global Step: 160100   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:35:54,030-Speed 3243.70 samples/sec   Loss 3.1296   LearningRate 0.0126   Epoch: 12   Global Step: 160110   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:35:57,950-Speed 2613.23 samples/sec   Loss 3.0945   LearningRate 0.0126   Epoch: 12   Global Step: 160120   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:36:00,998-Speed 3360.75 samples/sec   Loss 3.1317   LearningRate 0.0126   Epoch: 12   Global Step: 160130   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:36:04,186-Speed 3212.95 samples/sec   Loss 3.1397   LearningRate 0.0126   Epoch: 12   Global Step: 160140   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:36:07,271-Speed 3320.34 samples/sec   Loss 3.1383   LearningRate 0.0126   Epoch: 12   Global Step: 160150   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:36:10,417-Speed 3256.12 samples/sec   Loss 3.1087   LearningRate 0.0126   Epoch: 12   Global Step: 160160   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:36:13,569-Speed 3250.32 samples/sec   Loss 3.0972   LearningRate 0.0126   Epoch: 12   Global Step: 160170   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:36:16,692-Speed 3279.01 samples/sec   Loss 3.0807   LearningRate 0.0126   Epoch: 12   Global Step: 160180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:36:19,778-Speed 3319.72 samples/sec   Loss 3.1069   LearningRate 0.0126   Epoch: 12   Global Step: 160190   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:36:22,859-Speed 3324.67 samples/sec   Loss 3.0845   LearningRate 0.0126   Epoch: 12   Global Step: 160200   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:36:25,971-Speed 3291.56 samples/sec   Loss 3.1270   LearningRate 0.0126   Epoch: 12   Global Step: 160210   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:36:29,077-Speed 3297.83 samples/sec   Loss 3.0827   LearningRate 0.0126   Epoch: 12   Global Step: 160220   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:36:32,177-Speed 3304.00 samples/sec   Loss 3.1021   LearningRate 0.0126   Epoch: 12   Global Step: 160230   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:36:35,253-Speed 3331.00 samples/sec   Loss 3.0685   LearningRate 0.0126   Epoch: 12   Global Step: 160240   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:36:38,354-Speed 3302.79 samples/sec   Loss 3.1909   LearningRate 0.0126   Epoch: 12   Global Step: 160250   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:36:41,508-Speed 3247.46 samples/sec   Loss 3.0597   LearningRate 0.0126   Epoch: 12   Global Step: 160260   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:36:44,628-Speed 3283.52 samples/sec   Loss 3.0525   LearningRate 0.0126   Epoch: 12   Global Step: 160270   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:36:47,723-Speed 3309.61 samples/sec   Loss 3.1375   LearningRate 0.0126   Epoch: 12   Global Step: 160280   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:36:50,824-Speed 3303.21 samples/sec   Loss 3.1303   LearningRate 0.0126   Epoch: 12   Global Step: 160290   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:36:53,897-Speed 3332.32 samples/sec   Loss 3.0288   LearningRate 0.0126   Epoch: 12   Global Step: 160300   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:36:56,975-Speed 3328.18 samples/sec   Loss 3.0740   LearningRate 0.0126   Epoch: 12   Global Step: 160310   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:37:00,093-Speed 3285.50 samples/sec   Loss 3.0254   LearningRate 0.0126   Epoch: 12   Global Step: 160320   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:37:03,182-Speed 3316.38 samples/sec   Loss 3.0718   LearningRate 0.0126   Epoch: 12   Global Step: 160330   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:37:06,282-Speed 3303.70 samples/sec   Loss 3.1212   LearningRate 0.0126   Epoch: 12   Global Step: 160340   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:37:09,351-Speed 3337.88 samples/sec   Loss 3.1695   LearningRate 0.0126   Epoch: 12   Global Step: 160350   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:37:12,472-Speed 3281.85 samples/sec   Loss 3.1416   LearningRate 0.0126   Epoch: 12   Global Step: 160360   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:37:15,596-Speed 3279.58 samples/sec   Loss 3.0916   LearningRate 0.0126   Epoch: 12   Global Step: 160370   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:37:18,702-Speed 3297.40 samples/sec   Loss 3.0547   LearningRate 0.0126   Epoch: 12   Global Step: 160380   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:37:21,778-Speed 3330.88 samples/sec   Loss 3.0955   LearningRate 0.0126   Epoch: 12   Global Step: 160390   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:37:24,911-Speed 3269.53 samples/sec   Loss 3.0061   LearningRate 0.0126   Epoch: 12   Global Step: 160400   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:37:28,041-Speed 3271.79 samples/sec   Loss 3.0548   LearningRate 0.0126   Epoch: 12   Global Step: 160410   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:37:31,134-Speed 3311.89 samples/sec   Loss 3.0504   LearningRate 0.0125   Epoch: 12   Global Step: 160420   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:37:34,261-Speed 3275.64 samples/sec   Loss 3.1051   LearningRate 0.0125   Epoch: 12   Global Step: 160430   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:37:37,398-Speed 3264.83 samples/sec   Loss 3.0205   LearningRate 0.0125   Epoch: 12   Global Step: 160440   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:37:40,509-Speed 3292.71 samples/sec   Loss 3.1122   LearningRate 0.0125   Epoch: 12   Global Step: 160450   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:37:43,666-Speed 3244.61 samples/sec   Loss 3.0781   LearningRate 0.0125   Epoch: 12   Global Step: 160460   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:37:46,731-Speed 3341.91 samples/sec   Loss 2.9825   LearningRate 0.0125   Epoch: 12   Global Step: 160470   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:37:49,820-Speed 3315.93 samples/sec   Loss 3.1080   LearningRate 0.0125   Epoch: 12   Global Step: 160480   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:37:52,908-Speed 3317.58 samples/sec   Loss 3.1635   LearningRate 0.0125   Epoch: 12   Global Step: 160490   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:37:55,996-Speed 3317.06 samples/sec   Loss 3.0909   LearningRate 0.0125   Epoch: 12   Global Step: 160500   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:37:59,045-Speed 3359.68 samples/sec   Loss 3.0762   LearningRate 0.0125   Epoch: 12   Global Step: 160510   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:38:02,188-Speed 3259.09 samples/sec   Loss 3.0747   LearningRate 0.0125   Epoch: 12   Global Step: 160520   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:38:05,293-Speed 3298.80 samples/sec   Loss 3.0297   LearningRate 0.0125   Epoch: 12   Global Step: 160530   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:38:08,398-Speed 3300.00 samples/sec   Loss 3.0994   LearningRate 0.0125   Epoch: 12   Global Step: 160540   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:38:11,443-Speed 3363.46 samples/sec   Loss 3.0825   LearningRate 0.0125   Epoch: 12   Global Step: 160550   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:38:14,618-Speed 3226.06 samples/sec   Loss 3.1101   LearningRate 0.0125   Epoch: 12   Global Step: 160560   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:38:17,779-Speed 3240.37 samples/sec   Loss 3.0330   LearningRate 0.0125   Epoch: 12   Global Step: 160570   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:38:20,877-Speed 3306.81 samples/sec   Loss 3.1700   LearningRate 0.0125   Epoch: 12   Global Step: 160580   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:38:24,016-Speed 3262.90 samples/sec   Loss 3.0706   LearningRate 0.0125   Epoch: 12   Global Step: 160590   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:38:27,139-Speed 3279.72 samples/sec   Loss 3.0651   LearningRate 0.0125   Epoch: 12   Global Step: 160600   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:38:30,239-Speed 3304.61 samples/sec   Loss 3.1250   LearningRate 0.0125   Epoch: 12   Global Step: 160610   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:38:33,356-Speed 3286.63 samples/sec   Loss 3.1049   LearningRate 0.0125   Epoch: 12   Global Step: 160620   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:38:36,529-Speed 3227.90 samples/sec   Loss 3.1482   LearningRate 0.0125   Epoch: 12   Global Step: 160630   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:38:39,667-Speed 3264.50 samples/sec   Loss 3.0531   LearningRate 0.0125   Epoch: 12   Global Step: 160640   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:38:42,782-Speed 3288.19 samples/sec   Loss 3.0874   LearningRate 0.0125   Epoch: 12   Global Step: 160650   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:38:45,866-Speed 3321.06 samples/sec   Loss 3.0190   LearningRate 0.0125   Epoch: 12   Global Step: 160660   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:38:48,952-Speed 3319.21 samples/sec   Loss 3.1340   LearningRate 0.0125   Epoch: 12   Global Step: 160670   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:38:52,149-Speed 3204.32 samples/sec   Loss 3.0827   LearningRate 0.0125   Epoch: 12   Global Step: 160680   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:38:55,235-Speed 3319.15 samples/sec   Loss 3.0783   LearningRate 0.0125   Epoch: 12   Global Step: 160690   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:38:58,338-Speed 3301.65 samples/sec   Loss 3.0789   LearningRate 0.0125   Epoch: 12   Global Step: 160700   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:39:01,467-Speed 3272.63 samples/sec   Loss 3.0998   LearningRate 0.0125   Epoch: 12   Global Step: 160710   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:39:04,634-Speed 3235.31 samples/sec   Loss 2.9887   LearningRate 0.0125   Epoch: 12   Global Step: 160720   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:39:07,803-Speed 3232.13 samples/sec   Loss 3.0415   LearningRate 0.0125   Epoch: 12   Global Step: 160730   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:39:10,855-Speed 3355.48 samples/sec   Loss 3.0837   LearningRate 0.0125   Epoch: 12   Global Step: 160740   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:39:13,935-Speed 3326.44 samples/sec   Loss 3.1265   LearningRate 0.0125   Epoch: 12   Global Step: 160750   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:39:17,041-Speed 3297.71 samples/sec   Loss 3.1030   LearningRate 0.0125   Epoch: 12   Global Step: 160760   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:39:20,153-Speed 3291.37 samples/sec   Loss 3.1317   LearningRate 0.0124   Epoch: 12   Global Step: 160770   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:39:23,325-Speed 3229.95 samples/sec   Loss 3.1366   LearningRate 0.0124   Epoch: 12   Global Step: 160780   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:39:26,463-Speed 3263.35 samples/sec   Loss 3.0548   LearningRate 0.0124   Epoch: 12   Global Step: 160790   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:39:29,600-Speed 3265.38 samples/sec   Loss 3.0861   LearningRate 0.0124   Epoch: 12   Global Step: 160800   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:39:32,728-Speed 3275.02 samples/sec   Loss 3.1266   LearningRate 0.0124   Epoch: 12   Global Step: 160810   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:39:35,864-Speed 3265.98 samples/sec   Loss 3.0540   LearningRate 0.0124   Epoch: 12   Global Step: 160820   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:39:39,004-Speed 3261.82 samples/sec   Loss 3.0864   LearningRate 0.0124   Epoch: 12   Global Step: 160830   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:39:42,111-Speed 3298.00 samples/sec   Loss 3.1325   LearningRate 0.0124   Epoch: 12   Global Step: 160840   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:39:45,158-Speed 3361.50 samples/sec   Loss 3.1648   LearningRate 0.0124   Epoch: 12   Global Step: 160850   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:39:48,270-Speed 3291.36 samples/sec   Loss 3.1432   LearningRate 0.0124   Epoch: 12   Global Step: 160860   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:39:51,366-Speed 3308.50 samples/sec   Loss 3.0467   LearningRate 0.0124   Epoch: 12   Global Step: 160870   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:39:54,472-Speed 3298.22 samples/sec   Loss 3.0363   LearningRate 0.0124   Epoch: 12   Global Step: 160880   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:39:57,596-Speed 3279.15 samples/sec   Loss 3.0877   LearningRate 0.0124   Epoch: 12   Global Step: 160890   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:40:00,700-Speed 3299.63 samples/sec   Loss 3.1358   LearningRate 0.0124   Epoch: 12   Global Step: 160900   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:40:03,807-Speed 3297.48 samples/sec   Loss 3.0893   LearningRate 0.0124   Epoch: 12   Global Step: 160910   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:40:06,934-Speed 3275.39 samples/sec   Loss 3.0695   LearningRate 0.0124   Epoch: 12   Global Step: 160920   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:40:10,050-Speed 3287.10 samples/sec   Loss 3.1040   LearningRate 0.0124   Epoch: 12   Global Step: 160930   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:40:13,192-Speed 3260.10 samples/sec   Loss 3.1146   LearningRate 0.0124   Epoch: 12   Global Step: 160940   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:40:16,330-Speed 3264.00 samples/sec   Loss 3.0738   LearningRate 0.0124   Epoch: 12   Global Step: 160950   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:40:19,432-Speed 3302.82 samples/sec   Loss 3.1666   LearningRate 0.0124   Epoch: 12   Global Step: 160960   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:40:22,529-Speed 3307.14 samples/sec   Loss 3.0618   LearningRate 0.0124   Epoch: 12   Global Step: 160970   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:40:25,588-Speed 3349.29 samples/sec   Loss 3.1043   LearningRate 0.0124   Epoch: 12   Global Step: 160980   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:40:28,665-Speed 3328.52 samples/sec   Loss 3.1140   LearningRate 0.0124   Epoch: 12   Global Step: 160990   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:40:31,767-Speed 3302.18 samples/sec   Loss 3.0053   LearningRate 0.0124   Epoch: 12   Global Step: 161000   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:40:34,826-Speed 3348.78 samples/sec   Loss 3.0604   LearningRate 0.0124   Epoch: 12   Global Step: 161010   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:40:37,998-Speed 3229.25 samples/sec   Loss 3.1041   LearningRate 0.0124   Epoch: 12   Global Step: 161020   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:40:41,066-Speed 3338.96 samples/sec   Loss 3.0701   LearningRate 0.0124   Epoch: 12   Global Step: 161030   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:40:44,127-Speed 3345.87 samples/sec   Loss 2.9988   LearningRate 0.0124   Epoch: 12   Global Step: 161040   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:40:47,269-Speed 3260.43 samples/sec   Loss 3.0991   LearningRate 0.0124   Epoch: 12   Global Step: 161050   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:40:50,365-Speed 3308.34 samples/sec   Loss 3.1037   LearningRate 0.0124   Epoch: 12   Global Step: 161060   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:40:53,556-Speed 3209.86 samples/sec   Loss 3.1231   LearningRate 0.0124   Epoch: 12   Global Step: 161070   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:40:56,665-Speed 3294.69 samples/sec   Loss 3.0184   LearningRate 0.0124   Epoch: 12   Global Step: 161080   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:40:59,810-Speed 3257.68 samples/sec   Loss 3.1349   LearningRate 0.0124   Epoch: 12   Global Step: 161090   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:41:02,944-Speed 3267.38 samples/sec   Loss 3.1324   LearningRate 0.0124   Epoch: 12   Global Step: 161100   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:41:06,051-Speed 3297.03 samples/sec   Loss 3.0826   LearningRate 0.0124   Epoch: 12   Global Step: 161110   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:41:09,103-Speed 3356.81 samples/sec   Loss 3.0487   LearningRate 0.0123   Epoch: 12   Global Step: 161120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:41:12,217-Speed 3288.65 samples/sec   Loss 3.0531   LearningRate 0.0123   Epoch: 12   Global Step: 161130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:41:15,321-Speed 3300.90 samples/sec   Loss 3.0867   LearningRate 0.0123   Epoch: 12   Global Step: 161140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:41:18,476-Speed 3246.80 samples/sec   Loss 2.9775   LearningRate 0.0123   Epoch: 12   Global Step: 161150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:41:21,541-Speed 3341.79 samples/sec   Loss 3.0177   LearningRate 0.0123   Epoch: 12   Global Step: 161160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:41:24,630-Speed 3316.30 samples/sec   Loss 3.1186   LearningRate 0.0123   Epoch: 12   Global Step: 161170   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:41:27,735-Speed 3298.43 samples/sec   Loss 3.0815   LearningRate 0.0123   Epoch: 12   Global Step: 161180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:41:30,835-Speed 3304.78 samples/sec   Loss 3.0521   LearningRate 0.0123   Epoch: 12   Global Step: 161190   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:41:33,915-Speed 3326.10 samples/sec   Loss 3.1246   LearningRate 0.0123   Epoch: 12   Global Step: 161200   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:41:37,001-Speed 3318.95 samples/sec   Loss 3.0938   LearningRate 0.0123   Epoch: 12   Global Step: 161210   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:41:40,122-Speed 3281.92 samples/sec   Loss 3.1186   LearningRate 0.0123   Epoch: 12   Global Step: 161220   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:41:43,195-Speed 3333.51 samples/sec   Loss 3.0584   LearningRate 0.0123   Epoch: 12   Global Step: 161230   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:41:46,268-Speed 3333.33 samples/sec   Loss 3.1239   LearningRate 0.0123   Epoch: 12   Global Step: 161240   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:41:49,366-Speed 3306.21 samples/sec   Loss 3.0882   LearningRate 0.0123   Epoch: 12   Global Step: 161250   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:41:52,486-Speed 3282.93 samples/sec   Loss 3.0648   LearningRate 0.0123   Epoch: 12   Global Step: 161260   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:41:55,629-Speed 3259.67 samples/sec   Loss 3.0155   LearningRate 0.0123   Epoch: 12   Global Step: 161270   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:41:58,703-Speed 3331.70 samples/sec   Loss 3.0605   LearningRate 0.0123   Epoch: 12   Global Step: 161280   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:42:01,854-Speed 3251.00 samples/sec   Loss 2.9784   LearningRate 0.0123   Epoch: 12   Global Step: 161290   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:42:04,947-Speed 3311.98 samples/sec   Loss 3.0686   LearningRate 0.0123   Epoch: 12   Global Step: 161300   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:42:08,027-Speed 3325.46 samples/sec   Loss 3.0325   LearningRate 0.0123   Epoch: 12   Global Step: 161310   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:42:11,113-Speed 3319.37 samples/sec   Loss 3.0463   LearningRate 0.0123   Epoch: 12   Global Step: 161320   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:42:14,187-Speed 3332.15 samples/sec   Loss 3.1664   LearningRate 0.0123   Epoch: 12   Global Step: 161330   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:42:17,276-Speed 3316.54 samples/sec   Loss 3.1132   LearningRate 0.0123   Epoch: 12   Global Step: 161340   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:42:20,356-Speed 3324.94 samples/sec   Loss 3.1027   LearningRate 0.0123   Epoch: 12   Global Step: 161350   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:42:23,460-Speed 3300.62 samples/sec   Loss 3.0840   LearningRate 0.0123   Epoch: 12   Global Step: 161360   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-04-27 15:42:26,527-Speed 3339.68 samples/sec   Loss 3.1384   LearningRate 0.0123   Epoch: 12   Global Step: 161370   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:42:29,609-Speed 3323.76 samples/sec   Loss 3.1271   LearningRate 0.0123   Epoch: 12   Global Step: 161380   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:42:32,701-Speed 3312.23 samples/sec   Loss 3.0236   LearningRate 0.0123   Epoch: 12   Global Step: 161390   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:42:35,898-Speed 3204.65 samples/sec   Loss 3.0473   LearningRate 0.0123   Epoch: 12   Global Step: 161400   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:42:38,990-Speed 3312.27 samples/sec   Loss 3.1218   LearningRate 0.0123   Epoch: 12   Global Step: 161410   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:42:42,077-Speed 3317.89 samples/sec   Loss 3.1028   LearningRate 0.0123   Epoch: 12   Global Step: 161420   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:42:45,128-Speed 3357.32 samples/sec   Loss 3.1567   LearningRate 0.0123   Epoch: 12   Global Step: 161430   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:42:48,283-Speed 3247.15 samples/sec   Loss 3.0961   LearningRate 0.0123   Epoch: 12   Global Step: 161440   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:42:51,357-Speed 3331.98 samples/sec   Loss 3.0705   LearningRate 0.0123   Epoch: 12   Global Step: 161450   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:42:54,504-Speed 3255.14 samples/sec   Loss 3.0432   LearningRate 0.0123   Epoch: 12   Global Step: 161460   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:42:57,782-Speed 3124.87 samples/sec   Loss 3.0636   LearningRate 0.0123   Epoch: 12   Global Step: 161470   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:43:30,285-Speed 315.06 samples/sec   Loss 2.4420   LearningRate 0.0122   Epoch: 13   Global Step: 161480   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:43:33,727-Speed 2975.93 samples/sec   Loss 2.2112   LearningRate 0.0122   Epoch: 13   Global Step: 161490   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:43:36,885-Speed 3243.34 samples/sec   Loss 2.1787   LearningRate 0.0122   Epoch: 13   Global Step: 161500   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:43:39,985-Speed 3305.04 samples/sec   Loss 2.1425   LearningRate 0.0122   Epoch: 13   Global Step: 161510   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:43:43,144-Speed 3241.55 samples/sec   Loss 2.2372   LearningRate 0.0122   Epoch: 13   Global Step: 161520   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-04-27 15:43:46,187-Speed 3366.56 samples/sec   Loss 2.2098   LearningRate 0.0122   Epoch: 13   Global Step: 161530   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:43:49,306-Speed 3284.37 samples/sec   Loss 2.1864   LearningRate 0.0122   Epoch: 13   Global Step: 161540   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:43:52,443-Speed 3264.81 samples/sec   Loss 2.2009   LearningRate 0.0122   Epoch: 13   Global Step: 161550   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:43:55,522-Speed 3327.56 samples/sec   Loss 2.1562   LearningRate 0.0122   Epoch: 13   Global Step: 161560   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-04-27 15:43:58,671-Speed 3252.24 samples/sec   Loss 2.1587   LearningRate 0.0122   Epoch: 13   Global Step: 161570   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 15:44:01,803-Speed 3270.30 samples/sec   Loss 2.1420   LearningRate 0.0122   Epoch: 13   Global Step: 161580   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 15:44:04,882-Speed 3327.06 samples/sec   Loss 2.2321   LearningRate 0.0122   Epoch: 13   Global Step: 161590   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 15:44:07,964-Speed 3323.83 samples/sec   Loss 2.2273   LearningRate 0.0122   Epoch: 13   Global Step: 161600   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 15:44:11,061-Speed 3307.08 samples/sec   Loss 2.1623   LearningRate 0.0122   Epoch: 13   Global Step: 161610   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 15:44:14,175-Speed 3289.81 samples/sec   Loss 2.1869   LearningRate 0.0122   Epoch: 13   Global Step: 161620   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 15:44:17,285-Speed 3294.08 samples/sec   Loss 2.1619   LearningRate 0.0122   Epoch: 13   Global Step: 161630   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:44:20,363-Speed 3328.05 samples/sec   Loss 2.2244   LearningRate 0.0122   Epoch: 13   Global Step: 161640   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:44:23,508-Speed 3256.00 samples/sec   Loss 2.2350   LearningRate 0.0122   Epoch: 13   Global Step: 161650   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:44:26,587-Speed 3327.63 samples/sec   Loss 2.2567   LearningRate 0.0122   Epoch: 13   Global Step: 161660   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:44:29,720-Speed 3269.80 samples/sec   Loss 2.1562   LearningRate 0.0122   Epoch: 13   Global Step: 161670   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:44:32,800-Speed 3325.20 samples/sec   Loss 2.1731   LearningRate 0.0122   Epoch: 13   Global Step: 161680   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:44:35,955-Speed 3246.43 samples/sec   Loss 2.2790   LearningRate 0.0122   Epoch: 13   Global Step: 161690   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:44:39,045-Speed 3315.73 samples/sec   Loss 2.1478   LearningRate 0.0122   Epoch: 13   Global Step: 161700   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:44:42,204-Speed 3242.57 samples/sec   Loss 2.2071   LearningRate 0.0122   Epoch: 13   Global Step: 161710   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:44:45,293-Speed 3315.55 samples/sec   Loss 2.2110   LearningRate 0.0122   Epoch: 13   Global Step: 161720   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:44:48,382-Speed 3316.49 samples/sec   Loss 2.2288   LearningRate 0.0122   Epoch: 13   Global Step: 161730   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:44:51,446-Speed 3342.46 samples/sec   Loss 2.1842   LearningRate 0.0122   Epoch: 13   Global Step: 161740   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:44:54,555-Speed 3295.05 samples/sec   Loss 2.2310   LearningRate 0.0122   Epoch: 13   Global Step: 161750   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:44:57,615-Speed 3347.59 samples/sec   Loss 2.2283   LearningRate 0.0122   Epoch: 13   Global Step: 161760   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:45:00,819-Speed 3197.46 samples/sec   Loss 2.2098   LearningRate 0.0122   Epoch: 13   Global Step: 161770   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:45:04,109-Speed 3113.14 samples/sec   Loss 2.1716   LearningRate 0.0122   Epoch: 13   Global Step: 161780   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:45:07,230-Speed 3282.29 samples/sec   Loss 2.2018   LearningRate 0.0122   Epoch: 13   Global Step: 161790   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:45:10,296-Speed 3340.09 samples/sec   Loss 2.1824   LearningRate 0.0122   Epoch: 13   Global Step: 161800   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:45:13,394-Speed 3307.12 samples/sec   Loss 2.1949   LearningRate 0.0122   Epoch: 13   Global Step: 161810   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:45:16,543-Speed 3252.19 samples/sec   Loss 2.2203   LearningRate 0.0122   Epoch: 13   Global Step: 161820   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:45:19,668-Speed 3278.23 samples/sec   Loss 2.2603   LearningRate 0.0121   Epoch: 13   Global Step: 161830   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:45:22,724-Speed 3351.82 samples/sec   Loss 2.2229   LearningRate 0.0121   Epoch: 13   Global Step: 161840   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:45:25,803-Speed 3326.68 samples/sec   Loss 2.2521   LearningRate 0.0121   Epoch: 13   Global Step: 161850   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:45:28,865-Speed 3345.82 samples/sec   Loss 2.2191   LearningRate 0.0121   Epoch: 13   Global Step: 161860   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:45:32,026-Speed 3240.73 samples/sec   Loss 2.1890   LearningRate 0.0121   Epoch: 13   Global Step: 161870   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:45:35,144-Speed 3284.46 samples/sec   Loss 2.1857   LearningRate 0.0121   Epoch: 13   Global Step: 161880   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:45:38,254-Speed 3293.63 samples/sec   Loss 2.2285   LearningRate 0.0121   Epoch: 13   Global Step: 161890   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:45:41,440-Speed 3215.68 samples/sec   Loss 2.1954   LearningRate 0.0121   Epoch: 13   Global Step: 161900   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:45:44,524-Speed 3321.49 samples/sec   Loss 2.2477   LearningRate 0.0121   Epoch: 13   Global Step: 161910   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:45:47,586-Speed 3344.32 samples/sec   Loss 2.2191   LearningRate 0.0121   Epoch: 13   Global Step: 161920   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:45:50,726-Speed 3262.79 samples/sec   Loss 2.2145   LearningRate 0.0121   Epoch: 13   Global Step: 161930   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:45:53,828-Speed 3302.67 samples/sec   Loss 2.2451   LearningRate 0.0121   Epoch: 13   Global Step: 161940   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:45:56,903-Speed 3329.93 samples/sec   Loss 2.2575   LearningRate 0.0121   Epoch: 13   Global Step: 161950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:46:00,031-Speed 3275.50 samples/sec   Loss 2.1970   LearningRate 0.0121   Epoch: 13   Global Step: 161960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:46:03,133-Speed 3302.00 samples/sec   Loss 2.2712   LearningRate 0.0121   Epoch: 13   Global Step: 161970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:46:06,209-Speed 3330.42 samples/sec   Loss 2.2727   LearningRate 0.0121   Epoch: 13   Global Step: 161980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:46:09,256-Speed 3361.63 samples/sec   Loss 2.2530   LearningRate 0.0121   Epoch: 13   Global Step: 161990   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:46:12,398-Speed 3259.51 samples/sec   Loss 2.2601   LearningRate 0.0121   Epoch: 13   Global Step: 162000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:46:15,499-Speed 3303.34 samples/sec   Loss 2.3024   LearningRate 0.0121   Epoch: 13   Global Step: 162010   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:46:18,680-Speed 3219.59 samples/sec   Loss 2.2036   LearningRate 0.0121   Epoch: 13   Global Step: 162020   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:46:21,755-Speed 3331.00 samples/sec   Loss 2.2535   LearningRate 0.0121   Epoch: 13   Global Step: 162030   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:46:24,839-Speed 3322.57 samples/sec   Loss 2.3049   LearningRate 0.0121   Epoch: 13   Global Step: 162040   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:46:27,962-Speed 3279.01 samples/sec   Loss 2.2299   LearningRate 0.0121   Epoch: 13   Global Step: 162050   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 15:46:31,077-Speed 3288.19 samples/sec   Loss 2.1972   LearningRate 0.0121   Epoch: 13   Global Step: 162060   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 15:46:34,133-Speed 3352.22 samples/sec   Loss 2.1756   LearningRate 0.0121   Epoch: 13   Global Step: 162070   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:46:37,270-Speed 3265.85 samples/sec   Loss 2.2316   LearningRate 0.0121   Epoch: 13   Global Step: 162080   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:46:40,318-Speed 3359.44 samples/sec   Loss 2.2149   LearningRate 0.0121   Epoch: 13   Global Step: 162090   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:46:43,385-Speed 3339.89 samples/sec   Loss 2.1908   LearningRate 0.0121   Epoch: 13   Global Step: 162100   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:46:46,471-Speed 3320.05 samples/sec   Loss 2.2534   LearningRate 0.0121   Epoch: 13   Global Step: 162110   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:46:49,531-Speed 3347.22 samples/sec   Loss 2.2700   LearningRate 0.0121   Epoch: 13   Global Step: 162120   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:46:52,644-Speed 3290.04 samples/sec   Loss 2.2465   LearningRate 0.0121   Epoch: 13   Global Step: 162130   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:46:55,703-Speed 3348.77 samples/sec   Loss 2.2067   LearningRate 0.0121   Epoch: 13   Global Step: 162140   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:46:58,749-Speed 3362.77 samples/sec   Loss 2.1860   LearningRate 0.0121   Epoch: 13   Global Step: 162150   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:47:01,827-Speed 3328.09 samples/sec   Loss 2.3063   LearningRate 0.0121   Epoch: 13   Global Step: 162160   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:47:04,877-Speed 3357.73 samples/sec   Loss 2.2533   LearningRate 0.0121   Epoch: 13   Global Step: 162170   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:47:07,980-Speed 3301.83 samples/sec   Loss 2.3307   LearningRate 0.0121   Epoch: 13   Global Step: 162180   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:47:11,054-Speed 3332.34 samples/sec   Loss 2.2128   LearningRate 0.0120   Epoch: 13   Global Step: 162190   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:47:14,153-Speed 3304.43 samples/sec   Loss 2.1570   LearningRate 0.0120   Epoch: 13   Global Step: 162200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:47:17,358-Speed 3196.98 samples/sec   Loss 2.2796   LearningRate 0.0120   Epoch: 13   Global Step: 162210   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:47:20,427-Speed 3337.70 samples/sec   Loss 2.2261   LearningRate 0.0120   Epoch: 13   Global Step: 162220   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:47:23,494-Speed 3340.21 samples/sec   Loss 2.2249   LearningRate 0.0120   Epoch: 13   Global Step: 162230   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:47:26,604-Speed 3292.88 samples/sec   Loss 2.2553   LearningRate 0.0120   Epoch: 13   Global Step: 162240   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:47:29,709-Speed 3298.96 samples/sec   Loss 2.2794   LearningRate 0.0120   Epoch: 13   Global Step: 162250   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:47:32,825-Speed 3287.42 samples/sec   Loss 2.3075   LearningRate 0.0120   Epoch: 13   Global Step: 162260   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:47:35,884-Speed 3349.24 samples/sec   Loss 2.2973   LearningRate 0.0120   Epoch: 13   Global Step: 162270   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 15:47:39,011-Speed 3275.33 samples/sec   Loss 2.2204   LearningRate 0.0120   Epoch: 13   Global Step: 162280   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:47:42,150-Speed 3263.16 samples/sec   Loss 2.2082   LearningRate 0.0120   Epoch: 13   Global Step: 162290   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:47:45,241-Speed 3314.41 samples/sec   Loss 2.2712   LearningRate 0.0120   Epoch: 13   Global Step: 162300   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:47:48,313-Speed 3334.44 samples/sec   Loss 2.2346   LearningRate 0.0120   Epoch: 13   Global Step: 162310   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:47:51,405-Speed 3312.30 samples/sec   Loss 2.2378   LearningRate 0.0120   Epoch: 13   Global Step: 162320   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:47:54,475-Speed 3337.14 samples/sec   Loss 2.2436   LearningRate 0.0120   Epoch: 13   Global Step: 162330   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:47:57,581-Speed 3298.07 samples/sec   Loss 2.3089   LearningRate 0.0120   Epoch: 13   Global Step: 162340   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:48:00,637-Speed 3351.93 samples/sec   Loss 2.2408   LearningRate 0.0120   Epoch: 13   Global Step: 162350   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:48:03,726-Speed 3315.78 samples/sec   Loss 2.3917   LearningRate 0.0120   Epoch: 13   Global Step: 162360   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:48:06,828-Speed 3302.64 samples/sec   Loss 2.2541   LearningRate 0.0120   Epoch: 13   Global Step: 162370   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:48:09,888-Speed 3346.65 samples/sec   Loss 2.2936   LearningRate 0.0120   Epoch: 13   Global Step: 162380   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:48:12,959-Speed 3336.02 samples/sec   Loss 2.2613   LearningRate 0.0120   Epoch: 13   Global Step: 162390   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:48:16,036-Speed 3328.24 samples/sec   Loss 2.2752   LearningRate 0.0120   Epoch: 13   Global Step: 162400   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:48:19,168-Speed 3271.17 samples/sec   Loss 2.3268   LearningRate 0.0120   Epoch: 13   Global Step: 162410   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:48:22,281-Speed 3289.94 samples/sec   Loss 2.2505   LearningRate 0.0120   Epoch: 13   Global Step: 162420   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:48:25,411-Speed 3272.95 samples/sec   Loss 2.2742   LearningRate 0.0120   Epoch: 13   Global Step: 162430   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:48:28,513-Speed 3301.76 samples/sec   Loss 2.2724   LearningRate 0.0120   Epoch: 13   Global Step: 162440   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:48:31,658-Speed 3257.38 samples/sec   Loss 2.3033   LearningRate 0.0120   Epoch: 13   Global Step: 162450   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:48:34,839-Speed 3220.43 samples/sec   Loss 2.2923   LearningRate 0.0120   Epoch: 13   Global Step: 162460   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:48:38,001-Speed 3239.75 samples/sec   Loss 2.2622   LearningRate 0.0120   Epoch: 13   Global Step: 162470   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:48:41,109-Speed 3296.06 samples/sec   Loss 2.2953   LearningRate 0.0120   Epoch: 13   Global Step: 162480   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:48:44,203-Speed 3309.92 samples/sec   Loss 2.2927   LearningRate 0.0120   Epoch: 13   Global Step: 162490   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:48:47,375-Speed 3229.55 samples/sec   Loss 2.3105   LearningRate 0.0120   Epoch: 13   Global Step: 162500   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:48:50,569-Speed 3207.32 samples/sec   Loss 2.2943   LearningRate 0.0120   Epoch: 13   Global Step: 162510   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:48:53,687-Speed 3284.37 samples/sec   Loss 2.3128   LearningRate 0.0120   Epoch: 13   Global Step: 162520   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:48:56,763-Speed 3330.67 samples/sec   Loss 2.3214   LearningRate 0.0120   Epoch: 13   Global Step: 162530   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:48:59,933-Speed 3231.12 samples/sec   Loss 2.3008   LearningRate 0.0120   Epoch: 13   Global Step: 162540   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:49:03,113-Speed 3221.53 samples/sec   Loss 2.3024   LearningRate 0.0119   Epoch: 13   Global Step: 162550   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:49:06,325-Speed 3188.78 samples/sec   Loss 2.3319   LearningRate 0.0119   Epoch: 13   Global Step: 162560   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:49:09,376-Speed 3357.36 samples/sec   Loss 2.2908   LearningRate 0.0119   Epoch: 13   Global Step: 162570   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:49:12,482-Speed 3297.71 samples/sec   Loss 2.3347   LearningRate 0.0119   Epoch: 13   Global Step: 162580   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:49:15,660-Speed 3223.89 samples/sec   Loss 2.3253   LearningRate 0.0119   Epoch: 13   Global Step: 162590   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:49:18,787-Speed 3275.70 samples/sec   Loss 2.3485   LearningRate 0.0119   Epoch: 13   Global Step: 162600   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:49:21,874-Speed 3318.13 samples/sec   Loss 2.3099   LearningRate 0.0119   Epoch: 13   Global Step: 162610   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:49:25,046-Speed 3229.55 samples/sec   Loss 2.3401   LearningRate 0.0119   Epoch: 13   Global Step: 162620   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:49:28,239-Speed 3208.14 samples/sec   Loss 2.3034   LearningRate 0.0119   Epoch: 13   Global Step: 162630   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:49:31,329-Speed 3314.53 samples/sec   Loss 2.2830   LearningRate 0.0119   Epoch: 13   Global Step: 162640   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:49:34,421-Speed 3312.74 samples/sec   Loss 2.3266   LearningRate 0.0119   Epoch: 13   Global Step: 162650   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:49:37,515-Speed 3311.22 samples/sec   Loss 2.2962   LearningRate 0.0119   Epoch: 13   Global Step: 162660   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:49:40,634-Speed 3283.81 samples/sec   Loss 2.2327   LearningRate 0.0119   Epoch: 13   Global Step: 162670   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:49:43,791-Speed 3245.06 samples/sec   Loss 2.3333   LearningRate 0.0119   Epoch: 13   Global Step: 162680   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:49:46,855-Speed 3342.08 samples/sec   Loss 2.3280   LearningRate 0.0119   Epoch: 13   Global Step: 162690   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:49:49,953-Speed 3306.94 samples/sec   Loss 2.2405   LearningRate 0.0119   Epoch: 13   Global Step: 162700   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:49:53,183-Speed 3171.31 samples/sec   Loss 2.3453   LearningRate 0.0119   Epoch: 13   Global Step: 162710   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:49:56,253-Speed 3336.11 samples/sec   Loss 2.3023   LearningRate 0.0119   Epoch: 13   Global Step: 162720   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:49:59,303-Speed 3358.50 samples/sec   Loss 2.3100   LearningRate 0.0119   Epoch: 13   Global Step: 162730   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 15:50:02,407-Speed 3300.15 samples/sec   Loss 2.2419   LearningRate 0.0119   Epoch: 13   Global Step: 162740   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 15:50:05,522-Speed 3288.86 samples/sec   Loss 2.3528   LearningRate 0.0119   Epoch: 13   Global Step: 162750   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 15:50:08,607-Speed 3320.22 samples/sec   Loss 2.3678   LearningRate 0.0119   Epoch: 13   Global Step: 162760   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 15:50:11,712-Speed 3299.65 samples/sec   Loss 2.4323   LearningRate 0.0119   Epoch: 13   Global Step: 162770   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 15:50:14,801-Speed 3315.13 samples/sec   Loss 2.3152   LearningRate 0.0119   Epoch: 13   Global Step: 162780   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 15:50:17,965-Speed 3238.40 samples/sec   Loss 2.3439   LearningRate 0.0119   Epoch: 13   Global Step: 162790   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 15:50:21,046-Speed 3324.75 samples/sec   Loss 2.2915   LearningRate 0.0119   Epoch: 13   Global Step: 162800   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 15:50:24,168-Speed 3280.67 samples/sec   Loss 2.3197   LearningRate 0.0119   Epoch: 13   Global Step: 162810   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 15:50:27,304-Speed 3265.51 samples/sec   Loss 2.2787   LearningRate 0.0119   Epoch: 13   Global Step: 162820   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 15:50:30,479-Speed 3226.14 samples/sec   Loss 2.3508   LearningRate 0.0119   Epoch: 13   Global Step: 162830   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:50:33,571-Speed 3313.73 samples/sec   Loss 2.3081   LearningRate 0.0119   Epoch: 13   Global Step: 162840   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:50:36,649-Speed 3328.02 samples/sec   Loss 2.3780   LearningRate 0.0119   Epoch: 13   Global Step: 162850   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:50:39,761-Speed 3291.13 samples/sec   Loss 2.3374   LearningRate 0.0119   Epoch: 13   Global Step: 162860   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:50:42,919-Speed 3243.68 samples/sec   Loss 2.3253   LearningRate 0.0119   Epoch: 13   Global Step: 162870   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:50:46,011-Speed 3312.37 samples/sec   Loss 2.3813   LearningRate 0.0119   Epoch: 13   Global Step: 162880   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:50:49,104-Speed 3311.93 samples/sec   Loss 2.2835   LearningRate 0.0119   Epoch: 13   Global Step: 162890   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:50:52,268-Speed 3237.07 samples/sec   Loss 2.3052   LearningRate 0.0119   Epoch: 13   Global Step: 162900   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:50:55,390-Speed 3281.33 samples/sec   Loss 2.4063   LearningRate 0.0118   Epoch: 13   Global Step: 162910   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:50:58,541-Speed 3250.80 samples/sec   Loss 2.4309   LearningRate 0.0118   Epoch: 13   Global Step: 162920   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:51:01,735-Speed 3207.21 samples/sec   Loss 2.3522   LearningRate 0.0118   Epoch: 13   Global Step: 162930   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:51:04,844-Speed 3294.40 samples/sec   Loss 2.3545   LearningRate 0.0118   Epoch: 13   Global Step: 162940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:51:07,954-Speed 3293.36 samples/sec   Loss 2.3393   LearningRate 0.0118   Epoch: 13   Global Step: 162950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:51:11,127-Speed 3228.92 samples/sec   Loss 2.3409   LearningRate 0.0118   Epoch: 13   Global Step: 162960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:51:14,281-Speed 3247.30 samples/sec   Loss 2.3610   LearningRate 0.0118   Epoch: 13   Global Step: 162970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:51:17,419-Speed 3264.55 samples/sec   Loss 2.4013   LearningRate 0.0118   Epoch: 13   Global Step: 162980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:51:20,511-Speed 3312.02 samples/sec   Loss 2.3304   LearningRate 0.0118   Epoch: 13   Global Step: 162990   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:51:23,637-Speed 3276.58 samples/sec   Loss 2.3506   LearningRate 0.0118   Epoch: 13   Global Step: 163000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:51:26,789-Speed 3249.97 samples/sec   Loss 2.3951   LearningRate 0.0118   Epoch: 13   Global Step: 163010   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:51:29,860-Speed 3335.28 samples/sec   Loss 2.3554   LearningRate 0.0118   Epoch: 13   Global Step: 163020   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:51:32,922-Speed 3345.85 samples/sec   Loss 2.3779   LearningRate 0.0118   Epoch: 13   Global Step: 163030   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:51:36,082-Speed 3242.21 samples/sec   Loss 2.3503   LearningRate 0.0118   Epoch: 13   Global Step: 163040   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:51:39,228-Speed 3255.84 samples/sec   Loss 2.3143   LearningRate 0.0118   Epoch: 13   Global Step: 163050   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:51:42,290-Speed 3344.47 samples/sec   Loss 2.4011   LearningRate 0.0118   Epoch: 13   Global Step: 163060   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:51:45,361-Speed 3335.57 samples/sec   Loss 2.3720   LearningRate 0.0118   Epoch: 13   Global Step: 163070   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:51:48,410-Speed 3360.13 samples/sec   Loss 2.3796   LearningRate 0.0118   Epoch: 13   Global Step: 163080   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:51:51,571-Speed 3240.37 samples/sec   Loss 2.4026   LearningRate 0.0118   Epoch: 13   Global Step: 163090   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:51:54,744-Speed 3228.00 samples/sec   Loss 2.3346   LearningRate 0.0118   Epoch: 13   Global Step: 163100   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:51:57,825-Speed 3325.22 samples/sec   Loss 2.3558   LearningRate 0.0118   Epoch: 13   Global Step: 163110   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:52:00,895-Speed 3335.68 samples/sec   Loss 2.4100   LearningRate 0.0118   Epoch: 13   Global Step: 163120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:52:03,998-Speed 3301.61 samples/sec   Loss 2.3667   LearningRate 0.0118   Epoch: 13   Global Step: 163130   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:52:07,145-Speed 3254.64 samples/sec   Loss 2.3996   LearningRate 0.0118   Epoch: 13   Global Step: 163140   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:52:10,232-Speed 3318.89 samples/sec   Loss 2.3973   LearningRate 0.0118   Epoch: 13   Global Step: 163150   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:52:13,330-Speed 3305.57 samples/sec   Loss 2.2926   LearningRate 0.0118   Epoch: 13   Global Step: 163160   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:52:16,409-Speed 3327.66 samples/sec   Loss 2.3300   LearningRate 0.0118   Epoch: 13   Global Step: 163170   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:52:19,501-Speed 3312.78 samples/sec   Loss 2.3857   LearningRate 0.0118   Epoch: 13   Global Step: 163180   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:52:22,560-Speed 3347.69 samples/sec   Loss 2.4282   LearningRate 0.0118   Epoch: 13   Global Step: 163190   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:52:25,689-Speed 3274.63 samples/sec   Loss 2.3154   LearningRate 0.0118   Epoch: 13   Global Step: 163200   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:52:28,797-Speed 3295.80 samples/sec   Loss 2.3264   LearningRate 0.0118   Epoch: 13   Global Step: 163210   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:52:31,854-Speed 3350.02 samples/sec   Loss 2.3555   LearningRate 0.0118   Epoch: 13   Global Step: 163220   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:52:34,919-Speed 3342.00 samples/sec   Loss 2.3861   LearningRate 0.0118   Epoch: 13   Global Step: 163230   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:52:38,040-Speed 3282.97 samples/sec   Loss 2.3782   LearningRate 0.0118   Epoch: 13   Global Step: 163240   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:52:41,125-Speed 3320.18 samples/sec   Loss 2.3659   LearningRate 0.0118   Epoch: 13   Global Step: 163250   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:52:44,198-Speed 3332.77 samples/sec   Loss 2.3642   LearningRate 0.0118   Epoch: 13   Global Step: 163260   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:52:47,285-Speed 3318.97 samples/sec   Loss 2.3580   LearningRate 0.0117   Epoch: 13   Global Step: 163270   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:52:50,341-Speed 3351.34 samples/sec   Loss 2.3871   LearningRate 0.0117   Epoch: 13   Global Step: 163280   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:52:53,540-Speed 3202.34 samples/sec   Loss 2.3240   LearningRate 0.0117   Epoch: 13   Global Step: 163290   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:52:56,670-Speed 3272.30 samples/sec   Loss 2.3455   LearningRate 0.0117   Epoch: 13   Global Step: 163300   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:52:59,786-Speed 3287.07 samples/sec   Loss 2.3639   LearningRate 0.0117   Epoch: 13   Global Step: 163310   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:53:02,945-Speed 3243.25 samples/sec   Loss 2.3081   LearningRate 0.0117   Epoch: 13   Global Step: 163320   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:53:06,123-Speed 3222.40 samples/sec   Loss 2.3274   LearningRate 0.0117   Epoch: 13   Global Step: 163330   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:53:09,220-Speed 3307.53 samples/sec   Loss 2.3262   LearningRate 0.0117   Epoch: 13   Global Step: 163340   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:53:12,252-Speed 3379.32 samples/sec   Loss 2.3972   LearningRate 0.0117   Epoch: 13   Global Step: 163350   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:53:15,358-Speed 3298.08 samples/sec   Loss 2.3755   LearningRate 0.0117   Epoch: 13   Global Step: 163360   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:53:18,483-Speed 3277.53 samples/sec   Loss 2.3657   LearningRate 0.0117   Epoch: 13   Global Step: 163370   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:53:21,558-Speed 3330.82 samples/sec   Loss 2.4556   LearningRate 0.0117   Epoch: 13   Global Step: 163380   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:53:24,685-Speed 3276.91 samples/sec   Loss 2.4328   LearningRate 0.0117   Epoch: 13   Global Step: 163390   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:53:27,810-Speed 3278.03 samples/sec   Loss 2.3374   LearningRate 0.0117   Epoch: 13   Global Step: 163400   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:53:30,972-Speed 3238.57 samples/sec   Loss 2.4008   LearningRate 0.0117   Epoch: 13   Global Step: 163410   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:53:34,093-Speed 3281.84 samples/sec   Loss 2.3902   LearningRate 0.0117   Epoch: 13   Global Step: 163420   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:53:37,297-Speed 3197.32 samples/sec   Loss 2.3039   LearningRate 0.0117   Epoch: 13   Global Step: 163430   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:53:40,429-Speed 3270.29 samples/sec   Loss 2.3510   LearningRate 0.0117   Epoch: 13   Global Step: 163440   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:53:43,572-Speed 3259.42 samples/sec   Loss 2.4309   LearningRate 0.0117   Epoch: 13   Global Step: 163450   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:53:46,636-Speed 3343.15 samples/sec   Loss 2.3946   LearningRate 0.0117   Epoch: 13   Global Step: 163460   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:53:49,791-Speed 3246.53 samples/sec   Loss 2.4849   LearningRate 0.0117   Epoch: 13   Global Step: 163470   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:53:52,865-Speed 3332.58 samples/sec   Loss 2.4122   LearningRate 0.0117   Epoch: 13   Global Step: 163480   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:53:55,923-Speed 3349.31 samples/sec   Loss 2.4056   LearningRate 0.0117   Epoch: 13   Global Step: 163490   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:53:58,999-Speed 3329.77 samples/sec   Loss 2.3737   LearningRate 0.0117   Epoch: 13   Global Step: 163500   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:54:02,183-Speed 3217.21 samples/sec   Loss 2.3568   LearningRate 0.0117   Epoch: 13   Global Step: 163510   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:54:05,263-Speed 3326.16 samples/sec   Loss 2.3416   LearningRate 0.0117   Epoch: 13   Global Step: 163520   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:54:08,389-Speed 3278.29 samples/sec   Loss 2.3893   LearningRate 0.0117   Epoch: 13   Global Step: 163530   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:54:11,468-Speed 3326.80 samples/sec   Loss 2.3546   LearningRate 0.0117   Epoch: 13   Global Step: 163540   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:54:14,647-Speed 3222.84 samples/sec   Loss 2.3874   LearningRate 0.0117   Epoch: 13   Global Step: 163550   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:54:17,748-Speed 3302.77 samples/sec   Loss 2.3918   LearningRate 0.0117   Epoch: 13   Global Step: 163560   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:54:20,852-Speed 3300.59 samples/sec   Loss 2.3503   LearningRate 0.0117   Epoch: 13   Global Step: 163570   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:54:23,917-Speed 3341.58 samples/sec   Loss 2.4114   LearningRate 0.0117   Epoch: 13   Global Step: 163580   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:54:27,028-Speed 3292.66 samples/sec   Loss 2.3159   LearningRate 0.0117   Epoch: 13   Global Step: 163590   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:54:30,108-Speed 3325.20 samples/sec   Loss 2.4299   LearningRate 0.0117   Epoch: 13   Global Step: 163600   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:54:33,180-Speed 3334.40 samples/sec   Loss 2.4233   LearningRate 0.0117   Epoch: 13   Global Step: 163610   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:54:36,272-Speed 3313.18 samples/sec   Loss 2.4065   LearningRate 0.0117   Epoch: 13   Global Step: 163620   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:54:39,356-Speed 3321.58 samples/sec   Loss 2.4022   LearningRate 0.0116   Epoch: 13   Global Step: 163630   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:54:42,468-Speed 3292.13 samples/sec   Loss 2.4049   LearningRate 0.0116   Epoch: 13   Global Step: 163640   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:54:45,563-Speed 3308.91 samples/sec   Loss 2.3761   LearningRate 0.0116   Epoch: 13   Global Step: 163650   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:54:48,675-Speed 3291.43 samples/sec   Loss 2.3810   LearningRate 0.0116   Epoch: 13   Global Step: 163660   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:54:51,826-Speed 3251.40 samples/sec   Loss 2.3874   LearningRate 0.0116   Epoch: 13   Global Step: 163670   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:54:54,927-Speed 3303.58 samples/sec   Loss 2.3713   LearningRate 0.0116   Epoch: 13   Global Step: 163680   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:54:58,044-Speed 3286.32 samples/sec   Loss 2.3160   LearningRate 0.0116   Epoch: 13   Global Step: 163690   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:55:01,176-Speed 3270.25 samples/sec   Loss 2.3669   LearningRate 0.0116   Epoch: 13   Global Step: 163700   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:55:04,278-Speed 3301.98 samples/sec   Loss 2.4631   LearningRate 0.0116   Epoch: 13   Global Step: 163710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:55:07,426-Speed 3254.61 samples/sec   Loss 2.3950   LearningRate 0.0116   Epoch: 13   Global Step: 163720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:55:10,532-Speed 3297.88 samples/sec   Loss 2.4020   LearningRate 0.0116   Epoch: 13   Global Step: 163730   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:55:13,625-Speed 3311.27 samples/sec   Loss 2.4131   LearningRate 0.0116   Epoch: 13   Global Step: 163740   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:55:16,703-Speed 3329.00 samples/sec   Loss 2.3534   LearningRate 0.0116   Epoch: 13   Global Step: 163750   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:55:19,830-Speed 3275.47 samples/sec   Loss 2.4141   LearningRate 0.0116   Epoch: 13   Global Step: 163760   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:55:22,921-Speed 3313.77 samples/sec   Loss 2.4009   LearningRate 0.0116   Epoch: 13   Global Step: 163770   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:55:26,213-Speed 3111.99 samples/sec   Loss 2.3523   LearningRate 0.0116   Epoch: 13   Global Step: 163780   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:55:29,396-Speed 3217.04 samples/sec   Loss 2.3325   LearningRate 0.0116   Epoch: 13   Global Step: 163790   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:55:32,456-Speed 3347.88 samples/sec   Loss 2.4688   LearningRate 0.0116   Epoch: 13   Global Step: 163800   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:55:35,559-Speed 3300.96 samples/sec   Loss 2.4416   LearningRate 0.0116   Epoch: 13   Global Step: 163810   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:55:38,670-Speed 3293.34 samples/sec   Loss 2.3788   LearningRate 0.0116   Epoch: 13   Global Step: 163820   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:55:41,764-Speed 3311.10 samples/sec   Loss 2.4550   LearningRate 0.0116   Epoch: 13   Global Step: 163830   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:55:44,856-Speed 3312.46 samples/sec   Loss 2.4144   LearningRate 0.0116   Epoch: 13   Global Step: 163840   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:55:47,917-Speed 3346.38 samples/sec   Loss 2.3660   LearningRate 0.0116   Epoch: 13   Global Step: 163850   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:55:51,087-Speed 3230.73 samples/sec   Loss 2.4966   LearningRate 0.0116   Epoch: 13   Global Step: 163860   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:55:54,173-Speed 3320.05 samples/sec   Loss 2.3775   LearningRate 0.0116   Epoch: 13   Global Step: 163870   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:55:57,240-Speed 3340.19 samples/sec   Loss 2.3660   LearningRate 0.0116   Epoch: 13   Global Step: 163880   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:56:00,336-Speed 3308.28 samples/sec   Loss 2.3552   LearningRate 0.0116   Epoch: 13   Global Step: 163890   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:56:03,431-Speed 3309.23 samples/sec   Loss 2.4161   LearningRate 0.0116   Epoch: 13   Global Step: 163900   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:56:06,518-Speed 3319.25 samples/sec   Loss 2.2838   LearningRate 0.0116   Epoch: 13   Global Step: 163910   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:56:09,591-Speed 3332.17 samples/sec   Loss 2.3959   LearningRate 0.0116   Epoch: 13   Global Step: 163920   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:56:12,743-Speed 3250.61 samples/sec   Loss 2.3638   LearningRate 0.0116   Epoch: 13   Global Step: 163930   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:56:15,878-Speed 3267.13 samples/sec   Loss 2.4182   LearningRate 0.0116   Epoch: 13   Global Step: 163940   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:56:18,955-Speed 3329.33 samples/sec   Loss 2.4164   LearningRate 0.0116   Epoch: 13   Global Step: 163950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:56:22,007-Speed 3356.34 samples/sec   Loss 2.4277   LearningRate 0.0116   Epoch: 13   Global Step: 163960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:56:25,141-Speed 3268.38 samples/sec   Loss 2.4327   LearningRate 0.0116   Epoch: 13   Global Step: 163970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:56:28,212-Speed 3336.30 samples/sec   Loss 2.4100   LearningRate 0.0116   Epoch: 13   Global Step: 163980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:56:31,291-Speed 3325.89 samples/sec   Loss 2.4033   LearningRate 0.0116   Epoch: 13   Global Step: 163990   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:56:34,350-Speed 3349.12 samples/sec   Loss 2.4375   LearningRate 0.0115   Epoch: 13   Global Step: 164000   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:56:37,428-Speed 3327.82 samples/sec   Loss 2.4416   LearningRate 0.0115   Epoch: 13   Global Step: 164010   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:56:40,533-Speed 3299.55 samples/sec   Loss 2.4141   LearningRate 0.0115   Epoch: 13   Global Step: 164020   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:56:43,590-Speed 3351.10 samples/sec   Loss 2.3127   LearningRate 0.0115   Epoch: 13   Global Step: 164030   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:56:46,767-Speed 3223.47 samples/sec   Loss 2.3335   LearningRate 0.0115   Epoch: 13   Global Step: 164040   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:56:49,866-Speed 3305.78 samples/sec   Loss 2.4126   LearningRate 0.0115   Epoch: 13   Global Step: 164050   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:56:52,984-Speed 3285.10 samples/sec   Loss 2.4181   LearningRate 0.0115   Epoch: 13   Global Step: 164060   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:56:56,092-Speed 3296.63 samples/sec   Loss 2.4601   LearningRate 0.0115   Epoch: 13   Global Step: 164070   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:56:59,174-Speed 3322.63 samples/sec   Loss 2.4113   LearningRate 0.0115   Epoch: 13   Global Step: 164080   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:57:02,279-Speed 3299.30 samples/sec   Loss 2.4618   LearningRate 0.0115   Epoch: 13   Global Step: 164090   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:57:05,454-Speed 3226.60 samples/sec   Loss 2.4127   LearningRate 0.0115   Epoch: 13   Global Step: 164100   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:57:08,545-Speed 3313.97 samples/sec   Loss 2.4361   LearningRate 0.0115   Epoch: 13   Global Step: 164110   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:57:11,625-Speed 3325.60 samples/sec   Loss 2.4064   LearningRate 0.0115   Epoch: 13   Global Step: 164120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:57:14,754-Speed 3273.76 samples/sec   Loss 2.4808   LearningRate 0.0115   Epoch: 13   Global Step: 164130   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:57:17,862-Speed 3296.26 samples/sec   Loss 2.4321   LearningRate 0.0115   Epoch: 13   Global Step: 164140   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:57:20,894-Speed 3377.53 samples/sec   Loss 2.4197   LearningRate 0.0115   Epoch: 13   Global Step: 164150   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:57:23,983-Speed 3316.54 samples/sec   Loss 2.4294   LearningRate 0.0115   Epoch: 13   Global Step: 164160   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:57:27,077-Speed 3310.13 samples/sec   Loss 2.4535   LearningRate 0.0115   Epoch: 13   Global Step: 164170   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:57:30,198-Speed 3282.50 samples/sec   Loss 2.4583   LearningRate 0.0115   Epoch: 13   Global Step: 164180   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:57:33,325-Speed 3275.25 samples/sec   Loss 2.4693   LearningRate 0.0115   Epoch: 13   Global Step: 164190   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:57:36,433-Speed 3296.18 samples/sec   Loss 2.4832   LearningRate 0.0115   Epoch: 13   Global Step: 164200   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:57:39,567-Speed 3268.53 samples/sec   Loss 2.3766   LearningRate 0.0115   Epoch: 13   Global Step: 164210   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:57:42,682-Speed 3288.11 samples/sec   Loss 2.4408   LearningRate 0.0115   Epoch: 13   Global Step: 164220   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:57:45,751-Speed 3338.38 samples/sec   Loss 2.4887   LearningRate 0.0115   Epoch: 13   Global Step: 164230   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:57:48,827-Speed 3329.06 samples/sec   Loss 2.5163   LearningRate 0.0115   Epoch: 13   Global Step: 164240   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:57:51,938-Speed 3293.28 samples/sec   Loss 2.4183   LearningRate 0.0115   Epoch: 13   Global Step: 164250   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:57:55,029-Speed 3313.49 samples/sec   Loss 2.3795   LearningRate 0.0115   Epoch: 13   Global Step: 164260   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:57:58,097-Speed 3339.21 samples/sec   Loss 2.4347   LearningRate 0.0115   Epoch: 13   Global Step: 164270   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:58:01,248-Speed 3250.26 samples/sec   Loss 2.3761   LearningRate 0.0115   Epoch: 13   Global Step: 164280   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:58:04,325-Speed 3329.07 samples/sec   Loss 2.4876   LearningRate 0.0115   Epoch: 13   Global Step: 164290   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:58:07,432-Speed 3296.83 samples/sec   Loss 2.4382   LearningRate 0.0115   Epoch: 13   Global Step: 164300   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:58:10,535-Speed 3302.11 samples/sec   Loss 2.4455   LearningRate 0.0115   Epoch: 13   Global Step: 164310   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:58:13,701-Speed 3235.38 samples/sec   Loss 2.4131   LearningRate 0.0115   Epoch: 13   Global Step: 164320   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:58:16,878-Speed 3223.45 samples/sec   Loss 2.4575   LearningRate 0.0115   Epoch: 13   Global Step: 164330   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:58:19,959-Speed 3325.30 samples/sec   Loss 2.4376   LearningRate 0.0115   Epoch: 13   Global Step: 164340   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:58:23,044-Speed 3319.82 samples/sec   Loss 2.4180   LearningRate 0.0115   Epoch: 13   Global Step: 164350   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:58:26,178-Speed 3268.64 samples/sec   Loss 2.4156   LearningRate 0.0115   Epoch: 13   Global Step: 164360   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:58:29,276-Speed 3306.22 samples/sec   Loss 2.4209   LearningRate 0.0114   Epoch: 13   Global Step: 164370   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:58:32,353-Speed 3329.38 samples/sec   Loss 2.4460   LearningRate 0.0114   Epoch: 13   Global Step: 164380   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:58:35,413-Speed 3346.97 samples/sec   Loss 2.4185   LearningRate 0.0114   Epoch: 13   Global Step: 164390   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:58:38,528-Speed 3287.82 samples/sec   Loss 2.4192   LearningRate 0.0114   Epoch: 13   Global Step: 164400   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:58:41,736-Speed 3193.62 samples/sec   Loss 2.4072   LearningRate 0.0114   Epoch: 13   Global Step: 164410   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:58:44,871-Speed 3267.19 samples/sec   Loss 2.4433   LearningRate 0.0114   Epoch: 13   Global Step: 164420   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:58:47,990-Speed 3283.70 samples/sec   Loss 2.4147   LearningRate 0.0114   Epoch: 13   Global Step: 164430   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:58:51,091-Speed 3303.65 samples/sec   Loss 2.3619   LearningRate 0.0114   Epoch: 13   Global Step: 164440   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:58:54,234-Speed 3258.95 samples/sec   Loss 2.4686   LearningRate 0.0114   Epoch: 13   Global Step: 164450   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:58:57,299-Speed 3342.22 samples/sec   Loss 2.4365   LearningRate 0.0114   Epoch: 13   Global Step: 164460   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:59:00,385-Speed 3318.98 samples/sec   Loss 2.4944   LearningRate 0.0114   Epoch: 13   Global Step: 164470   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:59:03,460-Speed 3331.55 samples/sec   Loss 2.4745   LearningRate 0.0114   Epoch: 13   Global Step: 164480   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 15:59:06,619-Speed 3242.40 samples/sec   Loss 2.4806   LearningRate 0.0114   Epoch: 13   Global Step: 164490   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:59:09,725-Speed 3296.81 samples/sec   Loss 2.4029   LearningRate 0.0114   Epoch: 13   Global Step: 164500   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:59:12,830-Speed 3300.16 samples/sec   Loss 2.4361   LearningRate 0.0114   Epoch: 13   Global Step: 164510   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 15:59:15,962-Speed 3269.57 samples/sec   Loss 2.4540   LearningRate 0.0114   Epoch: 13   Global Step: 164520   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 15:59:19,054-Speed 3312.94 samples/sec   Loss 2.4826   LearningRate 0.0114   Epoch: 13   Global Step: 164530   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 15:59:22,145-Speed 3314.40 samples/sec   Loss 2.4256   LearningRate 0.0114   Epoch: 13   Global Step: 164540   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 15:59:25,225-Speed 3325.67 samples/sec   Loss 2.3900   LearningRate 0.0114   Epoch: 13   Global Step: 164550   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 15:59:28,309-Speed 3320.99 samples/sec   Loss 2.4624   LearningRate 0.0114   Epoch: 13   Global Step: 164560   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 15:59:31,432-Speed 3280.50 samples/sec   Loss 2.4309   LearningRate 0.0114   Epoch: 13   Global Step: 164570   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 15:59:34,541-Speed 3294.97 samples/sec   Loss 2.4720   LearningRate 0.0114   Epoch: 13   Global Step: 164580   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 15:59:37,695-Speed 3247.43 samples/sec   Loss 2.4497   LearningRate 0.0114   Epoch: 13   Global Step: 164590   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 15:59:40,840-Speed 3257.18 samples/sec   Loss 2.4000   LearningRate 0.0114   Epoch: 13   Global Step: 164600   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 15:59:43,916-Speed 3329.18 samples/sec   Loss 2.5173   LearningRate 0.0114   Epoch: 13   Global Step: 164610   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:59:47,006-Speed 3315.30 samples/sec   Loss 2.4064   LearningRate 0.0114   Epoch: 13   Global Step: 164620   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:59:50,075-Speed 3337.31 samples/sec   Loss 2.4574   LearningRate 0.0114   Epoch: 13   Global Step: 164630   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:59:53,156-Speed 3324.82 samples/sec   Loss 2.4269   LearningRate 0.0114   Epoch: 13   Global Step: 164640   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:59:56,229-Speed 3333.10 samples/sec   Loss 2.4330   LearningRate 0.0114   Epoch: 13   Global Step: 164650   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 15:59:59,305-Speed 3329.81 samples/sec   Loss 2.4571   LearningRate 0.0114   Epoch: 13   Global Step: 164660   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:00:02,476-Speed 3230.74 samples/sec   Loss 2.4993   LearningRate 0.0114   Epoch: 13   Global Step: 164670   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:00:05,537-Speed 3346.62 samples/sec   Loss 2.4240   LearningRate 0.0114   Epoch: 13   Global Step: 164680   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:00:08,626-Speed 3315.13 samples/sec   Loss 2.5237   LearningRate 0.0114   Epoch: 13   Global Step: 164690   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:00:11,794-Speed 3233.85 samples/sec   Loss 2.4328   LearningRate 0.0114   Epoch: 13   Global Step: 164700   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:00:14,933-Speed 3262.86 samples/sec   Loss 2.4491   LearningRate 0.0114   Epoch: 13   Global Step: 164710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:00:18,041-Speed 3296.28 samples/sec   Loss 2.4519   LearningRate 0.0114   Epoch: 13   Global Step: 164720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:00:21,101-Speed 3347.09 samples/sec   Loss 2.4417   LearningRate 0.0113   Epoch: 13   Global Step: 164730   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:00:24,151-Speed 3358.54 samples/sec   Loss 2.5046   LearningRate 0.0113   Epoch: 13   Global Step: 164740   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:00:27,249-Speed 3306.61 samples/sec   Loss 2.4463   LearningRate 0.0113   Epoch: 13   Global Step: 164750   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:00:30,347-Speed 3305.75 samples/sec   Loss 2.5118   LearningRate 0.0113   Epoch: 13   Global Step: 164760   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:00:33,431-Speed 3321.90 samples/sec   Loss 2.4410   LearningRate 0.0113   Epoch: 13   Global Step: 164770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:00:36,559-Speed 3274.38 samples/sec   Loss 2.4996   LearningRate 0.0113   Epoch: 13   Global Step: 164780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:00:39,703-Speed 3258.30 samples/sec   Loss 2.4536   LearningRate 0.0113   Epoch: 13   Global Step: 164790   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:00:42,754-Speed 3360.88 samples/sec   Loss 2.4423   LearningRate 0.0113   Epoch: 13   Global Step: 164800   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:00:45,829-Speed 3331.03 samples/sec   Loss 2.4216   LearningRate 0.0113   Epoch: 13   Global Step: 164810   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:00:48,981-Speed 3250.20 samples/sec   Loss 2.4037   LearningRate 0.0113   Epoch: 13   Global Step: 164820   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:00:52,107-Speed 3276.77 samples/sec   Loss 2.4899   LearningRate 0.0113   Epoch: 13   Global Step: 164830   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:00:55,284-Speed 3224.14 samples/sec   Loss 2.5222   LearningRate 0.0113   Epoch: 13   Global Step: 164840   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:00:58,421-Speed 3265.42 samples/sec   Loss 2.4632   LearningRate 0.0113   Epoch: 13   Global Step: 164850   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:01:01,578-Speed 3244.75 samples/sec   Loss 2.4408   LearningRate 0.0113   Epoch: 13   Global Step: 164860   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:01:04,776-Speed 3203.20 samples/sec   Loss 2.5258   LearningRate 0.0113   Epoch: 13   Global Step: 164870   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:01:07,889-Speed 3289.70 samples/sec   Loss 2.5330   LearningRate 0.0113   Epoch: 13   Global Step: 164880   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:01:10,979-Speed 3314.76 samples/sec   Loss 2.4550   LearningRate 0.0113   Epoch: 13   Global Step: 164890   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:01:14,160-Speed 3220.39 samples/sec   Loss 2.5204   LearningRate 0.0113   Epoch: 13   Global Step: 164900   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:01:17,265-Speed 3299.31 samples/sec   Loss 2.4069   LearningRate 0.0113   Epoch: 13   Global Step: 164910   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:01:20,316-Speed 3356.95 samples/sec   Loss 2.4190   LearningRate 0.0113   Epoch: 13   Global Step: 164920   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:01:23,451-Speed 3267.34 samples/sec   Loss 2.4874   LearningRate 0.0113   Epoch: 13   Global Step: 164930   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:01:26,576-Speed 3277.76 samples/sec   Loss 2.4236   LearningRate 0.0113   Epoch: 13   Global Step: 164940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:01:29,732-Speed 3245.70 samples/sec   Loss 2.5223   LearningRate 0.0113   Epoch: 13   Global Step: 164950   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:01:32,859-Speed 3276.28 samples/sec   Loss 2.4456   LearningRate 0.0113   Epoch: 13   Global Step: 164960   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:01:35,971-Speed 3290.85 samples/sec   Loss 2.4459   LearningRate 0.0113   Epoch: 13   Global Step: 164970   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:01:39,102-Speed 3272.11 samples/sec   Loss 2.4542   LearningRate 0.0113   Epoch: 13   Global Step: 164980   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:01:42,272-Speed 3231.14 samples/sec   Loss 2.4870   LearningRate 0.0113   Epoch: 13   Global Step: 164990   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:01:45,330-Speed 3349.74 samples/sec   Loss 2.4702   LearningRate 0.0113   Epoch: 13   Global Step: 165000   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:01:48,448-Speed 3285.30 samples/sec   Loss 2.4550   LearningRate 0.0113   Epoch: 13   Global Step: 165010   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:01:51,625-Speed 3223.75 samples/sec   Loss 2.5083   LearningRate 0.0113   Epoch: 13   Global Step: 165020   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:01:54,756-Speed 3271.82 samples/sec   Loss 2.4532   LearningRate 0.0113   Epoch: 13   Global Step: 165030   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:01:57,843-Speed 3317.79 samples/sec   Loss 2.3983   LearningRate 0.0113   Epoch: 13   Global Step: 165040   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:02:00,957-Speed 3289.82 samples/sec   Loss 2.4856   LearningRate 0.0113   Epoch: 13   Global Step: 165050   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:02:04,065-Speed 3295.97 samples/sec   Loss 2.4053   LearningRate 0.0113   Epoch: 13   Global Step: 165060   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:02:07,213-Speed 3253.47 samples/sec   Loss 2.5028   LearningRate 0.0113   Epoch: 13   Global Step: 165070   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:02:10,286-Speed 3333.83 samples/sec   Loss 2.4839   LearningRate 0.0113   Epoch: 13   Global Step: 165080   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:02:13,460-Speed 3226.71 samples/sec   Loss 2.4573   LearningRate 0.0113   Epoch: 13   Global Step: 165090   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:02:16,594-Speed 3269.38 samples/sec   Loss 2.5470   LearningRate 0.0112   Epoch: 13   Global Step: 165100   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:02:19,697-Speed 3301.09 samples/sec   Loss 2.5070   LearningRate 0.0112   Epoch: 13   Global Step: 165110   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:02:22,798-Speed 3303.67 samples/sec   Loss 2.5042   LearningRate 0.0112   Epoch: 13   Global Step: 165120   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:02:25,952-Speed 3247.04 samples/sec   Loss 2.5700   LearningRate 0.0112   Epoch: 13   Global Step: 165130   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:02:29,061-Speed 3294.85 samples/sec   Loss 2.4610   LearningRate 0.0112   Epoch: 13   Global Step: 165140   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:02:32,176-Speed 3288.32 samples/sec   Loss 2.4815   LearningRate 0.0112   Epoch: 13   Global Step: 165150   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:02:35,223-Speed 3361.61 samples/sec   Loss 2.4760   LearningRate 0.0112   Epoch: 13   Global Step: 165160   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:02:38,300-Speed 3329.31 samples/sec   Loss 2.4801   LearningRate 0.0112   Epoch: 13   Global Step: 165170   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:02:41,358-Speed 3349.28 samples/sec   Loss 2.4965   LearningRate 0.0112   Epoch: 13   Global Step: 165180   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:02:44,446-Speed 3317.16 samples/sec   Loss 2.4580   LearningRate 0.0112   Epoch: 13   Global Step: 165190   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:02:47,530-Speed 3320.84 samples/sec   Loss 2.5041   LearningRate 0.0112   Epoch: 13   Global Step: 165200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:02:50,661-Speed 3271.67 samples/sec   Loss 2.4687   LearningRate 0.0112   Epoch: 13   Global Step: 165210   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:02:53,754-Speed 3311.85 samples/sec   Loss 2.4768   LearningRate 0.0112   Epoch: 13   Global Step: 165220   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:02:56,823-Speed 3338.13 samples/sec   Loss 2.5481   LearningRate 0.0112   Epoch: 13   Global Step: 165230   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:02:59,909-Speed 3319.37 samples/sec   Loss 2.4442   LearningRate 0.0112   Epoch: 13   Global Step: 165240   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:03:03,000-Speed 3314.11 samples/sec   Loss 2.4789   LearningRate 0.0112   Epoch: 13   Global Step: 165250   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:03:06,136-Speed 3265.96 samples/sec   Loss 2.4834   LearningRate 0.0112   Epoch: 13   Global Step: 165260   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:03:09,196-Speed 3347.20 samples/sec   Loss 2.5521   LearningRate 0.0112   Epoch: 13   Global Step: 165270   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:03:12,341-Speed 3257.02 samples/sec   Loss 2.4917   LearningRate 0.0112   Epoch: 13   Global Step: 165280   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 16:03:15,447-Speed 3297.61 samples/sec   Loss 2.4861   LearningRate 0.0112   Epoch: 13   Global Step: 165290   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:03:18,527-Speed 3326.53 samples/sec   Loss 2.4458   LearningRate 0.0112   Epoch: 13   Global Step: 165300   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:03:21,585-Speed 3349.10 samples/sec   Loss 2.4755   LearningRate 0.0112   Epoch: 13   Global Step: 165310   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:03:24,644-Speed 3348.36 samples/sec   Loss 2.4849   LearningRate 0.0112   Epoch: 13   Global Step: 165320   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:03:27,747-Speed 3301.23 samples/sec   Loss 2.4630   LearningRate 0.0112   Epoch: 13   Global Step: 165330   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:03:30,810-Speed 3344.50 samples/sec   Loss 2.5012   LearningRate 0.0112   Epoch: 13   Global Step: 165340   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:03:33,863-Speed 3355.19 samples/sec   Loss 2.4785   LearningRate 0.0112   Epoch: 13   Global Step: 165350   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:03:37,054-Speed 3209.87 samples/sec   Loss 2.5897   LearningRate 0.0112   Epoch: 13   Global Step: 165360   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:03:40,246-Speed 3208.78 samples/sec   Loss 2.5037   LearningRate 0.0112   Epoch: 13   Global Step: 165370   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:03:43,315-Speed 3337.70 samples/sec   Loss 2.4552   LearningRate 0.0112   Epoch: 13   Global Step: 165380   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:03:46,404-Speed 3315.83 samples/sec   Loss 2.5289   LearningRate 0.0112   Epoch: 13   Global Step: 165390   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:03:49,554-Speed 3251.60 samples/sec   Loss 2.4946   LearningRate 0.0112   Epoch: 13   Global Step: 165400   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:03:52,685-Speed 3272.53 samples/sec   Loss 2.3998   LearningRate 0.0112   Epoch: 13   Global Step: 165410   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:03:55,844-Speed 3242.20 samples/sec   Loss 2.4721   LearningRate 0.0112   Epoch: 13   Global Step: 165420   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:03:58,910-Speed 3341.26 samples/sec   Loss 2.5173   LearningRate 0.0112   Epoch: 13   Global Step: 165430   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:04:02,050-Speed 3262.28 samples/sec   Loss 2.5348   LearningRate 0.0112   Epoch: 13   Global Step: 165440   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:04:05,190-Speed 3262.45 samples/sec   Loss 2.4755   LearningRate 0.0112   Epoch: 13   Global Step: 165450   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:04:08,292-Speed 3301.63 samples/sec   Loss 2.4931   LearningRate 0.0112   Epoch: 13   Global Step: 165460   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:04:11,360-Speed 3339.16 samples/sec   Loss 2.4840   LearningRate 0.0111   Epoch: 13   Global Step: 165470   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:04:14,522-Speed 3239.99 samples/sec   Loss 2.5206   LearningRate 0.0111   Epoch: 13   Global Step: 165480   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:04:17,610-Speed 3316.19 samples/sec   Loss 2.5093   LearningRate 0.0111   Epoch: 13   Global Step: 165490   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:04:20,701-Speed 3314.89 samples/sec   Loss 2.5787   LearningRate 0.0111   Epoch: 13   Global Step: 165500   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:04:23,813-Speed 3291.46 samples/sec   Loss 2.5343   LearningRate 0.0111   Epoch: 13   Global Step: 165510   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:04:26,926-Speed 3290.60 samples/sec   Loss 2.5601   LearningRate 0.0111   Epoch: 13   Global Step: 165520   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:04:30,117-Speed 3209.64 samples/sec   Loss 2.5235   LearningRate 0.0111   Epoch: 13   Global Step: 165530   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:04:33,206-Speed 3316.62 samples/sec   Loss 2.4690   LearningRate 0.0111   Epoch: 13   Global Step: 165540   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:04:36,332-Speed 3276.21 samples/sec   Loss 2.5120   LearningRate 0.0111   Epoch: 13   Global Step: 165550   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:04:39,502-Speed 3232.28 samples/sec   Loss 2.5174   LearningRate 0.0111   Epoch: 13   Global Step: 165560   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:04:42,601-Speed 3305.10 samples/sec   Loss 2.5371   LearningRate 0.0111   Epoch: 13   Global Step: 165570   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:04:45,661-Speed 3347.62 samples/sec   Loss 2.4526   LearningRate 0.0111   Epoch: 13   Global Step: 165580   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:04:48,789-Speed 3273.95 samples/sec   Loss 2.4741   LearningRate 0.0111   Epoch: 13   Global Step: 165590   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:04:51,876-Speed 3318.39 samples/sec   Loss 2.5211   LearningRate 0.0111   Epoch: 13   Global Step: 165600   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:04:55,007-Speed 3271.58 samples/sec   Loss 2.4772   LearningRate 0.0111   Epoch: 13   Global Step: 165610   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:04:58,083-Speed 3330.11 samples/sec   Loss 2.4879   LearningRate 0.0111   Epoch: 13   Global Step: 165620   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:05:01,145-Speed 3345.72 samples/sec   Loss 2.5529   LearningRate 0.0111   Epoch: 13   Global Step: 165630   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:05:04,238-Speed 3311.70 samples/sec   Loss 2.5150   LearningRate 0.0111   Epoch: 13   Global Step: 165640   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:05:07,360-Speed 3280.73 samples/sec   Loss 2.5631   LearningRate 0.0111   Epoch: 13   Global Step: 165650   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:05:10,467-Speed 3296.76 samples/sec   Loss 2.4926   LearningRate 0.0111   Epoch: 13   Global Step: 165660   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:05:13,600-Speed 3270.35 samples/sec   Loss 2.4747   LearningRate 0.0111   Epoch: 13   Global Step: 165670   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:05:16,674-Speed 3331.84 samples/sec   Loss 2.5005   LearningRate 0.0111   Epoch: 13   Global Step: 165680   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:05:19,725-Speed 3356.86 samples/sec   Loss 2.4770   LearningRate 0.0111   Epoch: 13   Global Step: 165690   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:05:22,802-Speed 3329.68 samples/sec   Loss 2.5140   LearningRate 0.0111   Epoch: 13   Global Step: 165700   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:05:26,027-Speed 3176.24 samples/sec   Loss 2.5045   LearningRate 0.0111   Epoch: 13   Global Step: 165710   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:05:29,167-Speed 3261.73 samples/sec   Loss 2.4653   LearningRate 0.0111   Epoch: 13   Global Step: 165720   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:05:32,272-Speed 3299.33 samples/sec   Loss 2.5508   LearningRate 0.0111   Epoch: 13   Global Step: 165730   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:05:35,366-Speed 3310.99 samples/sec   Loss 2.5585   LearningRate 0.0111   Epoch: 13   Global Step: 165740   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:05:38,501-Speed 3266.54 samples/sec   Loss 2.5255   LearningRate 0.0111   Epoch: 13   Global Step: 165750   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:05:41,596-Speed 3310.21 samples/sec   Loss 2.5708   LearningRate 0.0111   Epoch: 13   Global Step: 165760   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:05:44,689-Speed 3311.19 samples/sec   Loss 2.4853   LearningRate 0.0111   Epoch: 13   Global Step: 165770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:05:47,814-Speed 3278.14 samples/sec   Loss 2.5367   LearningRate 0.0111   Epoch: 13   Global Step: 165780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:05:51,018-Speed 3196.65 samples/sec   Loss 2.5237   LearningRate 0.0111   Epoch: 13   Global Step: 165790   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:05:54,085-Speed 3339.73 samples/sec   Loss 2.4960   LearningRate 0.0111   Epoch: 13   Global Step: 165800   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:05:57,126-Speed 3368.97 samples/sec   Loss 2.5380   LearningRate 0.0111   Epoch: 13   Global Step: 165810   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:06:00,314-Speed 3213.18 samples/sec   Loss 2.4569   LearningRate 0.0111   Epoch: 13   Global Step: 165820   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:06:03,499-Speed 3215.67 samples/sec   Loss 2.6161   LearningRate 0.0111   Epoch: 13   Global Step: 165830   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:06:06,558-Speed 3348.92 samples/sec   Loss 2.5080   LearningRate 0.0111   Epoch: 13   Global Step: 165840   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:06:09,641-Speed 3323.25 samples/sec   Loss 2.4816   LearningRate 0.0110   Epoch: 13   Global Step: 165850   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:06:12,785-Speed 3257.99 samples/sec   Loss 2.4708   LearningRate 0.0110   Epoch: 13   Global Step: 165860   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:06:15,884-Speed 3304.40 samples/sec   Loss 2.6045   LearningRate 0.0110   Epoch: 13   Global Step: 165870   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:06:19,010-Speed 3277.32 samples/sec   Loss 2.4656   LearningRate 0.0110   Epoch: 13   Global Step: 165880   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:06:22,108-Speed 3306.57 samples/sec   Loss 2.4896   LearningRate 0.0110   Epoch: 13   Global Step: 165890   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:06:25,200-Speed 3312.95 samples/sec   Loss 2.5293   LearningRate 0.0110   Epoch: 13   Global Step: 165900   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:06:28,288-Speed 3317.03 samples/sec   Loss 2.5162   LearningRate 0.0110   Epoch: 13   Global Step: 165910   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:06:31,496-Speed 3193.13 samples/sec   Loss 2.5043   LearningRate 0.0110   Epoch: 13   Global Step: 165920   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:06:34,569-Speed 3333.26 samples/sec   Loss 2.4967   LearningRate 0.0110   Epoch: 13   Global Step: 165930   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:06:37,726-Speed 3243.47 samples/sec   Loss 2.5297   LearningRate 0.0110   Epoch: 13   Global Step: 165940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:06:40,859-Speed 3269.90 samples/sec   Loss 2.4911   LearningRate 0.0110   Epoch: 13   Global Step: 165950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:06:43,960-Speed 3302.80 samples/sec   Loss 2.5185   LearningRate 0.0110   Epoch: 13   Global Step: 165960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:06:46,998-Speed 3372.29 samples/sec   Loss 2.4729   LearningRate 0.0110   Epoch: 13   Global Step: 165970   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:06:50,083-Speed 3320.23 samples/sec   Loss 2.5671   LearningRate 0.0110   Epoch: 13   Global Step: 165980   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:06:53,168-Speed 3319.85 samples/sec   Loss 2.5611   LearningRate 0.0110   Epoch: 13   Global Step: 165990   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:06:56,285-Speed 3287.06 samples/sec   Loss 2.5201   LearningRate 0.0110   Epoch: 13   Global Step: 166000   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:06:59,397-Speed 3290.71 samples/sec   Loss 2.5264   LearningRate 0.0110   Epoch: 13   Global Step: 166010   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:07:02,518-Speed 3281.89 samples/sec   Loss 2.5051   LearningRate 0.0110   Epoch: 13   Global Step: 166020   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:07:05,608-Speed 3315.01 samples/sec   Loss 2.5606   LearningRate 0.0110   Epoch: 13   Global Step: 166030   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:07:08,705-Speed 3307.95 samples/sec   Loss 2.5340   LearningRate 0.0110   Epoch: 13   Global Step: 166040   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:07:11,827-Speed 3280.61 samples/sec   Loss 2.5137   LearningRate 0.0110   Epoch: 13   Global Step: 166050   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:07:14,988-Speed 3240.39 samples/sec   Loss 2.4376   LearningRate 0.0110   Epoch: 13   Global Step: 166060   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:07:18,065-Speed 3328.87 samples/sec   Loss 2.6261   LearningRate 0.0110   Epoch: 13   Global Step: 166070   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:07:21,150-Speed 3319.85 samples/sec   Loss 2.5197   LearningRate 0.0110   Epoch: 13   Global Step: 166080   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:07:24,258-Speed 3295.51 samples/sec   Loss 2.5766   LearningRate 0.0110   Epoch: 13   Global Step: 166090   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:07:27,415-Speed 3245.38 samples/sec   Loss 2.5346   LearningRate 0.0110   Epoch: 13   Global Step: 166100   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:07:30,551-Speed 3266.40 samples/sec   Loss 2.5223   LearningRate 0.0110   Epoch: 13   Global Step: 166110   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:07:33,703-Speed 3248.78 samples/sec   Loss 2.4968   LearningRate 0.0110   Epoch: 13   Global Step: 166120   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:07:36,890-Speed 3214.73 samples/sec   Loss 2.5355   LearningRate 0.0110   Epoch: 13   Global Step: 166130   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:07:40,032-Speed 3259.64 samples/sec   Loss 2.6010   LearningRate 0.0110   Epoch: 13   Global Step: 166140   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:07:43,178-Speed 3256.45 samples/sec   Loss 2.5393   LearningRate 0.0110   Epoch: 13   Global Step: 166150   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:07:46,285-Speed 3296.68 samples/sec   Loss 2.5171   LearningRate 0.0110   Epoch: 13   Global Step: 166160   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:07:49,369-Speed 3321.34 samples/sec   Loss 2.5398   LearningRate 0.0110   Epoch: 13   Global Step: 166170   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:07:52,464-Speed 3310.16 samples/sec   Loss 2.4324   LearningRate 0.0110   Epoch: 13   Global Step: 166180   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:07:55,570-Speed 3297.32 samples/sec   Loss 2.5396   LearningRate 0.0110   Epoch: 13   Global Step: 166190   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:07:58,697-Speed 3276.29 samples/sec   Loss 2.4816   LearningRate 0.0110   Epoch: 13   Global Step: 166200   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:08:01,810-Speed 3290.39 samples/sec   Loss 2.5284   LearningRate 0.0110   Epoch: 13   Global Step: 166210   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:08:04,946-Speed 3265.72 samples/sec   Loss 2.5179   LearningRate 0.0109   Epoch: 13   Global Step: 166220   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:08:08,106-Speed 3242.07 samples/sec   Loss 2.5128   LearningRate 0.0109   Epoch: 13   Global Step: 166230   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:08:11,247-Speed 3260.92 samples/sec   Loss 2.4885   LearningRate 0.0109   Epoch: 13   Global Step: 166240   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:08:14,347-Speed 3303.60 samples/sec   Loss 2.4699   LearningRate 0.0109   Epoch: 13   Global Step: 166250   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:08:17,508-Speed 3241.37 samples/sec   Loss 2.5372   LearningRate 0.0109   Epoch: 13   Global Step: 166260   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:08:20,608-Speed 3304.25 samples/sec   Loss 2.5666   LearningRate 0.0109   Epoch: 13   Global Step: 166270   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:08:23,699-Speed 3313.46 samples/sec   Loss 2.4651   LearningRate 0.0109   Epoch: 13   Global Step: 166280   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:08:26,801-Speed 3302.35 samples/sec   Loss 2.5053   LearningRate 0.0109   Epoch: 13   Global Step: 166290   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:08:29,921-Speed 3283.42 samples/sec   Loss 2.5285   LearningRate 0.0109   Epoch: 13   Global Step: 166300   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:08:33,011-Speed 3315.18 samples/sec   Loss 2.5052   LearningRate 0.0109   Epoch: 13   Global Step: 166310   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:08:36,092-Speed 3323.85 samples/sec   Loss 2.4781   LearningRate 0.0109   Epoch: 13   Global Step: 166320   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:08:39,192-Speed 3304.93 samples/sec   Loss 2.4471   LearningRate 0.0109   Epoch: 13   Global Step: 166330   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:08:42,351-Speed 3242.26 samples/sec   Loss 2.6046   LearningRate 0.0109   Epoch: 13   Global Step: 166340   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:08:45,441-Speed 3315.47 samples/sec   Loss 2.4692   LearningRate 0.0109   Epoch: 13   Global Step: 166350   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:08:48,609-Speed 3233.23 samples/sec   Loss 2.5573   LearningRate 0.0109   Epoch: 13   Global Step: 166360   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:08:51,792-Speed 3218.40 samples/sec   Loss 2.5779   LearningRate 0.0109   Epoch: 13   Global Step: 166370   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:08:54,907-Speed 3288.66 samples/sec   Loss 2.5002   LearningRate 0.0109   Epoch: 13   Global Step: 166380   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:08:57,980-Speed 3332.92 samples/sec   Loss 2.5900   LearningRate 0.0109   Epoch: 13   Global Step: 166390   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:09:01,079-Speed 3304.73 samples/sec   Loss 2.5637   LearningRate 0.0109   Epoch: 13   Global Step: 166400   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:09:04,161-Speed 3323.55 samples/sec   Loss 2.5648   LearningRate 0.0109   Epoch: 13   Global Step: 166410   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:09:07,294-Speed 3269.56 samples/sec   Loss 2.5701   LearningRate 0.0109   Epoch: 13   Global Step: 166420   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:09:10,391-Speed 3308.22 samples/sec   Loss 2.5655   LearningRate 0.0109   Epoch: 13   Global Step: 166430   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:09:13,511-Speed 3282.95 samples/sec   Loss 2.5855   LearningRate 0.0109   Epoch: 13   Global Step: 166440   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:09:16,634-Speed 3279.60 samples/sec   Loss 2.5498   LearningRate 0.0109   Epoch: 13   Global Step: 166450   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:09:19,733-Speed 3304.80 samples/sec   Loss 2.5285   LearningRate 0.0109   Epoch: 13   Global Step: 166460   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:09:22,826-Speed 3312.64 samples/sec   Loss 2.6109   LearningRate 0.0109   Epoch: 13   Global Step: 166470   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:09:25,963-Speed 3265.09 samples/sec   Loss 2.6344   LearningRate 0.0109   Epoch: 13   Global Step: 166480   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:09:29,081-Speed 3284.96 samples/sec   Loss 2.5982   LearningRate 0.0109   Epoch: 13   Global Step: 166490   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:09:32,207-Speed 3277.14 samples/sec   Loss 2.5486   LearningRate 0.0109   Epoch: 13   Global Step: 166500   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:09:35,305-Speed 3306.80 samples/sec   Loss 2.5507   LearningRate 0.0109   Epoch: 13   Global Step: 166510   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:09:38,476-Speed 3229.89 samples/sec   Loss 2.5519   LearningRate 0.0109   Epoch: 13   Global Step: 166520   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:09:41,582-Speed 3297.89 samples/sec   Loss 2.5330   LearningRate 0.0109   Epoch: 13   Global Step: 166530   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:09:44,686-Speed 3298.96 samples/sec   Loss 2.6060   LearningRate 0.0109   Epoch: 13   Global Step: 166540   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:09:47,764-Speed 3328.36 samples/sec   Loss 2.4996   LearningRate 0.0109   Epoch: 13   Global Step: 166550   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:09:50,894-Speed 3272.71 samples/sec   Loss 2.5575   LearningRate 0.0109   Epoch: 13   Global Step: 166560   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:09:54,022-Speed 3274.35 samples/sec   Loss 2.5718   LearningRate 0.0109   Epoch: 13   Global Step: 166570   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:09:57,103-Speed 3324.40 samples/sec   Loss 2.5333   LearningRate 0.0109   Epoch: 13   Global Step: 166580   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:10:00,209-Speed 3298.44 samples/sec   Loss 2.5257   LearningRate 0.0109   Epoch: 13   Global Step: 166590   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:10:03,325-Speed 3287.28 samples/sec   Loss 2.5530   LearningRate 0.0108   Epoch: 13   Global Step: 166600   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:10:06,402-Speed 3328.42 samples/sec   Loss 2.5480   LearningRate 0.0108   Epoch: 13   Global Step: 166610   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:10:09,521-Speed 3284.27 samples/sec   Loss 2.6016   LearningRate 0.0108   Epoch: 13   Global Step: 166620   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:10:12,666-Speed 3257.14 samples/sec   Loss 2.5540   LearningRate 0.0108   Epoch: 13   Global Step: 166630   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:10:15,806-Speed 3262.16 samples/sec   Loss 2.4816   LearningRate 0.0108   Epoch: 13   Global Step: 166640   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:10:18,941-Speed 3267.44 samples/sec   Loss 2.5814   LearningRate 0.0108   Epoch: 13   Global Step: 166650   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:10:22,007-Speed 3341.10 samples/sec   Loss 2.5258   LearningRate 0.0108   Epoch: 13   Global Step: 166660   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:10:25,115-Speed 3295.50 samples/sec   Loss 2.5291   LearningRate 0.0108   Epoch: 13   Global Step: 166670   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:10:28,230-Speed 3289.31 samples/sec   Loss 2.5167   LearningRate 0.0108   Epoch: 13   Global Step: 166680   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:10:31,353-Speed 3279.40 samples/sec   Loss 2.5752   LearningRate 0.0108   Epoch: 13   Global Step: 166690   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:10:34,439-Speed 3319.82 samples/sec   Loss 2.5723   LearningRate 0.0108   Epoch: 13   Global Step: 166700   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:10:37,512-Speed 3332.89 samples/sec   Loss 2.4298   LearningRate 0.0108   Epoch: 13   Global Step: 166710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:10:40,613-Speed 3303.32 samples/sec   Loss 2.5311   LearningRate 0.0108   Epoch: 13   Global Step: 166720   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 16:10:43,714-Speed 3303.34 samples/sec   Loss 2.5199   LearningRate 0.0108   Epoch: 13   Global Step: 166730   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:10:46,787-Speed 3332.67 samples/sec   Loss 2.5172   LearningRate 0.0108   Epoch: 13   Global Step: 166740   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:10:49,859-Speed 3335.13 samples/sec   Loss 2.5454   LearningRate 0.0108   Epoch: 13   Global Step: 166750   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:10:52,973-Speed 3289.34 samples/sec   Loss 2.5665   LearningRate 0.0108   Epoch: 13   Global Step: 166760   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:10:56,044-Speed 3335.20 samples/sec   Loss 2.6055   LearningRate 0.0108   Epoch: 13   Global Step: 166770   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:10:59,132-Speed 3316.55 samples/sec   Loss 2.5857   LearningRate 0.0108   Epoch: 13   Global Step: 166780   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:11:02,239-Speed 3297.56 samples/sec   Loss 2.6290   LearningRate 0.0108   Epoch: 13   Global Step: 166790   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:11:05,345-Speed 3297.71 samples/sec   Loss 2.5364   LearningRate 0.0108   Epoch: 13   Global Step: 166800   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:11:08,405-Speed 3347.02 samples/sec   Loss 2.5116   LearningRate 0.0108   Epoch: 13   Global Step: 166810   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:11:11,497-Speed 3312.52 samples/sec   Loss 2.4569   LearningRate 0.0108   Epoch: 13   Global Step: 166820   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:11:14,598-Speed 3303.54 samples/sec   Loss 2.5718   LearningRate 0.0108   Epoch: 13   Global Step: 166830   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:11:17,722-Speed 3279.41 samples/sec   Loss 2.4610   LearningRate 0.0108   Epoch: 13   Global Step: 166840   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:11:20,823-Speed 3302.23 samples/sec   Loss 2.5493   LearningRate 0.0108   Epoch: 13   Global Step: 166850   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:11:23,951-Speed 3275.29 samples/sec   Loss 2.5624   LearningRate 0.0108   Epoch: 13   Global Step: 166860   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:11:27,048-Speed 3307.90 samples/sec   Loss 2.5459   LearningRate 0.0108   Epoch: 13   Global Step: 166870   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:11:30,136-Speed 3316.65 samples/sec   Loss 2.5242   LearningRate 0.0108   Epoch: 13   Global Step: 166880   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:11:33,233-Speed 3307.08 samples/sec   Loss 2.5791   LearningRate 0.0108   Epoch: 13   Global Step: 166890   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:11:36,350-Speed 3286.33 samples/sec   Loss 2.5157   LearningRate 0.0108   Epoch: 13   Global Step: 166900   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:11:39,428-Speed 3328.48 samples/sec   Loss 2.5373   LearningRate 0.0108   Epoch: 13   Global Step: 166910   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:11:42,512-Speed 3321.32 samples/sec   Loss 2.5459   LearningRate 0.0108   Epoch: 13   Global Step: 166920   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:11:45,570-Speed 3349.20 samples/sec   Loss 2.5029   LearningRate 0.0108   Epoch: 13   Global Step: 166930   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:11:48,667-Speed 3306.69 samples/sec   Loss 2.6073   LearningRate 0.0108   Epoch: 13   Global Step: 166940   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:11:51,836-Speed 3232.57 samples/sec   Loss 2.5611   LearningRate 0.0108   Epoch: 13   Global Step: 166950   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:11:54,955-Speed 3284.20 samples/sec   Loss 2.6017   LearningRate 0.0108   Epoch: 13   Global Step: 166960   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:11:58,019-Speed 3342.80 samples/sec   Loss 2.5776   LearningRate 0.0108   Epoch: 13   Global Step: 166970   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:12:01,116-Speed 3307.63 samples/sec   Loss 2.5341   LearningRate 0.0107   Epoch: 13   Global Step: 166980   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:12:04,262-Speed 3256.38 samples/sec   Loss 2.5826   LearningRate 0.0107   Epoch: 13   Global Step: 166990   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:12:07,364-Speed 3301.62 samples/sec   Loss 2.6223   LearningRate 0.0107   Epoch: 13   Global Step: 167000   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:12:10,461-Speed 3308.13 samples/sec   Loss 2.5278   LearningRate 0.0107   Epoch: 13   Global Step: 167010   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:12:13,589-Speed 3274.37 samples/sec   Loss 2.5955   LearningRate 0.0107   Epoch: 13   Global Step: 167020   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:12:16,719-Speed 3272.55 samples/sec   Loss 2.6278   LearningRate 0.0107   Epoch: 13   Global Step: 167030   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:12:19,853-Speed 3268.12 samples/sec   Loss 2.6699   LearningRate 0.0107   Epoch: 13   Global Step: 167040   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:12:22,947-Speed 3310.71 samples/sec   Loss 2.5705   LearningRate 0.0107   Epoch: 13   Global Step: 167050   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:12:26,051-Speed 3300.86 samples/sec   Loss 2.4827   LearningRate 0.0107   Epoch: 13   Global Step: 167060   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:12:29,226-Speed 3225.71 samples/sec   Loss 2.5775   LearningRate 0.0107   Epoch: 13   Global Step: 167070   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:12:32,317-Speed 3313.75 samples/sec   Loss 2.5573   LearningRate 0.0107   Epoch: 13   Global Step: 167080   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:12:35,407-Speed 3315.47 samples/sec   Loss 2.6149   LearningRate 0.0107   Epoch: 13   Global Step: 167090   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:12:38,561-Speed 3247.84 samples/sec   Loss 2.4901   LearningRate 0.0107   Epoch: 13   Global Step: 167100   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:12:41,654-Speed 3311.55 samples/sec   Loss 2.4798   LearningRate 0.0107   Epoch: 13   Global Step: 167110   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:12:45,378-Speed 2750.03 samples/sec   Loss 2.5339   LearningRate 0.0107   Epoch: 13   Global Step: 167120   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:12:48,483-Speed 3299.18 samples/sec   Loss 2.5459   LearningRate 0.0107   Epoch: 13   Global Step: 167130   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:12:51,601-Speed 3285.19 samples/sec   Loss 2.5229   LearningRate 0.0107   Epoch: 13   Global Step: 167140   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:12:54,770-Speed 3232.71 samples/sec   Loss 2.5535   LearningRate 0.0107   Epoch: 13   Global Step: 167150   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:12:57,871-Speed 3303.65 samples/sec   Loss 2.5507   LearningRate 0.0107   Epoch: 13   Global Step: 167160   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:13:00,985-Speed 3288.40 samples/sec   Loss 2.5589   LearningRate 0.0107   Epoch: 13   Global Step: 167170   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:13:04,081-Speed 3309.58 samples/sec   Loss 2.5432   LearningRate 0.0107   Epoch: 13   Global Step: 167180   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:13:07,206-Speed 3277.10 samples/sec   Loss 2.5420   LearningRate 0.0107   Epoch: 13   Global Step: 167190   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:13:10,273-Speed 3340.14 samples/sec   Loss 2.5783   LearningRate 0.0107   Epoch: 13   Global Step: 167200   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:13:13,383-Speed 3293.43 samples/sec   Loss 2.6010   LearningRate 0.0107   Epoch: 13   Global Step: 167210   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:13:16,468-Speed 3321.08 samples/sec   Loss 2.5219   LearningRate 0.0107   Epoch: 13   Global Step: 167220   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:13:19,600-Speed 3269.60 samples/sec   Loss 2.5886   LearningRate 0.0107   Epoch: 13   Global Step: 167230   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:13:22,694-Speed 3311.30 samples/sec   Loss 2.5932   LearningRate 0.0107   Epoch: 13   Global Step: 167240   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:13:25,770-Speed 3330.05 samples/sec   Loss 2.5745   LearningRate 0.0107   Epoch: 13   Global Step: 167250   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:13:28,853-Speed 3322.24 samples/sec   Loss 2.5030   LearningRate 0.0107   Epoch: 13   Global Step: 167260   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:13:31,965-Speed 3292.27 samples/sec   Loss 2.5387   LearningRate 0.0107   Epoch: 13   Global Step: 167270   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:13:35,111-Speed 3255.41 samples/sec   Loss 2.6085   LearningRate 0.0107   Epoch: 13   Global Step: 167280   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:13:38,225-Speed 3289.48 samples/sec   Loss 2.5936   LearningRate 0.0107   Epoch: 13   Global Step: 167290   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:13:41,363-Speed 3264.83 samples/sec   Loss 2.5167   LearningRate 0.0107   Epoch: 13   Global Step: 167300   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:13:44,454-Speed 3313.60 samples/sec   Loss 2.4925   LearningRate 0.0107   Epoch: 13   Global Step: 167310   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:13:47,587-Speed 3269.57 samples/sec   Loss 2.6063   LearningRate 0.0107   Epoch: 13   Global Step: 167320   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:13:50,760-Speed 3227.53 samples/sec   Loss 2.6203   LearningRate 0.0107   Epoch: 13   Global Step: 167330   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:13:53,876-Speed 3287.61 samples/sec   Loss 2.5161   LearningRate 0.0107   Epoch: 13   Global Step: 167340   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:13:56,974-Speed 3305.86 samples/sec   Loss 2.5914   LearningRate 0.0106   Epoch: 13   Global Step: 167350   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:14:00,077-Speed 3300.78 samples/sec   Loss 2.4960   LearningRate 0.0106   Epoch: 13   Global Step: 167360   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:14:03,270-Speed 3208.71 samples/sec   Loss 2.6224   LearningRate 0.0106   Epoch: 13   Global Step: 167370   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:14:06,360-Speed 3315.03 samples/sec   Loss 2.4640   LearningRate 0.0106   Epoch: 13   Global Step: 167380   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:14:09,425-Speed 3341.85 samples/sec   Loss 2.6434   LearningRate 0.0106   Epoch: 13   Global Step: 167390   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:14:12,562-Speed 3265.24 samples/sec   Loss 2.5591   LearningRate 0.0106   Epoch: 13   Global Step: 167400   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:14:16,248-Speed 2779.10 samples/sec   Loss 2.5727   LearningRate 0.0106   Epoch: 13   Global Step: 167410   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:14:19,350-Speed 3302.13 samples/sec   Loss 2.5877   LearningRate 0.0106   Epoch: 13   Global Step: 167420   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:14:22,427-Speed 3328.75 samples/sec   Loss 2.5710   LearningRate 0.0106   Epoch: 13   Global Step: 167430   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:14:27,564-Speed 1993.75 samples/sec   Loss 2.5818   LearningRate 0.0106   Epoch: 13   Global Step: 167440   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:14:30,726-Speed 3239.30 samples/sec   Loss 2.5952   LearningRate 0.0106   Epoch: 13   Global Step: 167450   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:14:33,800-Speed 3333.33 samples/sec   Loss 2.5645   LearningRate 0.0106   Epoch: 13   Global Step: 167460   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:14:36,934-Speed 3267.95 samples/sec   Loss 2.5161   LearningRate 0.0106   Epoch: 13   Global Step: 167470   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 16:14:40,003-Speed 3338.12 samples/sec   Loss 2.5376   LearningRate 0.0106   Epoch: 13   Global Step: 167480   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:14:43,088-Speed 3320.23 samples/sec   Loss 2.6279   LearningRate 0.0106   Epoch: 13   Global Step: 167490   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:14:46,170-Speed 3323.77 samples/sec   Loss 2.5710   LearningRate 0.0106   Epoch: 13   Global Step: 167500   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:14:49,245-Speed 3330.43 samples/sec   Loss 2.5698   LearningRate 0.0106   Epoch: 13   Global Step: 167510   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:14:52,342-Speed 3308.07 samples/sec   Loss 2.5490   LearningRate 0.0106   Epoch: 13   Global Step: 167520   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:14:55,454-Speed 3292.01 samples/sec   Loss 2.5543   LearningRate 0.0106   Epoch: 13   Global Step: 167530   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:14:58,521-Speed 3339.12 samples/sec   Loss 2.5712   LearningRate 0.0106   Epoch: 13   Global Step: 167540   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:15:01,634-Speed 3290.23 samples/sec   Loss 2.5265   LearningRate 0.0106   Epoch: 13   Global Step: 167550   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:15:04,743-Speed 3295.24 samples/sec   Loss 2.6089   LearningRate 0.0106   Epoch: 13   Global Step: 167560   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:15:07,880-Speed 3264.92 samples/sec   Loss 2.5564   LearningRate 0.0106   Epoch: 13   Global Step: 167570   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:15:10,964-Speed 3321.08 samples/sec   Loss 2.5852   LearningRate 0.0106   Epoch: 13   Global Step: 167580   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:15:14,146-Speed 3219.07 samples/sec   Loss 2.6136   LearningRate 0.0106   Epoch: 13   Global Step: 167590   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:15:17,229-Speed 3322.17 samples/sec   Loss 2.4999   LearningRate 0.0106   Epoch: 13   Global Step: 167600   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:15:20,334-Speed 3299.71 samples/sec   Loss 2.5368   LearningRate 0.0106   Epoch: 13   Global Step: 167610   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:15:23,401-Speed 3339.95 samples/sec   Loss 2.4593   LearningRate 0.0106   Epoch: 13   Global Step: 167620   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:15:26,513-Speed 3291.87 samples/sec   Loss 2.5262   LearningRate 0.0106   Epoch: 13   Global Step: 167630   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:15:29,607-Speed 3310.64 samples/sec   Loss 2.5907   LearningRate 0.0106   Epoch: 13   Global Step: 167640   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:15:32,672-Speed 3341.96 samples/sec   Loss 2.5791   LearningRate 0.0106   Epoch: 13   Global Step: 167650   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:15:35,757-Speed 3320.29 samples/sec   Loss 2.6076   LearningRate 0.0106   Epoch: 13   Global Step: 167660   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:15:38,840-Speed 3322.59 samples/sec   Loss 2.6345   LearningRate 0.0106   Epoch: 13   Global Step: 167670   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:15:41,982-Speed 3259.63 samples/sec   Loss 2.5863   LearningRate 0.0106   Epoch: 13   Global Step: 167680   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:15:45,025-Speed 3367.11 samples/sec   Loss 2.4686   LearningRate 0.0106   Epoch: 13   Global Step: 167690   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:15:48,100-Speed 3330.66 samples/sec   Loss 2.5679   LearningRate 0.0106   Epoch: 13   Global Step: 167700   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:15:51,148-Speed 3360.86 samples/sec   Loss 2.5982   LearningRate 0.0106   Epoch: 13   Global Step: 167710   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:15:54,228-Speed 3325.55 samples/sec   Loss 2.5608   LearningRate 0.0106   Epoch: 13   Global Step: 167720   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:15:57,283-Speed 3353.50 samples/sec   Loss 2.6408   LearningRate 0.0106   Epoch: 13   Global Step: 167730   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:16:00,468-Speed 3215.80 samples/sec   Loss 2.5352   LearningRate 0.0105   Epoch: 13   Global Step: 167740   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:16:03,584-Speed 3287.63 samples/sec   Loss 2.6016   LearningRate 0.0105   Epoch: 13   Global Step: 167750   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:16:06,716-Speed 3270.23 samples/sec   Loss 2.5853   LearningRate 0.0105   Epoch: 13   Global Step: 167760   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:16:09,784-Speed 3339.28 samples/sec   Loss 2.4769   LearningRate 0.0105   Epoch: 13   Global Step: 167770   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:16:12,859-Speed 3330.42 samples/sec   Loss 2.5922   LearningRate 0.0105   Epoch: 13   Global Step: 167780   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:16:15,940-Speed 3325.04 samples/sec   Loss 2.6095   LearningRate 0.0105   Epoch: 13   Global Step: 167790   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:16:19,052-Speed 3291.98 samples/sec   Loss 2.6424   LearningRate 0.0105   Epoch: 13   Global Step: 167800   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:16:22,117-Speed 3341.38 samples/sec   Loss 2.5869   LearningRate 0.0105   Epoch: 13   Global Step: 167810   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:16:25,227-Speed 3294.17 samples/sec   Loss 2.6689   LearningRate 0.0105   Epoch: 13   Global Step: 167820   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:16:28,325-Speed 3306.55 samples/sec   Loss 2.6424   LearningRate 0.0105   Epoch: 13   Global Step: 167830   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:16:31,450-Speed 3277.16 samples/sec   Loss 2.6341   LearningRate 0.0105   Epoch: 13   Global Step: 167840   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:16:34,561-Speed 3293.13 samples/sec   Loss 2.6111   LearningRate 0.0105   Epoch: 13   Global Step: 167850   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:16:37,691-Speed 3272.70 samples/sec   Loss 2.6788   LearningRate 0.0105   Epoch: 13   Global Step: 167860   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:16:40,818-Speed 3275.98 samples/sec   Loss 2.6193   LearningRate 0.0105   Epoch: 13   Global Step: 167870   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:16:43,886-Speed 3337.91 samples/sec   Loss 2.6048   LearningRate 0.0105   Epoch: 13   Global Step: 167880   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:16:47,010-Speed 3279.05 samples/sec   Loss 2.5623   LearningRate 0.0105   Epoch: 13   Global Step: 167890   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:16:50,127-Speed 3286.87 samples/sec   Loss 2.6570   LearningRate 0.0105   Epoch: 13   Global Step: 167900   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:16:53,297-Speed 3231.31 samples/sec   Loss 2.5310   LearningRate 0.0105   Epoch: 13   Global Step: 167910   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:16:56,453-Speed 3245.88 samples/sec   Loss 2.5856   LearningRate 0.0105   Epoch: 13   Global Step: 167920   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:16:59,561-Speed 3295.76 samples/sec   Loss 2.6215   LearningRate 0.0105   Epoch: 13   Global Step: 167930   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:17:02,651-Speed 3314.75 samples/sec   Loss 2.5283   LearningRate 0.0105   Epoch: 13   Global Step: 167940   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:17:05,752-Speed 3303.88 samples/sec   Loss 2.5675   LearningRate 0.0105   Epoch: 13   Global Step: 167950   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:17:08,846-Speed 3309.95 samples/sec   Loss 2.5999   LearningRate 0.0105   Epoch: 13   Global Step: 167960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:17:11,923-Speed 3329.52 samples/sec   Loss 2.5127   LearningRate 0.0105   Epoch: 13   Global Step: 167970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:17:15,060-Speed 3265.02 samples/sec   Loss 2.5544   LearningRate 0.0105   Epoch: 13   Global Step: 167980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:17:18,163-Speed 3301.50 samples/sec   Loss 2.4990   LearningRate 0.0105   Epoch: 13   Global Step: 167990   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:17:21,223-Speed 3347.02 samples/sec   Loss 2.5470   LearningRate 0.0105   Epoch: 13   Global Step: 168000   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:17:24,359-Speed 3266.09 samples/sec   Loss 2.6281   LearningRate 0.0105   Epoch: 13   Global Step: 168010   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:17:27,487-Speed 3274.99 samples/sec   Loss 2.5611   LearningRate 0.0105   Epoch: 13   Global Step: 168020   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:17:30,577-Speed 3315.06 samples/sec   Loss 2.6287   LearningRate 0.0105   Epoch: 13   Global Step: 168030   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:17:33,637-Speed 3347.10 samples/sec   Loss 2.5728   LearningRate 0.0105   Epoch: 13   Global Step: 168040   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:17:36,826-Speed 3212.40 samples/sec   Loss 2.6096   LearningRate 0.0105   Epoch: 13   Global Step: 168050   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:17:39,912-Speed 3319.45 samples/sec   Loss 2.6272   LearningRate 0.0105   Epoch: 13   Global Step: 168060   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:17:43,077-Speed 3236.16 samples/sec   Loss 2.5416   LearningRate 0.0105   Epoch: 13   Global Step: 168070   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:17:46,153-Speed 3330.11 samples/sec   Loss 2.5616   LearningRate 0.0105   Epoch: 13   Global Step: 168080   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:17:49,260-Speed 3295.87 samples/sec   Loss 2.6029   LearningRate 0.0105   Epoch: 13   Global Step: 168090   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:17:52,359-Speed 3306.34 samples/sec   Loss 2.5990   LearningRate 0.0105   Epoch: 13   Global Step: 168100   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:17:55,483-Speed 3278.54 samples/sec   Loss 2.6027   LearningRate 0.0105   Epoch: 13   Global Step: 168110   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:17:58,563-Speed 3325.18 samples/sec   Loss 2.6627   LearningRate 0.0104   Epoch: 13   Global Step: 168120   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:18:01,638-Speed 3332.09 samples/sec   Loss 2.5098   LearningRate 0.0104   Epoch: 13   Global Step: 168130   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:18:04,741-Speed 3301.06 samples/sec   Loss 2.6051   LearningRate 0.0104   Epoch: 13   Global Step: 168140   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:18:07,840-Speed 3304.69 samples/sec   Loss 2.5771   LearningRate 0.0104   Epoch: 13   Global Step: 168150   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:18:10,949-Speed 3295.12 samples/sec   Loss 2.5436   LearningRate 0.0104   Epoch: 13   Global Step: 168160   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:18:14,131-Speed 3219.16 samples/sec   Loss 2.5922   LearningRate 0.0104   Epoch: 13   Global Step: 168170   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:18:17,371-Speed 3161.19 samples/sec   Loss 2.5844   LearningRate 0.0104   Epoch: 13   Global Step: 168180   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:18:20,467-Speed 3308.58 samples/sec   Loss 2.6896   LearningRate 0.0104   Epoch: 13   Global Step: 168190   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:18:23,586-Speed 3284.86 samples/sec   Loss 2.5940   LearningRate 0.0104   Epoch: 13   Global Step: 168200   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:18:26,739-Speed 3248.27 samples/sec   Loss 2.6035   LearningRate 0.0104   Epoch: 13   Global Step: 168210   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:18:29,868-Speed 3273.73 samples/sec   Loss 2.6021   LearningRate 0.0104   Epoch: 13   Global Step: 168220   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:18:32,952-Speed 3320.85 samples/sec   Loss 2.5832   LearningRate 0.0104   Epoch: 13   Global Step: 168230   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:18:36,148-Speed 3205.46 samples/sec   Loss 2.6376   LearningRate 0.0104   Epoch: 13   Global Step: 168240   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:18:39,298-Speed 3252.11 samples/sec   Loss 2.5737   LearningRate 0.0104   Epoch: 13   Global Step: 168250   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:18:42,394-Speed 3307.60 samples/sec   Loss 2.6131   LearningRate 0.0104   Epoch: 13   Global Step: 168260   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:18:45,482-Speed 3317.20 samples/sec   Loss 2.5887   LearningRate 0.0104   Epoch: 13   Global Step: 168270   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:18:48,558-Speed 3330.76 samples/sec   Loss 2.6816   LearningRate 0.0104   Epoch: 13   Global Step: 168280   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:18:51,689-Speed 3271.07 samples/sec   Loss 2.5708   LearningRate 0.0104   Epoch: 13   Global Step: 168290   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:18:54,892-Speed 3198.19 samples/sec   Loss 2.5586   LearningRate 0.0104   Epoch: 13   Global Step: 168300   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:18:57,990-Speed 3306.15 samples/sec   Loss 2.5471   LearningRate 0.0104   Epoch: 13   Global Step: 168310   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:19:01,131-Speed 3260.66 samples/sec   Loss 2.5593   LearningRate 0.0104   Epoch: 13   Global Step: 168320   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:19:04,222-Speed 3314.35 samples/sec   Loss 2.6604   LearningRate 0.0104   Epoch: 13   Global Step: 168330   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:19:07,333-Speed 3292.00 samples/sec   Loss 2.6082   LearningRate 0.0104   Epoch: 13   Global Step: 168340   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:19:10,466-Speed 3269.29 samples/sec   Loss 2.5892   LearningRate 0.0104   Epoch: 13   Global Step: 168350   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:19:13,663-Speed 3204.35 samples/sec   Loss 2.5809   LearningRate 0.0104   Epoch: 13   Global Step: 168360   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:19:16,856-Speed 3208.20 samples/sec   Loss 2.5737   LearningRate 0.0104   Epoch: 13   Global Step: 168370   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:19:20,006-Speed 3251.96 samples/sec   Loss 2.6921   LearningRate 0.0104   Epoch: 13   Global Step: 168380   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:19:23,128-Speed 3280.30 samples/sec   Loss 2.6160   LearningRate 0.0104   Epoch: 13   Global Step: 168390   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:19:26,263-Speed 3267.71 samples/sec   Loss 2.6215   LearningRate 0.0104   Epoch: 13   Global Step: 168400   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:19:29,378-Speed 3288.36 samples/sec   Loss 2.5737   LearningRate 0.0104   Epoch: 13   Global Step: 168410   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:19:32,447-Speed 3337.85 samples/sec   Loss 2.6126   LearningRate 0.0104   Epoch: 13   Global Step: 168420   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:19:35,555-Speed 3295.31 samples/sec   Loss 2.6314   LearningRate 0.0104   Epoch: 13   Global Step: 168430   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:19:38,658-Speed 3301.25 samples/sec   Loss 2.6783   LearningRate 0.0104   Epoch: 13   Global Step: 168440   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:19:41,814-Speed 3245.16 samples/sec   Loss 2.6192   LearningRate 0.0104   Epoch: 13   Global Step: 168450   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:19:44,905-Speed 3314.51 samples/sec   Loss 2.6147   LearningRate 0.0104   Epoch: 13   Global Step: 168460   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:19:48,054-Speed 3252.17 samples/sec   Loss 2.5667   LearningRate 0.0104   Epoch: 13   Global Step: 168470   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:19:51,182-Speed 3274.98 samples/sec   Loss 2.6529   LearningRate 0.0104   Epoch: 13   Global Step: 168480   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:19:54,282-Speed 3304.57 samples/sec   Loss 2.6437   LearningRate 0.0104   Epoch: 13   Global Step: 168490   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:19:57,364-Speed 3323.57 samples/sec   Loss 2.6065   LearningRate 0.0103   Epoch: 13   Global Step: 168500   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:20:00,487-Speed 3279.40 samples/sec   Loss 2.6217   LearningRate 0.0103   Epoch: 13   Global Step: 168510   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:20:03,582-Speed 3309.00 samples/sec   Loss 2.5329   LearningRate 0.0103   Epoch: 13   Global Step: 168520   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:20:06,735-Speed 3249.10 samples/sec   Loss 2.6577   LearningRate 0.0103   Epoch: 13   Global Step: 168530   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 16:20:09,805-Speed 3336.35 samples/sec   Loss 2.5307   LearningRate 0.0103   Epoch: 13   Global Step: 168540   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 16:20:12,920-Speed 3288.99 samples/sec   Loss 2.5768   LearningRate 0.0103   Epoch: 13   Global Step: 168550   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 16:20:16,041-Speed 3282.09 samples/sec   Loss 2.5568   LearningRate 0.0103   Epoch: 13   Global Step: 168560   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:20:19,187-Speed 3255.59 samples/sec   Loss 2.5740   LearningRate 0.0103   Epoch: 13   Global Step: 168570   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:20:22,256-Speed 3337.76 samples/sec   Loss 2.5885   LearningRate 0.0103   Epoch: 13   Global Step: 168580   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:20:25,381-Speed 3277.52 samples/sec   Loss 2.6147   LearningRate 0.0103   Epoch: 13   Global Step: 168590   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:20:28,528-Speed 3255.39 samples/sec   Loss 2.6070   LearningRate 0.0103   Epoch: 13   Global Step: 168600   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:20:31,621-Speed 3310.92 samples/sec   Loss 2.6466   LearningRate 0.0103   Epoch: 13   Global Step: 168610   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:20:34,739-Speed 3286.01 samples/sec   Loss 2.6277   LearningRate 0.0103   Epoch: 13   Global Step: 168620   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:20:37,829-Speed 3314.13 samples/sec   Loss 2.5548   LearningRate 0.0103   Epoch: 13   Global Step: 168630   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:20:41,028-Speed 3201.79 samples/sec   Loss 2.5911   LearningRate 0.0103   Epoch: 13   Global Step: 168640   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:20:44,134-Speed 3298.73 samples/sec   Loss 2.6875   LearningRate 0.0103   Epoch: 13   Global Step: 168650   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:20:47,248-Speed 3289.58 samples/sec   Loss 2.6189   LearningRate 0.0103   Epoch: 13   Global Step: 168660   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:20:50,364-Speed 3286.98 samples/sec   Loss 2.6201   LearningRate 0.0103   Epoch: 13   Global Step: 168670   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:20:53,484-Speed 3282.55 samples/sec   Loss 2.5963   LearningRate 0.0103   Epoch: 13   Global Step: 168680   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:20:56,557-Speed 3333.81 samples/sec   Loss 2.6139   LearningRate 0.0103   Epoch: 13   Global Step: 168690   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:20:59,629-Speed 3334.16 samples/sec   Loss 2.6358   LearningRate 0.0103   Epoch: 13   Global Step: 168700   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:21:02,729-Speed 3304.95 samples/sec   Loss 2.5957   LearningRate 0.0103   Epoch: 13   Global Step: 168710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:21:05,846-Speed 3286.27 samples/sec   Loss 2.5674   LearningRate 0.0103   Epoch: 13   Global Step: 168720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:21:08,950-Speed 3299.01 samples/sec   Loss 2.5730   LearningRate 0.0103   Epoch: 13   Global Step: 168730   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:21:12,043-Speed 3311.84 samples/sec   Loss 2.6473   LearningRate 0.0103   Epoch: 13   Global Step: 168740   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:21:15,206-Speed 3239.05 samples/sec   Loss 2.5789   LearningRate 0.0103   Epoch: 13   Global Step: 168750   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:21:18,309-Speed 3300.82 samples/sec   Loss 2.5993   LearningRate 0.0103   Epoch: 13   Global Step: 168760   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-04-27 16:21:21,375-Speed 3340.76 samples/sec   Loss 2.7084   LearningRate 0.0103   Epoch: 13   Global Step: 168770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:21:24,478-Speed 3301.63 samples/sec   Loss 2.5872   LearningRate 0.0103   Epoch: 13   Global Step: 168780   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:21:27,595-Speed 3285.79 samples/sec   Loss 2.5236   LearningRate 0.0103   Epoch: 13   Global Step: 168790   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:21:30,734-Speed 3263.50 samples/sec   Loss 2.5832   LearningRate 0.0103   Epoch: 13   Global Step: 168800   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:21:33,879-Speed 3257.14 samples/sec   Loss 2.5675   LearningRate 0.0103   Epoch: 13   Global Step: 168810   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:21:37,030-Speed 3250.56 samples/sec   Loss 2.5771   LearningRate 0.0103   Epoch: 13   Global Step: 168820   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:21:40,154-Speed 3278.44 samples/sec   Loss 2.5916   LearningRate 0.0103   Epoch: 13   Global Step: 168830   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:21:43,323-Speed 3233.00 samples/sec   Loss 2.5956   LearningRate 0.0103   Epoch: 13   Global Step: 168840   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:21:46,459-Speed 3266.37 samples/sec   Loss 2.6245   LearningRate 0.0103   Epoch: 13   Global Step: 168850   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:21:49,593-Speed 3268.04 samples/sec   Loss 2.6055   LearningRate 0.0103   Epoch: 13   Global Step: 168860   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:21:52,760-Speed 3234.06 samples/sec   Loss 2.6098   LearningRate 0.0103   Epoch: 13   Global Step: 168870   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:21:55,852-Speed 3313.50 samples/sec   Loss 2.6307   LearningRate 0.0103   Epoch: 13   Global Step: 168880   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:21:58,945-Speed 3310.90 samples/sec   Loss 2.6119   LearningRate 0.0102   Epoch: 13   Global Step: 168890   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:22:02,017-Speed 3334.79 samples/sec   Loss 2.5553   LearningRate 0.0102   Epoch: 13   Global Step: 168900   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:22:05,119-Speed 3301.52 samples/sec   Loss 2.5783   LearningRate 0.0102   Epoch: 13   Global Step: 168910   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:22:08,180-Speed 3346.48 samples/sec   Loss 2.6466   LearningRate 0.0102   Epoch: 13   Global Step: 168920   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:22:11,285-Speed 3298.85 samples/sec   Loss 2.5790   LearningRate 0.0102   Epoch: 13   Global Step: 168930   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:22:14,493-Speed 3193.41 samples/sec   Loss 2.5735   LearningRate 0.0102   Epoch: 13   Global Step: 168940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:22:17,578-Speed 3320.00 samples/sec   Loss 2.6253   LearningRate 0.0102   Epoch: 13   Global Step: 168950   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:22:20,677-Speed 3305.29 samples/sec   Loss 2.5656   LearningRate 0.0102   Epoch: 13   Global Step: 168960   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:22:23,803-Speed 3277.21 samples/sec   Loss 2.5891   LearningRate 0.0102   Epoch: 13   Global Step: 168970   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:22:26,884-Speed 3324.95 samples/sec   Loss 2.6338   LearningRate 0.0102   Epoch: 13   Global Step: 168980   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:22:30,013-Speed 3273.53 samples/sec   Loss 2.5714   LearningRate 0.0102   Epoch: 13   Global Step: 168990   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:22:33,063-Speed 3358.26 samples/sec   Loss 2.6191   LearningRate 0.0102   Epoch: 13   Global Step: 169000   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:22:36,164-Speed 3303.77 samples/sec   Loss 2.6711   LearningRate 0.0102   Epoch: 13   Global Step: 169010   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:22:39,296-Speed 3269.86 samples/sec   Loss 2.6403   LearningRate 0.0102   Epoch: 13   Global Step: 169020   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:22:42,380-Speed 3321.78 samples/sec   Loss 2.5482   LearningRate 0.0102   Epoch: 13   Global Step: 169030   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:22:45,451-Speed 3335.31 samples/sec   Loss 2.5261   LearningRate 0.0102   Epoch: 13   Global Step: 169040   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:22:48,620-Speed 3232.39 samples/sec   Loss 2.5888   LearningRate 0.0102   Epoch: 13   Global Step: 169050   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:22:51,741-Speed 3282.16 samples/sec   Loss 2.5579   LearningRate 0.0102   Epoch: 13   Global Step: 169060   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:22:54,872-Speed 3271.29 samples/sec   Loss 2.6131   LearningRate 0.0102   Epoch: 13   Global Step: 169070   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:22:57,947-Speed 3331.67 samples/sec   Loss 2.6225   LearningRate 0.0102   Epoch: 13   Global Step: 169080   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:23:01,139-Speed 3208.46 samples/sec   Loss 2.5493   LearningRate 0.0102   Epoch: 13   Global Step: 169090   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:23:04,225-Speed 3319.48 samples/sec   Loss 2.5684   LearningRate 0.0102   Epoch: 13   Global Step: 169100   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:23:07,411-Speed 3215.55 samples/sec   Loss 2.6116   LearningRate 0.0102   Epoch: 13   Global Step: 169110   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:23:10,477-Speed 3341.08 samples/sec   Loss 2.5619   LearningRate 0.0102   Epoch: 13   Global Step: 169120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:23:13,577-Speed 3304.21 samples/sec   Loss 2.6387   LearningRate 0.0102   Epoch: 13   Global Step: 169130   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:23:16,650-Speed 3333.47 samples/sec   Loss 2.5783   LearningRate 0.0102   Epoch: 13   Global Step: 169140   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:23:19,760-Speed 3293.07 samples/sec   Loss 2.6016   LearningRate 0.0102   Epoch: 13   Global Step: 169150   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:23:22,879-Speed 3284.32 samples/sec   Loss 2.6553   LearningRate 0.0102   Epoch: 13   Global Step: 169160   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:23:26,096-Speed 3183.62 samples/sec   Loss 2.6454   LearningRate 0.0102   Epoch: 13   Global Step: 169170   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:23:29,188-Speed 3312.77 samples/sec   Loss 2.6510   LearningRate 0.0102   Epoch: 13   Global Step: 169180   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:23:32,272-Speed 3322.28 samples/sec   Loss 2.5840   LearningRate 0.0102   Epoch: 13   Global Step: 169190   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:23:35,346-Speed 3332.10 samples/sec   Loss 2.6402   LearningRate 0.0102   Epoch: 13   Global Step: 169200   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:23:38,424-Speed 3327.75 samples/sec   Loss 2.6291   LearningRate 0.0102   Epoch: 13   Global Step: 169210   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:23:41,541-Speed 3285.61 samples/sec   Loss 2.5966   LearningRate 0.0102   Epoch: 13   Global Step: 169220   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:23:44,608-Speed 3340.62 samples/sec   Loss 2.6032   LearningRate 0.0102   Epoch: 13   Global Step: 169230   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:23:47,723-Speed 3287.97 samples/sec   Loss 2.6654   LearningRate 0.0102   Epoch: 13   Global Step: 169240   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:23:50,846-Speed 3280.25 samples/sec   Loss 2.5810   LearningRate 0.0102   Epoch: 13   Global Step: 169250   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:23:53,985-Speed 3263.29 samples/sec   Loss 2.6252   LearningRate 0.0102   Epoch: 13   Global Step: 169260   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:23:57,039-Speed 3353.81 samples/sec   Loss 2.6723   LearningRate 0.0102   Epoch: 13   Global Step: 169270   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:24:00,193-Speed 3248.36 samples/sec   Loss 2.6064   LearningRate 0.0101   Epoch: 13   Global Step: 169280   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:24:03,338-Speed 3256.24 samples/sec   Loss 2.7093   LearningRate 0.0101   Epoch: 13   Global Step: 169290   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:24:06,503-Speed 3236.49 samples/sec   Loss 2.5879   LearningRate 0.0101   Epoch: 13   Global Step: 169300   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:24:09,604-Speed 3303.58 samples/sec   Loss 2.6667   LearningRate 0.0101   Epoch: 13   Global Step: 169310   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:24:12,678-Speed 3332.23 samples/sec   Loss 2.6968   LearningRate 0.0101   Epoch: 13   Global Step: 169320   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:24:15,883-Speed 3195.73 samples/sec   Loss 2.6396   LearningRate 0.0101   Epoch: 13   Global Step: 169330   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:24:18,942-Speed 3348.92 samples/sec   Loss 2.6141   LearningRate 0.0101   Epoch: 13   Global Step: 169340   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:24:22,078-Speed 3265.69 samples/sec   Loss 2.6382   LearningRate 0.0101   Epoch: 13   Global Step: 169350   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:24:25,276-Speed 3203.30 samples/sec   Loss 2.5566   LearningRate 0.0101   Epoch: 13   Global Step: 169360   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:24:28,507-Speed 3169.85 samples/sec   Loss 2.6310   LearningRate 0.0101   Epoch: 13   Global Step: 169370   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:24:31,659-Speed 3250.41 samples/sec   Loss 2.6249   LearningRate 0.0101   Epoch: 13   Global Step: 169380   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:24:34,831-Speed 3228.79 samples/sec   Loss 2.5223   LearningRate 0.0101   Epoch: 13   Global Step: 169390   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:24:37,997-Speed 3235.53 samples/sec   Loss 2.5641   LearningRate 0.0101   Epoch: 13   Global Step: 169400   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:24:41,089-Speed 3312.40 samples/sec   Loss 2.5721   LearningRate 0.0101   Epoch: 13   Global Step: 169410   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:24:44,174-Speed 3320.71 samples/sec   Loss 2.5938   LearningRate 0.0101   Epoch: 13   Global Step: 169420   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:24:47,260-Speed 3319.70 samples/sec   Loss 2.6359   LearningRate 0.0101   Epoch: 13   Global Step: 169430   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:24:50,398-Speed 3264.50 samples/sec   Loss 2.5812   LearningRate 0.0101   Epoch: 13   Global Step: 169440   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:24:53,559-Speed 3240.44 samples/sec   Loss 2.6146   LearningRate 0.0101   Epoch: 13   Global Step: 169450   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:24:56,693-Speed 3268.38 samples/sec   Loss 2.6430   LearningRate 0.0101   Epoch: 13   Global Step: 169460   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:24:59,827-Speed 3268.34 samples/sec   Loss 2.6633   LearningRate 0.0101   Epoch: 13   Global Step: 169470   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:25:02,942-Speed 3288.42 samples/sec   Loss 2.6145   LearningRate 0.0101   Epoch: 13   Global Step: 169480   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:25:06,070-Speed 3274.94 samples/sec   Loss 2.6013   LearningRate 0.0101   Epoch: 13   Global Step: 169490   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:25:09,163-Speed 3311.63 samples/sec   Loss 2.6129   LearningRate 0.0101   Epoch: 13   Global Step: 169500   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:25:12,319-Speed 3245.09 samples/sec   Loss 2.6404   LearningRate 0.0101   Epoch: 13   Global Step: 169510   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:25:15,489-Speed 3231.49 samples/sec   Loss 2.6062   LearningRate 0.0101   Epoch: 13   Global Step: 169520   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:25:18,585-Speed 3308.18 samples/sec   Loss 2.6382   LearningRate 0.0101   Epoch: 13   Global Step: 169530   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:25:21,672-Speed 3318.27 samples/sec   Loss 2.5101   LearningRate 0.0101   Epoch: 13   Global Step: 169540   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:25:24,777-Speed 3299.01 samples/sec   Loss 2.6188   LearningRate 0.0101   Epoch: 13   Global Step: 169550   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:25:27,922-Speed 3257.05 samples/sec   Loss 2.5977   LearningRate 0.0101   Epoch: 13   Global Step: 169560   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:25:31,036-Speed 3289.32 samples/sec   Loss 2.6076   LearningRate 0.0101   Epoch: 13   Global Step: 169570   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:25:34,172-Speed 3266.13 samples/sec   Loss 2.6376   LearningRate 0.0101   Epoch: 13   Global Step: 169580   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:25:37,287-Speed 3289.11 samples/sec   Loss 2.5824   LearningRate 0.0101   Epoch: 13   Global Step: 169590   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:25:40,511-Speed 3177.25 samples/sec   Loss 2.6315   LearningRate 0.0101   Epoch: 13   Global Step: 169600   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:25:43,649-Speed 3264.46 samples/sec   Loss 2.6269   LearningRate 0.0101   Epoch: 13   Global Step: 169610   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:25:46,729-Speed 3325.00 samples/sec   Loss 2.6325   LearningRate 0.0101   Epoch: 13   Global Step: 169620   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:25:49,816-Speed 3319.14 samples/sec   Loss 2.6560   LearningRate 0.0101   Epoch: 13   Global Step: 169630   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:25:52,898-Speed 3322.79 samples/sec   Loss 2.5736   LearningRate 0.0101   Epoch: 13   Global Step: 169640   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:25:55,966-Speed 3339.47 samples/sec   Loss 2.5566   LearningRate 0.0101   Epoch: 13   Global Step: 169650   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:25:59,041-Speed 3331.12 samples/sec   Loss 2.5695   LearningRate 0.0101   Epoch: 13   Global Step: 169660   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:26:02,199-Speed 3243.63 samples/sec   Loss 2.5677   LearningRate 0.0100   Epoch: 13   Global Step: 169670   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:26:05,273-Speed 3332.31 samples/sec   Loss 2.5782   LearningRate 0.0100   Epoch: 13   Global Step: 169680   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:26:08,361-Speed 3316.73 samples/sec   Loss 2.6170   LearningRate 0.0100   Epoch: 13   Global Step: 169690   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:26:11,515-Speed 3247.82 samples/sec   Loss 2.6179   LearningRate 0.0100   Epoch: 13   Global Step: 169700   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:26:14,610-Speed 3309.62 samples/sec   Loss 2.6083   LearningRate 0.0100   Epoch: 13   Global Step: 169710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:26:17,700-Speed 3314.83 samples/sec   Loss 2.5279   LearningRate 0.0100   Epoch: 13   Global Step: 169720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:26:20,769-Speed 3338.25 samples/sec   Loss 2.6556   LearningRate 0.0100   Epoch: 13   Global Step: 169730   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:26:23,847-Speed 3327.84 samples/sec   Loss 2.6074   LearningRate 0.0100   Epoch: 13   Global Step: 169740   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:26:26,961-Speed 3289.17 samples/sec   Loss 2.5694   LearningRate 0.0100   Epoch: 13   Global Step: 169750   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:26:30,064-Speed 3301.14 samples/sec   Loss 2.6844   LearningRate 0.0100   Epoch: 13   Global Step: 169760   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:26:33,135-Speed 3335.46 samples/sec   Loss 2.6070   LearningRate 0.0100   Epoch: 13   Global Step: 169770   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:26:36,293-Speed 3243.11 samples/sec   Loss 2.6482   LearningRate 0.0100   Epoch: 13   Global Step: 169780   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:26:39,384-Speed 3313.47 samples/sec   Loss 2.6321   LearningRate 0.0100   Epoch: 13   Global Step: 169790   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:26:42,466-Speed 3324.34 samples/sec   Loss 2.6162   LearningRate 0.0100   Epoch: 13   Global Step: 169800   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:26:45,585-Speed 3284.29 samples/sec   Loss 2.6387   LearningRate 0.0100   Epoch: 13   Global Step: 169810   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:26:48,724-Speed 3262.52 samples/sec   Loss 2.6334   LearningRate 0.0100   Epoch: 13   Global Step: 169820   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:26:51,845-Speed 3282.77 samples/sec   Loss 2.6566   LearningRate 0.0100   Epoch: 13   Global Step: 169830   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:26:54,987-Speed 3260.32 samples/sec   Loss 2.6821   LearningRate 0.0100   Epoch: 13   Global Step: 169840   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:26:58,038-Speed 3356.88 samples/sec   Loss 2.6923   LearningRate 0.0100   Epoch: 13   Global Step: 169850   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:27:01,096-Speed 3349.38 samples/sec   Loss 2.6149   LearningRate 0.0100   Epoch: 13   Global Step: 169860   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:27:04,190-Speed 3310.61 samples/sec   Loss 2.6014   LearningRate 0.0100   Epoch: 13   Global Step: 169870   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:27:07,367-Speed 3224.77 samples/sec   Loss 2.5890   LearningRate 0.0100   Epoch: 13   Global Step: 169880   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:27:10,429-Speed 3344.71 samples/sec   Loss 2.7073   LearningRate 0.0100   Epoch: 13   Global Step: 169890   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:27:13,531-Speed 3302.77 samples/sec   Loss 2.5996   LearningRate 0.0100   Epoch: 13   Global Step: 169900   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:27:16,645-Speed 3289.79 samples/sec   Loss 2.6506   LearningRate 0.0100   Epoch: 13   Global Step: 169910   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:27:19,736-Speed 3313.55 samples/sec   Loss 2.6336   LearningRate 0.0100   Epoch: 13   Global Step: 169920   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:27:22,830-Speed 3311.00 samples/sec   Loss 2.6389   LearningRate 0.0100   Epoch: 13   Global Step: 169930   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:27:25,883-Speed 3355.23 samples/sec   Loss 2.5981   LearningRate 0.0100   Epoch: 13   Global Step: 169940   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:27:28,956-Speed 3333.22 samples/sec   Loss 2.5897   LearningRate 0.0100   Epoch: 13   Global Step: 169950   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:27:32,058-Speed 3302.58 samples/sec   Loss 2.6097   LearningRate 0.0100   Epoch: 13   Global Step: 169960   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:27:35,145-Speed 3318.16 samples/sec   Loss 2.5965   LearningRate 0.0100   Epoch: 13   Global Step: 169970   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:27:38,346-Speed 3200.23 samples/sec   Loss 2.6378   LearningRate 0.0100   Epoch: 13   Global Step: 169980   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:27:41,474-Speed 3274.27 samples/sec   Loss 2.6224   LearningRate 0.0100   Epoch: 13   Global Step: 169990   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:27:44,570-Speed 3308.69 samples/sec   Loss 2.6149   LearningRate 0.0100   Epoch: 13   Global Step: 170000   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:27:47,675-Speed 3298.39 samples/sec   Loss 2.6604   LearningRate 0.0100   Epoch: 13   Global Step: 170010   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:27:50,758-Speed 3322.41 samples/sec   Loss 2.5618   LearningRate 0.0100   Epoch: 13   Global Step: 170020   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:27:53,870-Speed 3291.77 samples/sec   Loss 2.5681   LearningRate 0.0100   Epoch: 13   Global Step: 170030   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:27:56,989-Speed 3284.08 samples/sec   Loss 2.5561   LearningRate 0.0100   Epoch: 13   Global Step: 170040   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:28:00,082-Speed 3311.41 samples/sec   Loss 2.6363   LearningRate 0.0100   Epoch: 13   Global Step: 170050   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:28:03,238-Speed 3246.47 samples/sec   Loss 2.5960   LearningRate 0.0099   Epoch: 13   Global Step: 170060   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:28:06,311-Speed 3332.63 samples/sec   Loss 2.6495   LearningRate 0.0099   Epoch: 13   Global Step: 170070   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:28:09,391-Speed 3326.23 samples/sec   Loss 2.5658   LearningRate 0.0099   Epoch: 13   Global Step: 170080   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:28:12,500-Speed 3294.98 samples/sec   Loss 2.6518   LearningRate 0.0099   Epoch: 13   Global Step: 170090   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:28:15,653-Speed 3247.92 samples/sec   Loss 2.6544   LearningRate 0.0099   Epoch: 13   Global Step: 170100   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:28:18,748-Speed 3310.91 samples/sec   Loss 2.5972   LearningRate 0.0099   Epoch: 13   Global Step: 170110   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:28:21,819-Speed 3335.40 samples/sec   Loss 2.6644   LearningRate 0.0099   Epoch: 13   Global Step: 170120   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:28:24,919-Speed 3303.33 samples/sec   Loss 2.6238   LearningRate 0.0099   Epoch: 13   Global Step: 170130   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:28:28,006-Speed 3319.08 samples/sec   Loss 2.6035   LearningRate 0.0099   Epoch: 13   Global Step: 170140   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:28:31,200-Speed 3206.58 samples/sec   Loss 2.6035   LearningRate 0.0099   Epoch: 13   Global Step: 170150   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:28:34,237-Speed 3372.57 samples/sec   Loss 2.5789   LearningRate 0.0099   Epoch: 13   Global Step: 170160   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:28:37,323-Speed 3319.92 samples/sec   Loss 2.6295   LearningRate 0.0099   Epoch: 13   Global Step: 170170   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:28:40,450-Speed 3275.52 samples/sec   Loss 2.5777   LearningRate 0.0099   Epoch: 13   Global Step: 170180   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:28:43,582-Speed 3269.93 samples/sec   Loss 2.5800   LearningRate 0.0099   Epoch: 13   Global Step: 170190   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:28:46,674-Speed 3312.73 samples/sec   Loss 2.6617   LearningRate 0.0099   Epoch: 13   Global Step: 170200   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:28:49,755-Speed 3325.39 samples/sec   Loss 2.6134   LearningRate 0.0099   Epoch: 13   Global Step: 170210   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:28:52,887-Speed 3270.22 samples/sec   Loss 2.6583   LearningRate 0.0099   Epoch: 13   Global Step: 170220   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:28:55,980-Speed 3311.32 samples/sec   Loss 2.6504   LearningRate 0.0099   Epoch: 13   Global Step: 170230   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:28:59,044-Speed 3343.10 samples/sec   Loss 2.5980   LearningRate 0.0099   Epoch: 13   Global Step: 170240   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:29:02,172-Speed 3274.37 samples/sec   Loss 2.6214   LearningRate 0.0099   Epoch: 13   Global Step: 170250   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:29:05,274-Speed 3302.31 samples/sec   Loss 2.6133   LearningRate 0.0099   Epoch: 13   Global Step: 170260   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:29:08,344-Speed 3337.26 samples/sec   Loss 2.6036   LearningRate 0.0099   Epoch: 13   Global Step: 170270   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:29:11,403-Speed 3347.77 samples/sec   Loss 2.5857   LearningRate 0.0099   Epoch: 13   Global Step: 170280   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:29:14,477-Speed 3332.01 samples/sec   Loss 2.5640   LearningRate 0.0099   Epoch: 13   Global Step: 170290   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:29:17,590-Speed 3291.00 samples/sec   Loss 2.6688   LearningRate 0.0099   Epoch: 13   Global Step: 170300   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:29:20,699-Speed 3294.61 samples/sec   Loss 2.5421   LearningRate 0.0099   Epoch: 13   Global Step: 170310   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:29:23,812-Speed 3290.92 samples/sec   Loss 2.5698   LearningRate 0.0099   Epoch: 13   Global Step: 170320   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:29:26,963-Speed 3250.88 samples/sec   Loss 2.6040   LearningRate 0.0099   Epoch: 13   Global Step: 170330   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:29:30,151-Speed 3212.67 samples/sec   Loss 2.6226   LearningRate 0.0099   Epoch: 13   Global Step: 170340   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:29:33,204-Speed 3355.01 samples/sec   Loss 2.5899   LearningRate 0.0099   Epoch: 13   Global Step: 170350   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:29:36,323-Speed 3284.56 samples/sec   Loss 2.6728   LearningRate 0.0099   Epoch: 13   Global Step: 170360   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:29:39,536-Speed 3187.64 samples/sec   Loss 2.6071   LearningRate 0.0099   Epoch: 13   Global Step: 170370   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:29:42,651-Speed 3288.64 samples/sec   Loss 2.6097   LearningRate 0.0099   Epoch: 13   Global Step: 170380   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:29:45,726-Speed 3330.96 samples/sec   Loss 2.6693   LearningRate 0.0099   Epoch: 13   Global Step: 170390   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:29:48,815-Speed 3315.77 samples/sec   Loss 2.5882   LearningRate 0.0099   Epoch: 13   Global Step: 170400   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:29:51,859-Speed 3365.24 samples/sec   Loss 2.5847   LearningRate 0.0099   Epoch: 13   Global Step: 170410   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:29:54,967-Speed 3296.21 samples/sec   Loss 2.6241   LearningRate 0.0099   Epoch: 13   Global Step: 170420   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:29:58,040-Speed 3332.72 samples/sec   Loss 2.6002   LearningRate 0.0099   Epoch: 13   Global Step: 170430   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:30:01,121-Speed 3325.02 samples/sec   Loss 2.5713   LearningRate 0.0099   Epoch: 13   Global Step: 170440   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:30:04,213-Speed 3313.05 samples/sec   Loss 2.5702   LearningRate 0.0099   Epoch: 13   Global Step: 170450   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:30:07,368-Speed 3246.22 samples/sec   Loss 2.6087   LearningRate 0.0098   Epoch: 13   Global Step: 170460   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:30:10,445-Speed 3328.72 samples/sec   Loss 2.6970   LearningRate 0.0098   Epoch: 13   Global Step: 170470   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:30:13,596-Speed 3251.52 samples/sec   Loss 2.6327   LearningRate 0.0098   Epoch: 13   Global Step: 170480   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:30:16,760-Speed 3237.24 samples/sec   Loss 2.6188   LearningRate 0.0098   Epoch: 13   Global Step: 170490   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:30:19,831-Speed 3334.80 samples/sec   Loss 2.5449   LearningRate 0.0098   Epoch: 13   Global Step: 170500   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:30:22,944-Speed 3290.85 samples/sec   Loss 2.6028   LearningRate 0.0098   Epoch: 13   Global Step: 170510   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:30:26,061-Speed 3286.14 samples/sec   Loss 2.6103   LearningRate 0.0098   Epoch: 13   Global Step: 170520   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:30:29,179-Speed 3285.43 samples/sec   Loss 2.6711   LearningRate 0.0098   Epoch: 13   Global Step: 170530   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:30:32,255-Speed 3330.34 samples/sec   Loss 2.6239   LearningRate 0.0098   Epoch: 13   Global Step: 170540   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:30:35,414-Speed 3241.91 samples/sec   Loss 2.5981   LearningRate 0.0098   Epoch: 13   Global Step: 170550   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:30:38,519-Speed 3299.73 samples/sec   Loss 2.5810   LearningRate 0.0098   Epoch: 13   Global Step: 170560   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:30:41,595-Speed 3329.90 samples/sec   Loss 2.6371   LearningRate 0.0098   Epoch: 13   Global Step: 170570   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:30:44,667-Speed 3333.77 samples/sec   Loss 2.6915   LearningRate 0.0098   Epoch: 13   Global Step: 170580   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:30:47,802-Speed 3268.15 samples/sec   Loss 2.6420   LearningRate 0.0098   Epoch: 13   Global Step: 170590   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:30:50,870-Speed 3338.56 samples/sec   Loss 2.6507   LearningRate 0.0098   Epoch: 13   Global Step: 170600   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:30:53,951-Speed 3324.64 samples/sec   Loss 2.6201   LearningRate 0.0098   Epoch: 13   Global Step: 170610   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:30:57,008-Speed 3350.82 samples/sec   Loss 2.7399   LearningRate 0.0098   Epoch: 13   Global Step: 170620   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:31:00,107-Speed 3305.29 samples/sec   Loss 2.6260   LearningRate 0.0098   Epoch: 13   Global Step: 170630   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:31:03,227-Speed 3282.66 samples/sec   Loss 2.7050   LearningRate 0.0098   Epoch: 13   Global Step: 170640   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:31:06,332-Speed 3298.84 samples/sec   Loss 2.5424   LearningRate 0.0098   Epoch: 13   Global Step: 170650   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:31:09,381-Speed 3360.48 samples/sec   Loss 2.6231   LearningRate 0.0098   Epoch: 13   Global Step: 170660   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:31:12,544-Speed 3238.23 samples/sec   Loss 2.6517   LearningRate 0.0098   Epoch: 13   Global Step: 170670   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:31:15,678-Speed 3267.67 samples/sec   Loss 2.6673   LearningRate 0.0098   Epoch: 13   Global Step: 170680   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:31:18,808-Speed 3272.66 samples/sec   Loss 2.6302   LearningRate 0.0098   Epoch: 13   Global Step: 170690   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:31:21,899-Speed 3313.79 samples/sec   Loss 2.5916   LearningRate 0.0098   Epoch: 13   Global Step: 170700   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:31:24,957-Speed 3349.87 samples/sec   Loss 2.6286   LearningRate 0.0098   Epoch: 13   Global Step: 170710   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:31:28,081-Speed 3278.95 samples/sec   Loss 2.5891   LearningRate 0.0098   Epoch: 13   Global Step: 170720   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:31:31,188-Speed 3296.69 samples/sec   Loss 2.6328   LearningRate 0.0098   Epoch: 13   Global Step: 170730   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:31:34,251-Speed 3344.25 samples/sec   Loss 2.6050   LearningRate 0.0098   Epoch: 13   Global Step: 170740   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:31:37,371-Speed 3284.02 samples/sec   Loss 2.6761   LearningRate 0.0098   Epoch: 13   Global Step: 170750   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:31:40,521-Speed 3251.48 samples/sec   Loss 2.6049   LearningRate 0.0098   Epoch: 13   Global Step: 170760   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:31:43,710-Speed 3212.14 samples/sec   Loss 2.6601   LearningRate 0.0098   Epoch: 13   Global Step: 170770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:31:46,782-Speed 3334.39 samples/sec   Loss 2.5760   LearningRate 0.0098   Epoch: 13   Global Step: 170780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:31:49,868-Speed 3319.41 samples/sec   Loss 2.6021   LearningRate 0.0098   Epoch: 13   Global Step: 170790   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:31:52,972-Speed 3299.86 samples/sec   Loss 2.6281   LearningRate 0.0098   Epoch: 13   Global Step: 170800   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:31:56,028-Speed 3351.60 samples/sec   Loss 2.6340   LearningRate 0.0098   Epoch: 13   Global Step: 170810   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:31:59,148-Speed 3283.15 samples/sec   Loss 2.6630   LearningRate 0.0098   Epoch: 13   Global Step: 170820   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:32:02,309-Speed 3241.05 samples/sec   Loss 2.6468   LearningRate 0.0098   Epoch: 13   Global Step: 170830   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:32:05,412-Speed 3301.65 samples/sec   Loss 2.6219   LearningRate 0.0098   Epoch: 13   Global Step: 170840   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:32:08,521-Speed 3294.29 samples/sec   Loss 2.6408   LearningRate 0.0098   Epoch: 13   Global Step: 170850   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:32:11,612-Speed 3313.50 samples/sec   Loss 2.6799   LearningRate 0.0097   Epoch: 13   Global Step: 170860   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:32:14,712-Speed 3304.10 samples/sec   Loss 2.5506   LearningRate 0.0097   Epoch: 13   Global Step: 170870   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:32:17,813-Speed 3303.27 samples/sec   Loss 2.6754   LearningRate 0.0097   Epoch: 13   Global Step: 170880   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:32:20,907-Speed 3310.56 samples/sec   Loss 2.6338   LearningRate 0.0097   Epoch: 13   Global Step: 170890   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:32:23,999-Speed 3312.94 samples/sec   Loss 2.5983   LearningRate 0.0097   Epoch: 13   Global Step: 170900   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:32:27,054-Speed 3353.00 samples/sec   Loss 2.5892   LearningRate 0.0097   Epoch: 13   Global Step: 170910   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:32:30,148-Speed 3310.67 samples/sec   Loss 2.6506   LearningRate 0.0097   Epoch: 13   Global Step: 170920   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:32:33,251-Speed 3300.93 samples/sec   Loss 2.6608   LearningRate 0.0097   Epoch: 13   Global Step: 170930   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:32:36,374-Speed 3280.23 samples/sec   Loss 2.6468   LearningRate 0.0097   Epoch: 13   Global Step: 170940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:32:39,535-Speed 3239.97 samples/sec   Loss 2.5855   LearningRate 0.0097   Epoch: 13   Global Step: 170950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:32:42,621-Speed 3319.04 samples/sec   Loss 2.5961   LearningRate 0.0097   Epoch: 13   Global Step: 170960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:32:45,695-Speed 3333.08 samples/sec   Loss 2.6311   LearningRate 0.0097   Epoch: 13   Global Step: 170970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:32:48,863-Speed 3233.22 samples/sec   Loss 2.7172   LearningRate 0.0097   Epoch: 13   Global Step: 170980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:32:51,969-Speed 3297.50 samples/sec   Loss 2.6428   LearningRate 0.0097   Epoch: 13   Global Step: 170990   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:32:55,129-Speed 3241.42 samples/sec   Loss 2.5483   LearningRate 0.0097   Epoch: 13   Global Step: 171000   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:32:58,206-Speed 3328.87 samples/sec   Loss 2.6121   LearningRate 0.0097   Epoch: 13   Global Step: 171010   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:33:01,288-Speed 3323.74 samples/sec   Loss 2.7292   LearningRate 0.0097   Epoch: 13   Global Step: 171020   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:33:04,438-Speed 3251.44 samples/sec   Loss 2.6441   LearningRate 0.0097   Epoch: 13   Global Step: 171030   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:33:07,558-Speed 3283.65 samples/sec   Loss 2.6413   LearningRate 0.0097   Epoch: 13   Global Step: 171040   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:33:10,621-Speed 3343.73 samples/sec   Loss 2.6426   LearningRate 0.0097   Epoch: 13   Global Step: 171050   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:33:13,815-Speed 3206.54 samples/sec   Loss 2.6873   LearningRate 0.0097   Epoch: 13   Global Step: 171060   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:33:16,944-Speed 3274.55 samples/sec   Loss 2.5664   LearningRate 0.0097   Epoch: 13   Global Step: 171070   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:33:20,070-Speed 3276.46 samples/sec   Loss 2.5299   LearningRate 0.0097   Epoch: 13   Global Step: 171080   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:33:23,130-Speed 3347.51 samples/sec   Loss 2.6531   LearningRate 0.0097   Epoch: 13   Global Step: 171090   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:33:26,214-Speed 3320.72 samples/sec   Loss 2.6759   LearningRate 0.0097   Epoch: 13   Global Step: 171100   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:33:29,288-Speed 3332.78 samples/sec   Loss 2.5960   LearningRate 0.0097   Epoch: 13   Global Step: 171110   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:33:32,387-Speed 3304.75 samples/sec   Loss 2.5783   LearningRate 0.0097   Epoch: 13   Global Step: 171120   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:33:35,474-Speed 3318.88 samples/sec   Loss 2.6686   LearningRate 0.0097   Epoch: 13   Global Step: 171130   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:33:38,636-Speed 3239.34 samples/sec   Loss 2.6508   LearningRate 0.0097   Epoch: 13   Global Step: 171140   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:33:41,772-Speed 3265.81 samples/sec   Loss 2.6689   LearningRate 0.0097   Epoch: 13   Global Step: 171150   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:33:44,873-Speed 3303.31 samples/sec   Loss 2.6769   LearningRate 0.0097   Epoch: 13   Global Step: 171160   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:33:48,022-Speed 3253.81 samples/sec   Loss 2.6670   LearningRate 0.0097   Epoch: 13   Global Step: 171170   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:33:51,100-Speed 3327.48 samples/sec   Loss 2.6474   LearningRate 0.0097   Epoch: 13   Global Step: 171180   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:33:54,297-Speed 3203.37 samples/sec   Loss 2.7149   LearningRate 0.0097   Epoch: 13   Global Step: 171190   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:33:57,378-Speed 3325.12 samples/sec   Loss 2.6780   LearningRate 0.0097   Epoch: 13   Global Step: 171200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:34:00,463-Speed 3320.35 samples/sec   Loss 2.6366   LearningRate 0.0097   Epoch: 13   Global Step: 171210   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:34:03,664-Speed 3200.83 samples/sec   Loss 2.6665   LearningRate 0.0097   Epoch: 13   Global Step: 171220   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:34:06,759-Speed 3309.80 samples/sec   Loss 2.6233   LearningRate 0.0097   Epoch: 13   Global Step: 171230   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:34:09,858-Speed 3304.30 samples/sec   Loss 2.6561   LearningRate 0.0097   Epoch: 13   Global Step: 171240   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:34:12,962-Speed 3300.38 samples/sec   Loss 2.5795   LearningRate 0.0096   Epoch: 13   Global Step: 171250   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:34:16,163-Speed 3200.88 samples/sec   Loss 2.5776   LearningRate 0.0096   Epoch: 13   Global Step: 171260   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:34:19,289-Speed 3276.62 samples/sec   Loss 2.5753   LearningRate 0.0096   Epoch: 13   Global Step: 171270   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:34:22,391-Speed 3301.65 samples/sec   Loss 2.6744   LearningRate 0.0096   Epoch: 13   Global Step: 171280   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:34:25,485-Speed 3311.07 samples/sec   Loss 2.7278   LearningRate 0.0096   Epoch: 13   Global Step: 171290   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:34:28,572-Speed 3318.48 samples/sec   Loss 2.6499   LearningRate 0.0096   Epoch: 13   Global Step: 171300   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:34:31,657-Speed 3320.45 samples/sec   Loss 2.6418   LearningRate 0.0096   Epoch: 13   Global Step: 171310   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:34:34,772-Speed 3289.12 samples/sec   Loss 2.6240   LearningRate 0.0096   Epoch: 13   Global Step: 171320   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:34:37,874-Speed 3302.12 samples/sec   Loss 2.6090   LearningRate 0.0096   Epoch: 13   Global Step: 171330   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:34:40,938-Speed 3342.58 samples/sec   Loss 2.6064   LearningRate 0.0096   Epoch: 13   Global Step: 171340   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:34:44,089-Speed 3250.90 samples/sec   Loss 2.6021   LearningRate 0.0096   Epoch: 13   Global Step: 171350   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:34:47,206-Speed 3286.24 samples/sec   Loss 2.6298   LearningRate 0.0096   Epoch: 13   Global Step: 171360   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:34:50,271-Speed 3342.31 samples/sec   Loss 2.6803   LearningRate 0.0096   Epoch: 13   Global Step: 171370   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:34:53,409-Speed 3263.60 samples/sec   Loss 2.6594   LearningRate 0.0096   Epoch: 13   Global Step: 171380   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:34:56,515-Speed 3298.80 samples/sec   Loss 2.6933   LearningRate 0.0096   Epoch: 13   Global Step: 171390   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:34:59,661-Speed 3256.49 samples/sec   Loss 2.6181   LearningRate 0.0096   Epoch: 13   Global Step: 171400   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:35:02,862-Speed 3199.16 samples/sec   Loss 2.6597   LearningRate 0.0096   Epoch: 13   Global Step: 171410   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:35:05,979-Speed 3286.62 samples/sec   Loss 2.6578   LearningRate 0.0096   Epoch: 13   Global Step: 171420   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:35:09,063-Speed 3321.61 samples/sec   Loss 2.6087   LearningRate 0.0096   Epoch: 13   Global Step: 171430   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:35:12,240-Speed 3224.69 samples/sec   Loss 2.5822   LearningRate 0.0096   Epoch: 13   Global Step: 171440   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:35:15,336-Speed 3308.10 samples/sec   Loss 2.5801   LearningRate 0.0096   Epoch: 13   Global Step: 171450   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:35:18,431-Speed 3309.74 samples/sec   Loss 2.7104   LearningRate 0.0096   Epoch: 13   Global Step: 171460   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:35:21,492-Speed 3345.97 samples/sec   Loss 2.6797   LearningRate 0.0096   Epoch: 13   Global Step: 171470   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:35:24,591-Speed 3306.11 samples/sec   Loss 2.5962   LearningRate 0.0096   Epoch: 13   Global Step: 171480   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:35:27,756-Speed 3235.46 samples/sec   Loss 2.6502   LearningRate 0.0096   Epoch: 13   Global Step: 171490   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:35:30,835-Speed 3327.18 samples/sec   Loss 2.6990   LearningRate 0.0096   Epoch: 13   Global Step: 171500   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:35:33,918-Speed 3322.55 samples/sec   Loss 2.5740   LearningRate 0.0096   Epoch: 13   Global Step: 171510   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:35:37,011-Speed 3311.62 samples/sec   Loss 2.6409   LearningRate 0.0096   Epoch: 13   Global Step: 171520   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:35:40,132-Speed 3282.79 samples/sec   Loss 2.6707   LearningRate 0.0096   Epoch: 13   Global Step: 171530   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:35:43,286-Speed 3247.09 samples/sec   Loss 2.5860   LearningRate 0.0096   Epoch: 13   Global Step: 171540   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:35:46,343-Speed 3350.70 samples/sec   Loss 2.6007   LearningRate 0.0096   Epoch: 13   Global Step: 171550   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:35:49,446-Speed 3301.71 samples/sec   Loss 2.7038   LearningRate 0.0096   Epoch: 13   Global Step: 171560   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:35:52,522-Speed 3330.05 samples/sec   Loss 2.5774   LearningRate 0.0096   Epoch: 13   Global Step: 171570   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:35:55,623-Speed 3302.75 samples/sec   Loss 2.6109   LearningRate 0.0096   Epoch: 13   Global Step: 171580   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:35:58,704-Speed 3324.38 samples/sec   Loss 2.6910   LearningRate 0.0096   Epoch: 13   Global Step: 171590   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:36:01,897-Speed 3207.91 samples/sec   Loss 2.6750   LearningRate 0.0096   Epoch: 13   Global Step: 171600   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:36:04,960-Speed 3344.68 samples/sec   Loss 2.6605   LearningRate 0.0096   Epoch: 13   Global Step: 171610   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:36:08,029-Speed 3338.03 samples/sec   Loss 2.6757   LearningRate 0.0096   Epoch: 13   Global Step: 171620   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:36:11,162-Speed 3269.04 samples/sec   Loss 2.6131   LearningRate 0.0096   Epoch: 13   Global Step: 171630   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:36:14,274-Speed 3291.54 samples/sec   Loss 2.6636   LearningRate 0.0096   Epoch: 13   Global Step: 171640   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:36:17,332-Speed 3349.86 samples/sec   Loss 2.6383   LearningRate 0.0096   Epoch: 13   Global Step: 171650   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:36:20,406-Speed 3332.44 samples/sec   Loss 2.5580   LearningRate 0.0095   Epoch: 13   Global Step: 171660   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:36:23,501-Speed 3309.83 samples/sec   Loss 2.6325   LearningRate 0.0095   Epoch: 13   Global Step: 171670   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:36:26,647-Speed 3255.71 samples/sec   Loss 2.6021   LearningRate 0.0095   Epoch: 13   Global Step: 171680   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:36:29,746-Speed 3306.03 samples/sec   Loss 2.5388   LearningRate 0.0095   Epoch: 13   Global Step: 171690   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:36:32,868-Speed 3281.14 samples/sec   Loss 2.5965   LearningRate 0.0095   Epoch: 13   Global Step: 171700   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:36:35,967-Speed 3305.21 samples/sec   Loss 2.6139   LearningRate 0.0095   Epoch: 13   Global Step: 171710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:36:39,051-Speed 3322.00 samples/sec   Loss 2.5732   LearningRate 0.0095   Epoch: 13   Global Step: 171720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:36:42,132-Speed 3323.98 samples/sec   Loss 2.6209   LearningRate 0.0095   Epoch: 13   Global Step: 171730   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:36:45,207-Speed 3331.81 samples/sec   Loss 2.5991   LearningRate 0.0095   Epoch: 13   Global Step: 171740   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:36:48,241-Speed 3375.51 samples/sec   Loss 2.5499   LearningRate 0.0095   Epoch: 13   Global Step: 171750   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:36:51,345-Speed 3300.18 samples/sec   Loss 2.6361   LearningRate 0.0095   Epoch: 13   Global Step: 171760   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:36:54,443-Speed 3306.49 samples/sec   Loss 2.7104   LearningRate 0.0095   Epoch: 13   Global Step: 171770   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:36:57,520-Speed 3328.93 samples/sec   Loss 2.6023   LearningRate 0.0095   Epoch: 13   Global Step: 171780   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:37:00,602-Speed 3323.65 samples/sec   Loss 2.5772   LearningRate 0.0095   Epoch: 13   Global Step: 171790   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:37:03,712-Speed 3293.89 samples/sec   Loss 2.6446   LearningRate 0.0095   Epoch: 13   Global Step: 171800   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:37:06,812-Speed 3304.97 samples/sec   Loss 2.6470   LearningRate 0.0095   Epoch: 13   Global Step: 171810   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:37:09,860-Speed 3360.83 samples/sec   Loss 2.6519   LearningRate 0.0095   Epoch: 13   Global Step: 171820   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:37:12,950-Speed 3314.14 samples/sec   Loss 2.5984   LearningRate 0.0095   Epoch: 13   Global Step: 171830   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:37:16,069-Speed 3284.56 samples/sec   Loss 2.7208   LearningRate 0.0095   Epoch: 13   Global Step: 171840   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:37:19,191-Speed 3281.39 samples/sec   Loss 2.6176   LearningRate 0.0095   Epoch: 13   Global Step: 171850   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:37:22,288-Speed 3307.01 samples/sec   Loss 2.6856   LearningRate 0.0095   Epoch: 13   Global Step: 171860   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:37:25,465-Speed 3224.05 samples/sec   Loss 2.6357   LearningRate 0.0095   Epoch: 13   Global Step: 171870   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:37:28,584-Speed 3284.48 samples/sec   Loss 2.6031   LearningRate 0.0095   Epoch: 13   Global Step: 171880   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:37:31,662-Speed 3327.75 samples/sec   Loss 2.6199   LearningRate 0.0095   Epoch: 13   Global Step: 171890   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:37:34,725-Speed 3344.64 samples/sec   Loss 2.6263   LearningRate 0.0095   Epoch: 13   Global Step: 171900   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:37:37,902-Speed 3223.64 samples/sec   Loss 2.6767   LearningRate 0.0095   Epoch: 13   Global Step: 171910   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:37:40,992-Speed 3315.24 samples/sec   Loss 2.6992   LearningRate 0.0095   Epoch: 13   Global Step: 171920   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:37:44,056-Speed 3343.31 samples/sec   Loss 2.6610   LearningRate 0.0095   Epoch: 13   Global Step: 171930   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:37:47,134-Speed 3327.79 samples/sec   Loss 2.5944   LearningRate 0.0095   Epoch: 13   Global Step: 171940   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:37:50,264-Speed 3272.68 samples/sec   Loss 2.6151   LearningRate 0.0095   Epoch: 13   Global Step: 171950   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:37:53,326-Speed 3346.25 samples/sec   Loss 2.6058   LearningRate 0.0095   Epoch: 13   Global Step: 171960   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:37:56,368-Speed 3367.02 samples/sec   Loss 2.6382   LearningRate 0.0095   Epoch: 13   Global Step: 171970   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:37:59,486-Speed 3285.02 samples/sec   Loss 2.6687   LearningRate 0.0095   Epoch: 13   Global Step: 171980   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:38:02,567-Speed 3324.41 samples/sec   Loss 2.7209   LearningRate 0.0095   Epoch: 13   Global Step: 171990   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:38:05,635-Speed 3339.40 samples/sec   Loss 2.6572   LearningRate 0.0095   Epoch: 13   Global Step: 172000   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:38:08,700-Speed 3341.46 samples/sec   Loss 2.5914   LearningRate 0.0095   Epoch: 13   Global Step: 172010   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:38:11,804-Speed 3300.58 samples/sec   Loss 2.6224   LearningRate 0.0095   Epoch: 13   Global Step: 172020   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:38:14,928-Speed 3279.36 samples/sec   Loss 2.6691   LearningRate 0.0095   Epoch: 13   Global Step: 172030   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:38:18,050-Speed 3280.65 samples/sec   Loss 2.6460   LearningRate 0.0095   Epoch: 13   Global Step: 172040   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:38:21,137-Speed 3318.36 samples/sec   Loss 2.7280   LearningRate 0.0095   Epoch: 13   Global Step: 172050   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:38:24,236-Speed 3305.01 samples/sec   Loss 2.6422   LearningRate 0.0094   Epoch: 13   Global Step: 172060   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:38:27,427-Speed 3210.46 samples/sec   Loss 2.5895   LearningRate 0.0094   Epoch: 13   Global Step: 172070   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:38:30,529-Speed 3301.46 samples/sec   Loss 2.6301   LearningRate 0.0094   Epoch: 13   Global Step: 172080   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:38:33,594-Speed 3342.74 samples/sec   Loss 2.6652   LearningRate 0.0094   Epoch: 13   Global Step: 172090   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:38:36,671-Speed 3328.76 samples/sec   Loss 2.6287   LearningRate 0.0094   Epoch: 13   Global Step: 172100   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:38:39,750-Speed 3326.37 samples/sec   Loss 2.6483   LearningRate 0.0094   Epoch: 13   Global Step: 172110   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:38:42,848-Speed 3307.27 samples/sec   Loss 2.6521   LearningRate 0.0094   Epoch: 13   Global Step: 172120   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:38:45,918-Speed 3336.92 samples/sec   Loss 2.6567   LearningRate 0.0094   Epoch: 13   Global Step: 172130   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:38:48,999-Speed 3324.35 samples/sec   Loss 2.5928   LearningRate 0.0094   Epoch: 13   Global Step: 172140   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:38:52,099-Speed 3303.93 samples/sec   Loss 2.6485   LearningRate 0.0094   Epoch: 13   Global Step: 172150   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:38:55,260-Speed 3241.04 samples/sec   Loss 2.6676   LearningRate 0.0094   Epoch: 13   Global Step: 172160   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:38:58,328-Speed 3337.89 samples/sec   Loss 2.5868   LearningRate 0.0094   Epoch: 13   Global Step: 172170   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:39:01,463-Speed 3267.15 samples/sec   Loss 2.6243   LearningRate 0.0094   Epoch: 13   Global Step: 172180   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:39:04,536-Speed 3333.58 samples/sec   Loss 2.6830   LearningRate 0.0094   Epoch: 13   Global Step: 172190   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:39:07,587-Speed 3357.92 samples/sec   Loss 2.6443   LearningRate 0.0094   Epoch: 13   Global Step: 172200   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:39:10,643-Speed 3351.83 samples/sec   Loss 2.6041   LearningRate 0.0094   Epoch: 13   Global Step: 172210   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:39:13,732-Speed 3315.57 samples/sec   Loss 2.6607   LearningRate 0.0094   Epoch: 13   Global Step: 172220   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:39:16,836-Speed 3300.34 samples/sec   Loss 2.6299   LearningRate 0.0094   Epoch: 13   Global Step: 172230   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:39:19,901-Speed 3341.57 samples/sec   Loss 2.5874   LearningRate 0.0094   Epoch: 13   Global Step: 172240   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:39:22,989-Speed 3318.24 samples/sec   Loss 2.6732   LearningRate 0.0094   Epoch: 13   Global Step: 172250   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:39:26,088-Speed 3305.14 samples/sec   Loss 2.6902   LearningRate 0.0094   Epoch: 13   Global Step: 172260   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:39:29,177-Speed 3315.33 samples/sec   Loss 2.6787   LearningRate 0.0094   Epoch: 13   Global Step: 172270   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:39:32,274-Speed 3307.99 samples/sec   Loss 2.6828   LearningRate 0.0094   Epoch: 13   Global Step: 172280   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:39:35,414-Speed 3261.67 samples/sec   Loss 2.6020   LearningRate 0.0094   Epoch: 13   Global Step: 172290   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:39:38,603-Speed 3212.38 samples/sec   Loss 2.6629   LearningRate 0.0094   Epoch: 13   Global Step: 172300   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:39:41,698-Speed 3310.10 samples/sec   Loss 2.6406   LearningRate 0.0094   Epoch: 13   Global Step: 172310   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:39:44,771-Speed 3333.59 samples/sec   Loss 2.5721   LearningRate 0.0094   Epoch: 13   Global Step: 172320   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:39:47,890-Speed 3283.24 samples/sec   Loss 2.6045   LearningRate 0.0094   Epoch: 13   Global Step: 172330   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:39:51,065-Speed 3227.06 samples/sec   Loss 2.6769   LearningRate 0.0094   Epoch: 13   Global Step: 172340   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:39:54,196-Speed 3271.07 samples/sec   Loss 2.6222   LearningRate 0.0094   Epoch: 13   Global Step: 172350   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:39:57,264-Speed 3338.98 samples/sec   Loss 2.6201   LearningRate 0.0094   Epoch: 13   Global Step: 172360   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:40:00,380-Speed 3287.15 samples/sec   Loss 2.6435   LearningRate 0.0094   Epoch: 13   Global Step: 172370   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:40:03,590-Speed 3191.03 samples/sec   Loss 2.6170   LearningRate 0.0094   Epoch: 13   Global Step: 172380   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:40:06,700-Speed 3294.05 samples/sec   Loss 2.6647   LearningRate 0.0094   Epoch: 13   Global Step: 172390   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:40:09,764-Speed 3342.38 samples/sec   Loss 2.6579   LearningRate 0.0094   Epoch: 13   Global Step: 172400   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:40:12,857-Speed 3312.58 samples/sec   Loss 2.6554   LearningRate 0.0094   Epoch: 13   Global Step: 172410   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:40:15,924-Speed 3340.30 samples/sec   Loss 2.6939   LearningRate 0.0094   Epoch: 13   Global Step: 172420   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:40:18,995-Speed 3334.99 samples/sec   Loss 2.6777   LearningRate 0.0094   Epoch: 13   Global Step: 172430   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:40:22,095-Speed 3304.63 samples/sec   Loss 2.5931   LearningRate 0.0094   Epoch: 13   Global Step: 172440   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:40:25,205-Speed 3293.52 samples/sec   Loss 2.6440   LearningRate 0.0094   Epoch: 13   Global Step: 172450   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:40:28,317-Speed 3291.74 samples/sec   Loss 2.6098   LearningRate 0.0093   Epoch: 13   Global Step: 172460   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:40:31,372-Speed 3353.20 samples/sec   Loss 2.6152   LearningRate 0.0093   Epoch: 13   Global Step: 172470   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:40:34,449-Speed 3328.97 samples/sec   Loss 2.7240   LearningRate 0.0093   Epoch: 13   Global Step: 172480   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:40:37,581-Speed 3270.50 samples/sec   Loss 2.6676   LearningRate 0.0093   Epoch: 13   Global Step: 172490   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:40:40,657-Speed 3329.40 samples/sec   Loss 2.6090   LearningRate 0.0093   Epoch: 13   Global Step: 172500   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:40:43,712-Speed 3352.89 samples/sec   Loss 2.6296   LearningRate 0.0093   Epoch: 13   Global Step: 172510   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:40:46,766-Speed 3354.04 samples/sec   Loss 2.5942   LearningRate 0.0093   Epoch: 13   Global Step: 172520   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:40:49,880-Speed 3289.66 samples/sec   Loss 2.6570   LearningRate 0.0093   Epoch: 13   Global Step: 172530   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:40:52,992-Speed 3291.41 samples/sec   Loss 2.6013   LearningRate 0.0093   Epoch: 13   Global Step: 172540   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:40:56,122-Speed 3272.73 samples/sec   Loss 2.5487   LearningRate 0.0093   Epoch: 13   Global Step: 172550   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:40:59,210-Speed 3316.87 samples/sec   Loss 2.5979   LearningRate 0.0093   Epoch: 13   Global Step: 172560   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:41:02,348-Speed 3265.20 samples/sec   Loss 2.6013   LearningRate 0.0093   Epoch: 13   Global Step: 172570   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:41:05,482-Speed 3268.64 samples/sec   Loss 2.6699   LearningRate 0.0093   Epoch: 13   Global Step: 172580   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:41:08,589-Speed 3296.12 samples/sec   Loss 2.6037   LearningRate 0.0093   Epoch: 13   Global Step: 172590   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:41:11,736-Speed 3255.01 samples/sec   Loss 2.5830   LearningRate 0.0093   Epoch: 13   Global Step: 172600   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:41:14,874-Speed 3264.76 samples/sec   Loss 2.6003   LearningRate 0.0093   Epoch: 13   Global Step: 172610   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:41:17,949-Speed 3330.60 samples/sec   Loss 2.5732   LearningRate 0.0093   Epoch: 13   Global Step: 172620   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:41:21,029-Speed 3325.84 samples/sec   Loss 2.6379   LearningRate 0.0093   Epoch: 13   Global Step: 172630   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:41:24,076-Speed 3361.52 samples/sec   Loss 2.6737   LearningRate 0.0093   Epoch: 13   Global Step: 172640   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:41:27,204-Speed 3274.76 samples/sec   Loss 2.6011   LearningRate 0.0093   Epoch: 13   Global Step: 172650   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:41:30,326-Speed 3280.70 samples/sec   Loss 2.6455   LearningRate 0.0093   Epoch: 13   Global Step: 172660   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:41:33,403-Speed 3328.92 samples/sec   Loss 2.6548   LearningRate 0.0093   Epoch: 13   Global Step: 172670   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:41:36,491-Speed 3317.57 samples/sec   Loss 2.6513   LearningRate 0.0093   Epoch: 13   Global Step: 172680   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:41:39,614-Speed 3279.84 samples/sec   Loss 2.5935   LearningRate 0.0093   Epoch: 13   Global Step: 172690   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:41:42,739-Speed 3277.71 samples/sec   Loss 2.6018   LearningRate 0.0093   Epoch: 13   Global Step: 172700   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-27 16:41:45,828-Speed 3316.73 samples/sec   Loss 2.6322   LearningRate 0.0093   Epoch: 13   Global Step: 172710   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-27 16:41:48,914-Speed 3318.89 samples/sec   Loss 2.5752   LearningRate 0.0093   Epoch: 13   Global Step: 172720   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-27 16:41:52,006-Speed 3313.03 samples/sec   Loss 2.6024   LearningRate 0.0093   Epoch: 13   Global Step: 172730   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-27 16:41:55,097-Speed 3314.56 samples/sec   Loss 2.5885   LearningRate 0.0093   Epoch: 13   Global Step: 172740   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-27 16:41:58,191-Speed 3310.18 samples/sec   Loss 2.6802   LearningRate 0.0093   Epoch: 13   Global Step: 172750   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-27 16:42:01,322-Speed 3272.15 samples/sec   Loss 2.6143   LearningRate 0.0093   Epoch: 13   Global Step: 172760   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-27 16:42:04,536-Speed 3186.76 samples/sec   Loss 2.7341   LearningRate 0.0093   Epoch: 13   Global Step: 172770   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-27 16:42:07,676-Speed 3261.84 samples/sec   Loss 2.6711   LearningRate 0.0093   Epoch: 13   Global Step: 172780   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-27 16:42:10,745-Speed 3337.60 samples/sec   Loss 2.6013   LearningRate 0.0093   Epoch: 13   Global Step: 172790   Fp16 Grad Scale: 4096   Required: 7 hours
Training: 2022-04-27 16:42:13,865-Speed 3283.67 samples/sec   Loss 2.6374   LearningRate 0.0093   Epoch: 13   Global Step: 172800   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:42:16,957-Speed 3312.05 samples/sec   Loss 2.5675   LearningRate 0.0093   Epoch: 13   Global Step: 172810   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:42:20,040-Speed 3322.66 samples/sec   Loss 2.6166   LearningRate 0.0093   Epoch: 13   Global Step: 172820   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:42:23,131-Speed 3314.02 samples/sec   Loss 2.6011   LearningRate 0.0093   Epoch: 13   Global Step: 172830   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:42:26,226-Speed 3309.09 samples/sec   Loss 2.6035   LearningRate 0.0093   Epoch: 13   Global Step: 172840   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:42:29,325-Speed 3305.50 samples/sec   Loss 2.6967   LearningRate 0.0093   Epoch: 13   Global Step: 172850   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:42:32,399-Speed 3332.86 samples/sec   Loss 2.6099   LearningRate 0.0093   Epoch: 13   Global Step: 172860   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:42:35,486-Speed 3317.82 samples/sec   Loss 2.6130   LearningRate 0.0092   Epoch: 13   Global Step: 172870   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:42:38,634-Speed 3254.26 samples/sec   Loss 2.6285   LearningRate 0.0092   Epoch: 13   Global Step: 172880   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:42:41,697-Speed 3344.20 samples/sec   Loss 2.6340   LearningRate 0.0092   Epoch: 13   Global Step: 172890   Fp16 Grad Scale: 8192   Required: 7 hours
Training: 2022-04-27 16:42:44,768-Speed 3335.68 samples/sec   Loss 2.7141   LearningRate 0.0092   Epoch: 13   Global Step: 172900   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:42:47,833-Speed 3342.11 samples/sec   Loss 2.5899   LearningRate 0.0092   Epoch: 13   Global Step: 172910   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:42:50,911-Speed 3327.81 samples/sec   Loss 2.5722   LearningRate 0.0092   Epoch: 13   Global Step: 172920   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:42:54,021-Speed 3294.05 samples/sec   Loss 2.5554   LearningRate 0.0092   Epoch: 13   Global Step: 172930   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:42:57,092-Speed 3335.74 samples/sec   Loss 2.6930   LearningRate 0.0092   Epoch: 13   Global Step: 172940   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:43:00,231-Speed 3263.31 samples/sec   Loss 2.6295   LearningRate 0.0092   Epoch: 13   Global Step: 172950   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:43:03,328-Speed 3307.16 samples/sec   Loss 2.5646   LearningRate 0.0092   Epoch: 13   Global Step: 172960   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:43:06,405-Speed 3329.43 samples/sec   Loss 2.6505   LearningRate 0.0092   Epoch: 13   Global Step: 172970   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:43:09,483-Speed 3327.68 samples/sec   Loss 2.6019   LearningRate 0.0092   Epoch: 13   Global Step: 172980   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:43:12,575-Speed 3312.88 samples/sec   Loss 2.5763   LearningRate 0.0092   Epoch: 13   Global Step: 172990   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:43:15,712-Speed 3265.67 samples/sec   Loss 2.6889   LearningRate 0.0092   Epoch: 13   Global Step: 173000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:43:18,787-Speed 3330.01 samples/sec   Loss 2.6512   LearningRate 0.0092   Epoch: 13   Global Step: 173010   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:43:21,852-Speed 3342.13 samples/sec   Loss 2.6326   LearningRate 0.0092   Epoch: 13   Global Step: 173020   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:43:25,551-Speed 2769.40 samples/sec   Loss 2.6997   LearningRate 0.0092   Epoch: 13   Global Step: 173030   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:43:28,615-Speed 3342.65 samples/sec   Loss 2.6239   LearningRate 0.0092   Epoch: 13   Global Step: 173040   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:43:31,675-Speed 3347.44 samples/sec   Loss 2.6946   LearningRate 0.0092   Epoch: 13   Global Step: 173050   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:43:34,738-Speed 3343.80 samples/sec   Loss 2.6514   LearningRate 0.0092   Epoch: 13   Global Step: 173060   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:43:37,837-Speed 3305.52 samples/sec   Loss 2.6105   LearningRate 0.0092   Epoch: 13   Global Step: 173070   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:43:40,931-Speed 3311.02 samples/sec   Loss 2.6339   LearningRate 0.0092   Epoch: 13   Global Step: 173080   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:43:44,060-Speed 3273.45 samples/sec   Loss 2.5804   LearningRate 0.0092   Epoch: 13   Global Step: 173090   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-04-27 16:43:47,118-Speed 3349.48 samples/sec   Loss 2.6941   LearningRate 0.0092   Epoch: 13   Global Step: 173100   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:43:50,305-Speed 3214.57 samples/sec   Loss 2.6192   LearningRate 0.0092   Epoch: 13   Global Step: 173110   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:43:53,490-Speed 3216.24 samples/sec   Loss 2.6448   LearningRate 0.0092   Epoch: 13   Global Step: 173120   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:43:56,550-Speed 3347.50 samples/sec   Loss 2.6511   LearningRate 0.0092   Epoch: 13   Global Step: 173130   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:43:59,656-Speed 3297.63 samples/sec   Loss 2.6193   LearningRate 0.0092   Epoch: 13   Global Step: 173140   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-04-27 16:44:02,842-Speed 3215.50 samples/sec   Loss 2.5926   LearningRate 0.0092   Epoch: 13   Global Step: 173150   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:44:05,991-Speed 3253.59 samples/sec   Loss 2.6305   LearningRate 0.0092   Epoch: 13   Global Step: 173160   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:44:09,067-Speed 3329.72 samples/sec   Loss 2.6793   LearningRate 0.0092   Epoch: 13   Global Step: 173170   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:44:12,140-Speed 3333.18 samples/sec   Loss 2.5897   LearningRate 0.0092   Epoch: 13   Global Step: 173180   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:44:15,225-Speed 3320.66 samples/sec   Loss 2.5729   LearningRate 0.0092   Epoch: 13   Global Step: 173190   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:44:18,349-Speed 3278.55 samples/sec   Loss 2.6613   LearningRate 0.0092   Epoch: 13   Global Step: 173200   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:44:21,419-Speed 3337.04 samples/sec   Loss 2.6248   LearningRate 0.0092   Epoch: 13   Global Step: 173210   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:44:24,493-Speed 3332.46 samples/sec   Loss 2.6279   LearningRate 0.0092   Epoch: 13   Global Step: 173220   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:44:27,612-Speed 3283.92 samples/sec   Loss 2.5597   LearningRate 0.0092   Epoch: 13   Global Step: 173230   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:44:30,705-Speed 3312.32 samples/sec   Loss 2.6326   LearningRate 0.0092   Epoch: 13   Global Step: 173240   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:44:33,810-Speed 3298.80 samples/sec   Loss 2.6193   LearningRate 0.0092   Epoch: 13   Global Step: 173250   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:44:36,968-Speed 3243.16 samples/sec   Loss 2.6272   LearningRate 0.0092   Epoch: 13   Global Step: 173260   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:44:40,150-Speed 3219.01 samples/sec   Loss 2.6704   LearningRate 0.0092   Epoch: 13   Global Step: 173270   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:44:43,289-Speed 3263.87 samples/sec   Loss 2.6950   LearningRate 0.0091   Epoch: 13   Global Step: 173280   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:44:46,376-Speed 3317.65 samples/sec   Loss 2.7111   LearningRate 0.0091   Epoch: 13   Global Step: 173290   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:44:49,471-Speed 3310.26 samples/sec   Loss 2.6210   LearningRate 0.0091   Epoch: 13   Global Step: 173300   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:44:52,548-Speed 3328.64 samples/sec   Loss 2.6262   LearningRate 0.0091   Epoch: 13   Global Step: 173310   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:44:55,601-Speed 3355.06 samples/sec   Loss 2.6274   LearningRate 0.0091   Epoch: 13   Global Step: 173320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:44:58,694-Speed 3311.27 samples/sec   Loss 2.5999   LearningRate 0.0091   Epoch: 13   Global Step: 173330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:45:01,792-Speed 3306.38 samples/sec   Loss 2.6226   LearningRate 0.0091   Epoch: 13   Global Step: 173340   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:45:04,908-Speed 3287.52 samples/sec   Loss 2.6798   LearningRate 0.0091   Epoch: 13   Global Step: 173350   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:45:08,082-Speed 3226.88 samples/sec   Loss 2.6045   LearningRate 0.0091   Epoch: 13   Global Step: 173360   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:45:11,216-Speed 3268.80 samples/sec   Loss 2.6849   LearningRate 0.0091   Epoch: 13   Global Step: 173370   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:45:14,329-Speed 3290.19 samples/sec   Loss 2.6791   LearningRate 0.0091   Epoch: 13   Global Step: 173380   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:45:17,386-Speed 3351.66 samples/sec   Loss 2.6972   LearningRate 0.0091   Epoch: 13   Global Step: 173390   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:45:20,506-Speed 3283.06 samples/sec   Loss 2.5980   LearningRate 0.0091   Epoch: 13   Global Step: 173400   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:45:23,629-Speed 3279.52 samples/sec   Loss 2.6222   LearningRate 0.0091   Epoch: 13   Global Step: 173410   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:45:26,760-Speed 3271.93 samples/sec   Loss 2.6519   LearningRate 0.0091   Epoch: 13   Global Step: 173420   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:45:29,846-Speed 3319.19 samples/sec   Loss 2.6061   LearningRate 0.0091   Epoch: 13   Global Step: 173430   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:45:32,904-Speed 3349.99 samples/sec   Loss 2.6141   LearningRate 0.0091   Epoch: 13   Global Step: 173440   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:45:35,994-Speed 3314.47 samples/sec   Loss 2.6594   LearningRate 0.0091   Epoch: 13   Global Step: 173450   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:45:39,187-Speed 3208.49 samples/sec   Loss 2.5741   LearningRate 0.0091   Epoch: 13   Global Step: 173460   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:45:42,324-Speed 3265.63 samples/sec   Loss 2.6035   LearningRate 0.0091   Epoch: 13   Global Step: 173470   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:45:45,371-Speed 3361.20 samples/sec   Loss 2.6416   LearningRate 0.0091   Epoch: 13   Global Step: 173480   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:45:48,449-Speed 3328.28 samples/sec   Loss 2.6060   LearningRate 0.0091   Epoch: 13   Global Step: 173490   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:45:51,533-Speed 3321.62 samples/sec   Loss 2.6693   LearningRate 0.0091   Epoch: 13   Global Step: 173500   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:45:54,689-Speed 3246.24 samples/sec   Loss 2.7136   LearningRate 0.0091   Epoch: 13   Global Step: 173510   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:45:57,796-Speed 3296.82 samples/sec   Loss 2.6460   LearningRate 0.0091   Epoch: 13   Global Step: 173520   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:46:00,944-Speed 3253.50 samples/sec   Loss 2.7162   LearningRate 0.0091   Epoch: 13   Global Step: 173530   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:46:04,051-Speed 3297.36 samples/sec   Loss 2.6475   LearningRate 0.0091   Epoch: 13   Global Step: 173540   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:46:07,166-Speed 3288.50 samples/sec   Loss 2.6806   LearningRate 0.0091   Epoch: 13   Global Step: 173550   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:46:10,246-Speed 3325.84 samples/sec   Loss 2.6637   LearningRate 0.0091   Epoch: 13   Global Step: 173560   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:46:13,322-Speed 3329.08 samples/sec   Loss 2.6674   LearningRate 0.0091   Epoch: 13   Global Step: 173570   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:46:16,447-Speed 3278.56 samples/sec   Loss 2.6200   LearningRate 0.0091   Epoch: 13   Global Step: 173580   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:46:19,571-Speed 3278.70 samples/sec   Loss 2.6765   LearningRate 0.0091   Epoch: 13   Global Step: 173590   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:46:22,630-Speed 3348.17 samples/sec   Loss 2.6559   LearningRate 0.0091   Epoch: 13   Global Step: 173600   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:46:25,816-Speed 3214.81 samples/sec   Loss 2.5801   LearningRate 0.0091   Epoch: 13   Global Step: 173610   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:46:28,972-Speed 3246.10 samples/sec   Loss 2.6361   LearningRate 0.0091   Epoch: 13   Global Step: 173620   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:46:32,049-Speed 3328.43 samples/sec   Loss 2.5999   LearningRate 0.0091   Epoch: 13   Global Step: 173630   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:46:35,147-Speed 3306.70 samples/sec   Loss 2.6309   LearningRate 0.0091   Epoch: 13   Global Step: 173640   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:46:38,345-Speed 3202.95 samples/sec   Loss 2.6350   LearningRate 0.0091   Epoch: 13   Global Step: 173650   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:46:41,410-Speed 3342.15 samples/sec   Loss 2.6452   LearningRate 0.0091   Epoch: 13   Global Step: 173660   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:46:44,511-Speed 3303.45 samples/sec   Loss 2.6994   LearningRate 0.0091   Epoch: 13   Global Step: 173670   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:46:47,619-Speed 3295.84 samples/sec   Loss 2.6171   LearningRate 0.0091   Epoch: 13   Global Step: 173680   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:46:50,757-Speed 3263.24 samples/sec   Loss 2.6669   LearningRate 0.0090   Epoch: 13   Global Step: 173690   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:46:53,890-Speed 3269.64 samples/sec   Loss 2.6094   LearningRate 0.0090   Epoch: 13   Global Step: 173700   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:46:56,990-Speed 3304.87 samples/sec   Loss 2.6229   LearningRate 0.0090   Epoch: 13   Global Step: 173710   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:47:00,115-Speed 3278.08 samples/sec   Loss 2.6574   LearningRate 0.0090   Epoch: 13   Global Step: 173720   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:47:03,232-Speed 3286.18 samples/sec   Loss 2.6323   LearningRate 0.0090   Epoch: 13   Global Step: 173730   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:47:06,443-Speed 3189.75 samples/sec   Loss 2.5589   LearningRate 0.0090   Epoch: 13   Global Step: 173740   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:47:09,532-Speed 3316.12 samples/sec   Loss 2.6719   LearningRate 0.0090   Epoch: 13   Global Step: 173750   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:47:12,744-Speed 3189.24 samples/sec   Loss 2.6826   LearningRate 0.0090   Epoch: 13   Global Step: 173760   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:47:15,913-Speed 3231.94 samples/sec   Loss 2.6745   LearningRate 0.0090   Epoch: 13   Global Step: 173770   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:47:19,069-Speed 3246.01 samples/sec   Loss 2.6120   LearningRate 0.0090   Epoch: 13   Global Step: 173780   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:47:22,125-Speed 3350.71 samples/sec   Loss 2.6520   LearningRate 0.0090   Epoch: 13   Global Step: 173790   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:47:25,258-Speed 3269.91 samples/sec   Loss 2.5880   LearningRate 0.0090   Epoch: 13   Global Step: 173800   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:47:28,406-Speed 3254.28 samples/sec   Loss 2.5462   LearningRate 0.0090   Epoch: 13   Global Step: 173810   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:47:31,465-Speed 3347.87 samples/sec   Loss 2.6817   LearningRate 0.0090   Epoch: 13   Global Step: 173820   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:47:34,540-Speed 3331.75 samples/sec   Loss 2.6123   LearningRate 0.0090   Epoch: 13   Global Step: 173830   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:47:37,644-Speed 3299.68 samples/sec   Loss 2.5878   LearningRate 0.0090   Epoch: 13   Global Step: 173840   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:47:40,744-Speed 3304.53 samples/sec   Loss 2.6542   LearningRate 0.0090   Epoch: 13   Global Step: 173850   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:47:43,872-Speed 3274.38 samples/sec   Loss 2.5466   LearningRate 0.0090   Epoch: 13   Global Step: 173860   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:47:47,006-Speed 3268.87 samples/sec   Loss 2.6258   LearningRate 0.0090   Epoch: 13   Global Step: 173870   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:47:50,151-Speed 3256.90 samples/sec   Loss 2.6290   LearningRate 0.0090   Epoch: 13   Global Step: 173880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:47:53,494-Speed 3064.13 samples/sec   Loss 2.5898   LearningRate 0.0090   Epoch: 13   Global Step: 173890   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:48:25,275-Speed 322.22 samples/sec   Loss 2.1452   LearningRate 0.0090   Epoch: 14   Global Step: 173900   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:48:28,772-Speed 2928.97 samples/sec   Loss 1.8827   LearningRate 0.0090   Epoch: 14   Global Step: 173910   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:48:31,862-Speed 3315.48 samples/sec   Loss 1.8367   LearningRate 0.0090   Epoch: 14   Global Step: 173920   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:48:34,942-Speed 3325.54 samples/sec   Loss 1.8600   LearningRate 0.0090   Epoch: 14   Global Step: 173930   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:48:38,098-Speed 3245.78 samples/sec   Loss 1.9312   LearningRate 0.0090   Epoch: 14   Global Step: 173940   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:48:41,382-Speed 3119.03 samples/sec   Loss 1.8393   LearningRate 0.0090   Epoch: 14   Global Step: 173950   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:48:44,671-Speed 3113.98 samples/sec   Loss 1.9605   LearningRate 0.0090   Epoch: 14   Global Step: 173960   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:48:47,784-Speed 3291.13 samples/sec   Loss 1.8275   LearningRate 0.0090   Epoch: 14   Global Step: 173970   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:48:51,114-Speed 3075.91 samples/sec   Loss 1.9132   LearningRate 0.0090   Epoch: 14   Global Step: 173980   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:48:54,334-Speed 3181.94 samples/sec   Loss 1.8391   LearningRate 0.0090   Epoch: 14   Global Step: 173990   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:48:57,462-Speed 3274.28 samples/sec   Loss 1.8381   LearningRate 0.0090   Epoch: 14   Global Step: 174000   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:49:00,583-Speed 3282.63 samples/sec   Loss 1.9060   LearningRate 0.0090   Epoch: 14   Global Step: 174010   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:49:03,696-Speed 3289.77 samples/sec   Loss 1.8798   LearningRate 0.0090   Epoch: 14   Global Step: 174020   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:49:06,838-Speed 3260.59 samples/sec   Loss 1.8843   LearningRate 0.0090   Epoch: 14   Global Step: 174030   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:49:10,027-Speed 3211.74 samples/sec   Loss 1.8833   LearningRate 0.0090   Epoch: 14   Global Step: 174040   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:49:13,127-Speed 3303.92 samples/sec   Loss 1.8353   LearningRate 0.0090   Epoch: 14   Global Step: 174050   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:49:16,212-Speed 3320.79 samples/sec   Loss 1.8239   LearningRate 0.0090   Epoch: 14   Global Step: 174060   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:49:19,357-Speed 3257.50 samples/sec   Loss 1.8999   LearningRate 0.0090   Epoch: 14   Global Step: 174070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:49:22,453-Speed 3307.69 samples/sec   Loss 1.9062   LearningRate 0.0090   Epoch: 14   Global Step: 174080   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:49:25,594-Speed 3260.99 samples/sec   Loss 1.9150   LearningRate 0.0090   Epoch: 14   Global Step: 174090   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:49:28,720-Speed 3277.06 samples/sec   Loss 1.9007   LearningRate 0.0090   Epoch: 14   Global Step: 174100   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:49:31,805-Speed 3320.42 samples/sec   Loss 1.8363   LearningRate 0.0089   Epoch: 14   Global Step: 174110   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:49:34,906-Speed 3302.66 samples/sec   Loss 1.8254   LearningRate 0.0089   Epoch: 14   Global Step: 174120   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:49:37,994-Speed 3317.82 samples/sec   Loss 1.8803   LearningRate 0.0089   Epoch: 14   Global Step: 174130   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:49:41,071-Speed 3328.28 samples/sec   Loss 1.8723   LearningRate 0.0089   Epoch: 14   Global Step: 174140   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:49:44,156-Speed 3320.53 samples/sec   Loss 1.8572   LearningRate 0.0089   Epoch: 14   Global Step: 174150   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:49:47,241-Speed 3321.06 samples/sec   Loss 1.8296   LearningRate 0.0089   Epoch: 14   Global Step: 174160   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:49:50,342-Speed 3303.10 samples/sec   Loss 1.8572   LearningRate 0.0089   Epoch: 14   Global Step: 174170   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:49:53,506-Speed 3236.96 samples/sec   Loss 1.8510   LearningRate 0.0089   Epoch: 14   Global Step: 174180   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:49:56,582-Speed 3330.11 samples/sec   Loss 1.8996   LearningRate 0.0089   Epoch: 14   Global Step: 174190   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:49:59,705-Speed 3279.94 samples/sec   Loss 1.8954   LearningRate 0.0089   Epoch: 14   Global Step: 174200   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:50:02,789-Speed 3321.32 samples/sec   Loss 1.9169   LearningRate 0.0089   Epoch: 14   Global Step: 174210   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:50:05,890-Speed 3303.04 samples/sec   Loss 1.8037   LearningRate 0.0089   Epoch: 14   Global Step: 174220   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:50:09,018-Speed 3275.19 samples/sec   Loss 1.9112   LearningRate 0.0089   Epoch: 14   Global Step: 174230   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:50:12,124-Speed 3297.55 samples/sec   Loss 1.9025   LearningRate 0.0089   Epoch: 14   Global Step: 174240   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:50:15,266-Speed 3260.68 samples/sec   Loss 1.9439   LearningRate 0.0089   Epoch: 14   Global Step: 174250   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:50:18,375-Speed 3294.60 samples/sec   Loss 1.9166   LearningRate 0.0089   Epoch: 14   Global Step: 174260   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:50:21,466-Speed 3314.37 samples/sec   Loss 1.8778   LearningRate 0.0089   Epoch: 14   Global Step: 174270   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:50:24,539-Speed 3333.47 samples/sec   Loss 1.8900   LearningRate 0.0089   Epoch: 14   Global Step: 174280   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:50:27,731-Speed 3208.31 samples/sec   Loss 1.8628   LearningRate 0.0089   Epoch: 14   Global Step: 174290   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:50:30,780-Speed 3359.65 samples/sec   Loss 1.9099   LearningRate 0.0089   Epoch: 14   Global Step: 174300   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:50:33,851-Speed 3336.41 samples/sec   Loss 1.8883   LearningRate 0.0089   Epoch: 14   Global Step: 174310   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:50:36,918-Speed 3339.96 samples/sec   Loss 1.9157   LearningRate 0.0089   Epoch: 14   Global Step: 174320   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:50:39,987-Speed 3337.61 samples/sec   Loss 1.8224   LearningRate 0.0089   Epoch: 14   Global Step: 174330   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:50:43,130-Speed 3258.84 samples/sec   Loss 1.8854   LearningRate 0.0089   Epoch: 14   Global Step: 174340   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:50:46,193-Speed 3344.53 samples/sec   Loss 1.8483   LearningRate 0.0089   Epoch: 14   Global Step: 174350   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:50:49,338-Speed 3256.90 samples/sec   Loss 1.8486   LearningRate 0.0089   Epoch: 14   Global Step: 174360   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:50:52,421-Speed 3321.81 samples/sec   Loss 1.8596   LearningRate 0.0089   Epoch: 14   Global Step: 174370   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:50:55,522-Speed 3303.37 samples/sec   Loss 1.9131   LearningRate 0.0089   Epoch: 14   Global Step: 174380   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:50:58,590-Speed 3338.47 samples/sec   Loss 1.8610   LearningRate 0.0089   Epoch: 14   Global Step: 174390   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:51:01,778-Speed 3213.69 samples/sec   Loss 1.9260   LearningRate 0.0089   Epoch: 14   Global Step: 174400   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:51:04,908-Speed 3272.21 samples/sec   Loss 1.8851   LearningRate 0.0089   Epoch: 14   Global Step: 174410   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:51:08,005-Speed 3307.54 samples/sec   Loss 1.9393   LearningRate 0.0089   Epoch: 14   Global Step: 174420   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:51:11,078-Speed 3333.57 samples/sec   Loss 1.8572   LearningRate 0.0089   Epoch: 14   Global Step: 174430   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:51:14,190-Speed 3290.85 samples/sec   Loss 1.8995   LearningRate 0.0089   Epoch: 14   Global Step: 174440   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:51:17,286-Speed 3308.98 samples/sec   Loss 1.9366   LearningRate 0.0089   Epoch: 14   Global Step: 174450   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:51:20,366-Speed 3325.56 samples/sec   Loss 1.8587   LearningRate 0.0089   Epoch: 14   Global Step: 174460   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:51:23,432-Speed 3341.91 samples/sec   Loss 1.9558   LearningRate 0.0089   Epoch: 14   Global Step: 174470   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:51:26,628-Speed 3204.55 samples/sec   Loss 1.9233   LearningRate 0.0089   Epoch: 14   Global Step: 174480   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:51:29,763-Speed 3266.99 samples/sec   Loss 1.9157   LearningRate 0.0089   Epoch: 14   Global Step: 174490   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:51:32,860-Speed 3307.93 samples/sec   Loss 1.9011   LearningRate 0.0089   Epoch: 14   Global Step: 174500   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:51:36,029-Speed 3232.15 samples/sec   Loss 1.9233   LearningRate 0.0089   Epoch: 14   Global Step: 174510   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:51:39,126-Speed 3307.40 samples/sec   Loss 1.8283   LearningRate 0.0088   Epoch: 14   Global Step: 174520   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:51:42,361-Speed 3166.47 samples/sec   Loss 1.8732   LearningRate 0.0088   Epoch: 14   Global Step: 174530   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:51:45,476-Speed 3288.24 samples/sec   Loss 1.9256   LearningRate 0.0088   Epoch: 14   Global Step: 174540   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:51:48,598-Speed 3280.61 samples/sec   Loss 1.9216   LearningRate 0.0088   Epoch: 14   Global Step: 174550   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:51:51,848-Speed 3152.10 samples/sec   Loss 1.9517   LearningRate 0.0088   Epoch: 14   Global Step: 174560   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:51:54,967-Speed 3284.06 samples/sec   Loss 1.8934   LearningRate 0.0088   Epoch: 14   Global Step: 174570   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:51:58,085-Speed 3284.97 samples/sec   Loss 1.9525   LearningRate 0.0088   Epoch: 14   Global Step: 174580   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:52:01,229-Speed 3258.26 samples/sec   Loss 1.9440   LearningRate 0.0088   Epoch: 14   Global Step: 174590   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:52:04,320-Speed 3313.11 samples/sec   Loss 1.8890   LearningRate 0.0088   Epoch: 14   Global Step: 174600   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:52:07,394-Speed 3332.22 samples/sec   Loss 1.9613   LearningRate 0.0088   Epoch: 14   Global Step: 174610   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:52:10,451-Speed 3351.59 samples/sec   Loss 1.9283   LearningRate 0.0088   Epoch: 14   Global Step: 174620   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:52:13,613-Speed 3239.26 samples/sec   Loss 1.8897   LearningRate 0.0088   Epoch: 14   Global Step: 174630   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:52:16,687-Speed 3332.26 samples/sec   Loss 1.9560   LearningRate 0.0088   Epoch: 14   Global Step: 174640   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:52:19,734-Speed 3360.88 samples/sec   Loss 1.9391   LearningRate 0.0088   Epoch: 14   Global Step: 174650   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:52:22,858-Speed 3279.72 samples/sec   Loss 1.9169   LearningRate 0.0088   Epoch: 14   Global Step: 174660   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:52:26,031-Speed 3228.22 samples/sec   Loss 1.9197   LearningRate 0.0088   Epoch: 14   Global Step: 174670   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:52:29,190-Speed 3242.61 samples/sec   Loss 1.8242   LearningRate 0.0088   Epoch: 14   Global Step: 174680   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:52:32,244-Speed 3354.01 samples/sec   Loss 1.9704   LearningRate 0.0088   Epoch: 14   Global Step: 174690   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:52:35,377-Speed 3269.31 samples/sec   Loss 1.9368   LearningRate 0.0088   Epoch: 14   Global Step: 174700   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:52:38,538-Speed 3240.36 samples/sec   Loss 1.9760   LearningRate 0.0088   Epoch: 14   Global Step: 174710   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:52:41,658-Speed 3282.68 samples/sec   Loss 1.9218   LearningRate 0.0088   Epoch: 14   Global Step: 174720   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:52:44,756-Speed 3306.47 samples/sec   Loss 1.9568   LearningRate 0.0088   Epoch: 14   Global Step: 174730   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:52:47,873-Speed 3286.27 samples/sec   Loss 1.9593   LearningRate 0.0088   Epoch: 14   Global Step: 174740   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 16:52:50,958-Speed 3320.30 samples/sec   Loss 1.9070   LearningRate 0.0088   Epoch: 14   Global Step: 174750   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:52:54,084-Speed 3276.67 samples/sec   Loss 1.9156   LearningRate 0.0088   Epoch: 14   Global Step: 174760   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:52:57,152-Speed 3338.77 samples/sec   Loss 1.9735   LearningRate 0.0088   Epoch: 14   Global Step: 174770   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:53:00,249-Speed 3307.42 samples/sec   Loss 1.9354   LearningRate 0.0088   Epoch: 14   Global Step: 174780   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:53:03,384-Speed 3267.62 samples/sec   Loss 1.9120   LearningRate 0.0088   Epoch: 14   Global Step: 174790   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:53:06,506-Speed 3281.64 samples/sec   Loss 1.9409   LearningRate 0.0088   Epoch: 14   Global Step: 174800   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:53:09,632-Speed 3276.19 samples/sec   Loss 1.9255   LearningRate 0.0088   Epoch: 14   Global Step: 174810   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:53:12,716-Speed 3321.28 samples/sec   Loss 1.9304   LearningRate 0.0088   Epoch: 14   Global Step: 174820   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:53:15,832-Speed 3287.86 samples/sec   Loss 1.9823   LearningRate 0.0088   Epoch: 14   Global Step: 174830   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:53:18,967-Speed 3267.34 samples/sec   Loss 1.9485   LearningRate 0.0088   Epoch: 14   Global Step: 174840   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:53:22,037-Speed 3336.79 samples/sec   Loss 1.9869   LearningRate 0.0088   Epoch: 14   Global Step: 174850   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:53:25,181-Speed 3257.62 samples/sec   Loss 1.9263   LearningRate 0.0088   Epoch: 14   Global Step: 174860   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:53:28,364-Speed 3218.44 samples/sec   Loss 1.9789   LearningRate 0.0088   Epoch: 14   Global Step: 174870   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:53:31,468-Speed 3300.16 samples/sec   Loss 1.9897   LearningRate 0.0088   Epoch: 14   Global Step: 174880   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:53:34,666-Speed 3203.18 samples/sec   Loss 1.9232   LearningRate 0.0088   Epoch: 14   Global Step: 174890   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:53:37,830-Speed 3237.55 samples/sec   Loss 1.9362   LearningRate 0.0088   Epoch: 14   Global Step: 174900   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:53:40,975-Speed 3256.68 samples/sec   Loss 1.9879   LearningRate 0.0088   Epoch: 14   Global Step: 174910   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:53:44,101-Speed 3276.32 samples/sec   Loss 1.8929   LearningRate 0.0088   Epoch: 14   Global Step: 174920   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:53:47,157-Speed 3351.55 samples/sec   Loss 1.9513   LearningRate 0.0088   Epoch: 14   Global Step: 174930   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:53:50,263-Speed 3298.18 samples/sec   Loss 1.9272   LearningRate 0.0087   Epoch: 14   Global Step: 174940   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:53:53,523-Speed 3142.41 samples/sec   Loss 1.9261   LearningRate 0.0087   Epoch: 14   Global Step: 174950   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:53:56,656-Speed 3269.10 samples/sec   Loss 1.9366   LearningRate 0.0087   Epoch: 14   Global Step: 174960   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:53:59,799-Speed 3259.40 samples/sec   Loss 1.9159   LearningRate 0.0087   Epoch: 14   Global Step: 174970   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:54:02,973-Speed 3227.20 samples/sec   Loss 1.9461   LearningRate 0.0087   Epoch: 14   Global Step: 174980   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:54:06,068-Speed 3309.16 samples/sec   Loss 1.9608   LearningRate 0.0087   Epoch: 14   Global Step: 174990   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:54:09,184-Speed 3287.76 samples/sec   Loss 1.9447   LearningRate 0.0087   Epoch: 14   Global Step: 175000   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:54:12,295-Speed 3291.95 samples/sec   Loss 1.9684   LearningRate 0.0087   Epoch: 14   Global Step: 175010   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:54:15,436-Speed 3262.10 samples/sec   Loss 1.9048   LearningRate 0.0087   Epoch: 14   Global Step: 175020   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:54:18,547-Speed 3292.13 samples/sec   Loss 1.9826   LearningRate 0.0087   Epoch: 14   Global Step: 175030   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:54:21,600-Speed 3354.60 samples/sec   Loss 1.9716   LearningRate 0.0087   Epoch: 14   Global Step: 175040   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:54:24,673-Speed 3333.66 samples/sec   Loss 1.9094   LearningRate 0.0087   Epoch: 14   Global Step: 175050   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:54:27,854-Speed 3220.46 samples/sec   Loss 2.0004   LearningRate 0.0087   Epoch: 14   Global Step: 175060   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:54:30,978-Speed 3278.26 samples/sec   Loss 1.9013   LearningRate 0.0087   Epoch: 14   Global Step: 175070   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:54:34,042-Speed 3343.55 samples/sec   Loss 1.9670   LearningRate 0.0087   Epoch: 14   Global Step: 175080   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:54:37,168-Speed 3276.13 samples/sec   Loss 1.9090   LearningRate 0.0087   Epoch: 14   Global Step: 175090   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:54:40,280-Speed 3292.22 samples/sec   Loss 1.8987   LearningRate 0.0087   Epoch: 14   Global Step: 175100   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:54:43,437-Speed 3244.45 samples/sec   Loss 1.9638   LearningRate 0.0087   Epoch: 14   Global Step: 175110   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:54:46,523-Speed 3319.20 samples/sec   Loss 1.9516   LearningRate 0.0087   Epoch: 14   Global Step: 175120   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:54:49,599-Speed 3330.41 samples/sec   Loss 1.9467   LearningRate 0.0087   Epoch: 14   Global Step: 175130   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:54:52,672-Speed 3333.46 samples/sec   Loss 1.9255   LearningRate 0.0087   Epoch: 14   Global Step: 175140   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:54:55,822-Speed 3251.54 samples/sec   Loss 1.9372   LearningRate 0.0087   Epoch: 14   Global Step: 175150   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:54:58,898-Speed 3329.62 samples/sec   Loss 1.8909   LearningRate 0.0087   Epoch: 14   Global Step: 175160   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:55:01,990-Speed 3312.50 samples/sec   Loss 1.9876   LearningRate 0.0087   Epoch: 14   Global Step: 175170   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:55:05,159-Speed 3233.21 samples/sec   Loss 1.9187   LearningRate 0.0087   Epoch: 14   Global Step: 175180   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:55:08,234-Speed 3330.46 samples/sec   Loss 1.8802   LearningRate 0.0087   Epoch: 14   Global Step: 175190   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:55:11,333-Speed 3305.77 samples/sec   Loss 1.9815   LearningRate 0.0087   Epoch: 14   Global Step: 175200   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:55:14,444-Speed 3292.62 samples/sec   Loss 1.9360   LearningRate 0.0087   Epoch: 14   Global Step: 175210   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:55:17,556-Speed 3290.81 samples/sec   Loss 1.9741   LearningRate 0.0087   Epoch: 14   Global Step: 175220   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:55:20,655-Speed 3305.47 samples/sec   Loss 1.9742   LearningRate 0.0087   Epoch: 14   Global Step: 175230   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:55:23,764-Speed 3294.72 samples/sec   Loss 1.8973   LearningRate 0.0087   Epoch: 14   Global Step: 175240   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:55:26,860-Speed 3308.50 samples/sec   Loss 1.9826   LearningRate 0.0087   Epoch: 14   Global Step: 175250   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:55:30,004-Speed 3257.45 samples/sec   Loss 1.9138   LearningRate 0.0087   Epoch: 14   Global Step: 175260   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:55:33,164-Speed 3241.86 samples/sec   Loss 1.9105   LearningRate 0.0087   Epoch: 14   Global Step: 175270   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:55:36,238-Speed 3331.97 samples/sec   Loss 1.9696   LearningRate 0.0087   Epoch: 14   Global Step: 175280   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:55:39,351-Speed 3290.77 samples/sec   Loss 1.8995   LearningRate 0.0087   Epoch: 14   Global Step: 175290   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:55:42,512-Speed 3240.26 samples/sec   Loss 1.9636   LearningRate 0.0087   Epoch: 14   Global Step: 175300   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:55:45,603-Speed 3313.57 samples/sec   Loss 1.9617   LearningRate 0.0087   Epoch: 14   Global Step: 175310   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:55:48,717-Speed 3290.22 samples/sec   Loss 1.9020   LearningRate 0.0087   Epoch: 14   Global Step: 175320   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:55:51,850-Speed 3269.09 samples/sec   Loss 1.9718   LearningRate 0.0087   Epoch: 14   Global Step: 175330   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:55:54,989-Speed 3263.19 samples/sec   Loss 1.9301   LearningRate 0.0087   Epoch: 14   Global Step: 175340   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:55:58,069-Speed 3325.97 samples/sec   Loss 1.9419   LearningRate 0.0087   Epoch: 14   Global Step: 175350   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:56:01,169-Speed 3304.51 samples/sec   Loss 1.9362   LearningRate 0.0086   Epoch: 14   Global Step: 175360   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:56:04,259-Speed 3314.84 samples/sec   Loss 1.9913   LearningRate 0.0086   Epoch: 14   Global Step: 175370   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:56:07,485-Speed 3175.29 samples/sec   Loss 1.9704   LearningRate 0.0086   Epoch: 14   Global Step: 175380   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:56:10,561-Speed 3329.67 samples/sec   Loss 1.9251   LearningRate 0.0086   Epoch: 14   Global Step: 175390   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:56:13,662-Speed 3303.05 samples/sec   Loss 1.9334   LearningRate 0.0086   Epoch: 14   Global Step: 175400   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:56:16,792-Speed 3273.67 samples/sec   Loss 1.9593   LearningRate 0.0086   Epoch: 14   Global Step: 175410   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:56:19,875-Speed 3321.49 samples/sec   Loss 1.9613   LearningRate 0.0086   Epoch: 14   Global Step: 175420   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:56:22,947-Speed 3335.15 samples/sec   Loss 1.9613   LearningRate 0.0086   Epoch: 14   Global Step: 175430   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:56:26,071-Speed 3278.96 samples/sec   Loss 1.9569   LearningRate 0.0086   Epoch: 14   Global Step: 175440   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:56:29,205-Speed 3267.80 samples/sec   Loss 2.0932   LearningRate 0.0086   Epoch: 14   Global Step: 175450   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:56:32,329-Speed 3279.06 samples/sec   Loss 1.9509   LearningRate 0.0086   Epoch: 14   Global Step: 175460   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:56:35,453-Speed 3279.93 samples/sec   Loss 1.9582   LearningRate 0.0086   Epoch: 14   Global Step: 175470   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:56:38,581-Speed 3274.44 samples/sec   Loss 2.0060   LearningRate 0.0086   Epoch: 14   Global Step: 175480   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:56:41,695-Speed 3289.19 samples/sec   Loss 1.9990   LearningRate 0.0086   Epoch: 14   Global Step: 175490   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:56:44,823-Speed 3274.08 samples/sec   Loss 1.9384   LearningRate 0.0086   Epoch: 14   Global Step: 175500   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:56:47,898-Speed 3331.64 samples/sec   Loss 2.0131   LearningRate 0.0086   Epoch: 14   Global Step: 175510   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:56:51,027-Speed 3273.32 samples/sec   Loss 1.9511   LearningRate 0.0086   Epoch: 14   Global Step: 175520   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:56:54,183-Speed 3246.07 samples/sec   Loss 1.9557   LearningRate 0.0086   Epoch: 14   Global Step: 175530   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:56:57,247-Speed 3342.55 samples/sec   Loss 1.9571   LearningRate 0.0086   Epoch: 14   Global Step: 175540   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:57:00,305-Speed 3349.91 samples/sec   Loss 1.9845   LearningRate 0.0086   Epoch: 14   Global Step: 175550   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:57:03,450-Speed 3256.44 samples/sec   Loss 1.9831   LearningRate 0.0086   Epoch: 14   Global Step: 175560   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:57:06,659-Speed 3192.21 samples/sec   Loss 1.9990   LearningRate 0.0086   Epoch: 14   Global Step: 175570   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:57:09,731-Speed 3335.13 samples/sec   Loss 1.9694   LearningRate 0.0086   Epoch: 14   Global Step: 175580   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:57:12,857-Speed 3276.14 samples/sec   Loss 2.0200   LearningRate 0.0086   Epoch: 14   Global Step: 175590   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:57:15,995-Speed 3264.83 samples/sec   Loss 1.9906   LearningRate 0.0086   Epoch: 14   Global Step: 175600   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:57:19,110-Speed 3287.74 samples/sec   Loss 2.0321   LearningRate 0.0086   Epoch: 14   Global Step: 175610   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:57:22,168-Speed 3350.04 samples/sec   Loss 1.9754   LearningRate 0.0086   Epoch: 14   Global Step: 175620   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:57:25,372-Speed 3197.63 samples/sec   Loss 2.0562   LearningRate 0.0086   Epoch: 14   Global Step: 175630   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:57:28,471-Speed 3304.78 samples/sec   Loss 2.0001   LearningRate 0.0086   Epoch: 14   Global Step: 175640   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:57:31,627-Speed 3245.49 samples/sec   Loss 2.0488   LearningRate 0.0086   Epoch: 14   Global Step: 175650   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:57:34,685-Speed 3350.06 samples/sec   Loss 2.0139   LearningRate 0.0086   Epoch: 14   Global Step: 175660   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:57:37,740-Speed 3353.42 samples/sec   Loss 1.9566   LearningRate 0.0086   Epoch: 14   Global Step: 175670   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:57:40,805-Speed 3341.63 samples/sec   Loss 1.9891   LearningRate 0.0086   Epoch: 14   Global Step: 175680   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:57:43,908-Speed 3301.10 samples/sec   Loss 1.9785   LearningRate 0.0086   Epoch: 14   Global Step: 175690   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:57:47,011-Speed 3301.36 samples/sec   Loss 1.9986   LearningRate 0.0086   Epoch: 14   Global Step: 175700   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:57:50,114-Speed 3300.86 samples/sec   Loss 2.0256   LearningRate 0.0086   Epoch: 14   Global Step: 175710   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:57:53,298-Speed 3217.13 samples/sec   Loss 1.9707   LearningRate 0.0086   Epoch: 14   Global Step: 175720   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:57:56,432-Speed 3268.17 samples/sec   Loss 1.9935   LearningRate 0.0086   Epoch: 14   Global Step: 175730   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:57:59,586-Speed 3247.85 samples/sec   Loss 1.9700   LearningRate 0.0086   Epoch: 14   Global Step: 175740   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:58:02,693-Speed 3297.03 samples/sec   Loss 2.0070   LearningRate 0.0086   Epoch: 14   Global Step: 175750   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:58:05,803-Speed 3294.02 samples/sec   Loss 1.9906   LearningRate 0.0086   Epoch: 14   Global Step: 175760   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:58:08,896-Speed 3311.00 samples/sec   Loss 1.9767   LearningRate 0.0086   Epoch: 14   Global Step: 175770   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:58:11,975-Speed 3326.89 samples/sec   Loss 1.9913   LearningRate 0.0086   Epoch: 14   Global Step: 175780   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:58:15,117-Speed 3260.11 samples/sec   Loss 1.9756   LearningRate 0.0085   Epoch: 14   Global Step: 175790   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:58:18,244-Speed 3275.68 samples/sec   Loss 1.9747   LearningRate 0.0085   Epoch: 14   Global Step: 175800   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:58:21,310-Speed 3340.41 samples/sec   Loss 2.0458   LearningRate 0.0085   Epoch: 14   Global Step: 175810   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:58:24,373-Speed 3345.26 samples/sec   Loss 1.9951   LearningRate 0.0085   Epoch: 14   Global Step: 175820   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:58:27,615-Speed 3159.12 samples/sec   Loss 2.0442   LearningRate 0.0085   Epoch: 14   Global Step: 175830   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:58:30,825-Speed 3191.28 samples/sec   Loss 2.0872   LearningRate 0.0085   Epoch: 14   Global Step: 175840   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:58:33,885-Speed 3347.76 samples/sec   Loss 1.9815   LearningRate 0.0085   Epoch: 14   Global Step: 175850   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:58:37,011-Speed 3276.19 samples/sec   Loss 2.0104   LearningRate 0.0085   Epoch: 14   Global Step: 175860   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:58:40,064-Speed 3355.17 samples/sec   Loss 1.9848   LearningRate 0.0085   Epoch: 14   Global Step: 175870   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:58:43,226-Speed 3239.30 samples/sec   Loss 1.9989   LearningRate 0.0085   Epoch: 14   Global Step: 175880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:58:46,267-Speed 3369.43 samples/sec   Loss 1.9774   LearningRate 0.0085   Epoch: 14   Global Step: 175890   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:58:49,363-Speed 3307.52 samples/sec   Loss 2.0176   LearningRate 0.0085   Epoch: 14   Global Step: 175900   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:58:52,520-Speed 3245.32 samples/sec   Loss 2.0706   LearningRate 0.0085   Epoch: 14   Global Step: 175910   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:58:55,648-Speed 3274.09 samples/sec   Loss 1.9709   LearningRate 0.0085   Epoch: 14   Global Step: 175920   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:58:58,785-Speed 3265.57 samples/sec   Loss 1.9745   LearningRate 0.0085   Epoch: 14   Global Step: 175930   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:59:01,886-Speed 3303.06 samples/sec   Loss 1.9954   LearningRate 0.0085   Epoch: 14   Global Step: 175940   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:59:05,006-Speed 3283.37 samples/sec   Loss 2.0816   LearningRate 0.0085   Epoch: 14   Global Step: 175950   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:59:08,119-Speed 3290.50 samples/sec   Loss 2.0054   LearningRate 0.0085   Epoch: 14   Global Step: 175960   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:59:11,201-Speed 3324.03 samples/sec   Loss 2.0203   LearningRate 0.0085   Epoch: 14   Global Step: 175970   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:59:14,279-Speed 3327.16 samples/sec   Loss 1.9647   LearningRate 0.0085   Epoch: 14   Global Step: 175980   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 16:59:17,390-Speed 3292.96 samples/sec   Loss 2.0710   LearningRate 0.0085   Epoch: 14   Global Step: 175990   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:59:20,500-Speed 3294.22 samples/sec   Loss 2.0418   LearningRate 0.0085   Epoch: 14   Global Step: 176000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:59:23,622-Speed 3280.41 samples/sec   Loss 1.9947   LearningRate 0.0085   Epoch: 14   Global Step: 176010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:59:26,718-Speed 3308.53 samples/sec   Loss 1.9958   LearningRate 0.0085   Epoch: 14   Global Step: 176020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:59:29,808-Speed 3315.26 samples/sec   Loss 2.0033   LearningRate 0.0085   Epoch: 14   Global Step: 176030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:59:32,869-Speed 3346.86 samples/sec   Loss 2.0111   LearningRate 0.0085   Epoch: 14   Global Step: 176040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:59:35,949-Speed 3325.25 samples/sec   Loss 2.0260   LearningRate 0.0085   Epoch: 14   Global Step: 176050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:59:39,029-Speed 3326.64 samples/sec   Loss 2.0497   LearningRate 0.0085   Epoch: 14   Global Step: 176060   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:59:42,102-Speed 3333.04 samples/sec   Loss 2.0670   LearningRate 0.0085   Epoch: 14   Global Step: 176070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:59:45,177-Speed 3330.84 samples/sec   Loss 2.0914   LearningRate 0.0085   Epoch: 14   Global Step: 176080   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:59:48,361-Speed 3217.49 samples/sec   Loss 2.0193   LearningRate 0.0085   Epoch: 14   Global Step: 176090   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:59:51,440-Speed 3326.77 samples/sec   Loss 2.0244   LearningRate 0.0085   Epoch: 14   Global Step: 176100   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:59:54,504-Speed 3342.43 samples/sec   Loss 1.9950   LearningRate 0.0085   Epoch: 14   Global Step: 176110   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 16:59:57,580-Speed 3330.47 samples/sec   Loss 2.1180   LearningRate 0.0085   Epoch: 14   Global Step: 176120   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:00:00,651-Speed 3335.92 samples/sec   Loss 2.0375   LearningRate 0.0085   Epoch: 14   Global Step: 176130   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:00:03,790-Speed 3263.20 samples/sec   Loss 1.9959   LearningRate 0.0085   Epoch: 14   Global Step: 176140   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:00:06,900-Speed 3293.65 samples/sec   Loss 1.9648   LearningRate 0.0085   Epoch: 14   Global Step: 176150   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:00:09,965-Speed 3341.54 samples/sec   Loss 2.0112   LearningRate 0.0085   Epoch: 14   Global Step: 176160   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:00:13,083-Speed 3286.05 samples/sec   Loss 1.9719   LearningRate 0.0085   Epoch: 14   Global Step: 176170   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:00:16,212-Speed 3272.84 samples/sec   Loss 1.9491   LearningRate 0.0085   Epoch: 14   Global Step: 176180   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:00:19,308-Speed 3308.71 samples/sec   Loss 2.0343   LearningRate 0.0085   Epoch: 14   Global Step: 176190   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 17:00:22,379-Speed 3335.61 samples/sec   Loss 2.0665   LearningRate 0.0085   Epoch: 14   Global Step: 176200   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:00:25,605-Speed 3175.88 samples/sec   Loss 2.0262   LearningRate 0.0084   Epoch: 14   Global Step: 176210   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:00:28,688-Speed 3322.42 samples/sec   Loss 2.0161   LearningRate 0.0084   Epoch: 14   Global Step: 176220   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:00:31,779-Speed 3313.78 samples/sec   Loss 2.0555   LearningRate 0.0084   Epoch: 14   Global Step: 176230   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:00:34,856-Speed 3328.48 samples/sec   Loss 2.0790   LearningRate 0.0084   Epoch: 14   Global Step: 176240   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:00:37,910-Speed 3354.12 samples/sec   Loss 2.0487   LearningRate 0.0084   Epoch: 14   Global Step: 176250   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:00:41,010-Speed 3304.68 samples/sec   Loss 2.0099   LearningRate 0.0084   Epoch: 14   Global Step: 176260   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:00:44,098-Speed 3316.49 samples/sec   Loss 2.1103   LearningRate 0.0084   Epoch: 14   Global Step: 176270   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:00:47,207-Speed 3295.38 samples/sec   Loss 1.9975   LearningRate 0.0084   Epoch: 14   Global Step: 176280   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:00:50,281-Speed 3331.13 samples/sec   Loss 2.0011   LearningRate 0.0084   Epoch: 14   Global Step: 176290   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:00:53,393-Speed 3292.32 samples/sec   Loss 2.0231   LearningRate 0.0084   Epoch: 14   Global Step: 176300   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:00:56,481-Speed 3316.57 samples/sec   Loss 1.9980   LearningRate 0.0084   Epoch: 14   Global Step: 176310   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:00:59,611-Speed 3273.25 samples/sec   Loss 1.9778   LearningRate 0.0084   Epoch: 14   Global Step: 176320   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:01:02,782-Speed 3230.45 samples/sec   Loss 2.0278   LearningRate 0.0084   Epoch: 14   Global Step: 176330   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:01:05,948-Speed 3236.06 samples/sec   Loss 2.1039   LearningRate 0.0084   Epoch: 14   Global Step: 176340   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:01:09,011-Speed 3343.90 samples/sec   Loss 2.0227   LearningRate 0.0084   Epoch: 14   Global Step: 176350   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:01:12,084-Speed 3333.49 samples/sec   Loss 1.9971   LearningRate 0.0084   Epoch: 14   Global Step: 176360   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-27 17:01:15,233-Speed 3252.98 samples/sec   Loss 2.0257   LearningRate 0.0084   Epoch: 14   Global Step: 176370   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-27 17:01:18,369-Speed 3266.04 samples/sec   Loss 2.0669   LearningRate 0.0084   Epoch: 14   Global Step: 176380   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-27 17:01:21,426-Speed 3351.48 samples/sec   Loss 2.0893   LearningRate 0.0084   Epoch: 14   Global Step: 176390   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-27 17:01:24,543-Speed 3286.37 samples/sec   Loss 1.9884   LearningRate 0.0084   Epoch: 14   Global Step: 176400   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-27 17:01:27,717-Speed 3226.90 samples/sec   Loss 1.9970   LearningRate 0.0084   Epoch: 14   Global Step: 176410   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-27 17:01:30,832-Speed 3287.86 samples/sec   Loss 2.0146   LearningRate 0.0084   Epoch: 14   Global Step: 176420   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-27 17:01:33,900-Speed 3339.06 samples/sec   Loss 2.0302   LearningRate 0.0084   Epoch: 14   Global Step: 176430   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-27 17:01:36,970-Speed 3336.54 samples/sec   Loss 2.1134   LearningRate 0.0084   Epoch: 14   Global Step: 176440   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-27 17:01:40,080-Speed 3293.92 samples/sec   Loss 2.0540   LearningRate 0.0084   Epoch: 14   Global Step: 176450   Fp16 Grad Scale: 4096   Required: 6 hours
Training: 2022-04-27 17:01:43,253-Speed 3228.09 samples/sec   Loss 1.9980   LearningRate 0.0084   Epoch: 14   Global Step: 176460   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:01:46,341-Speed 3317.51 samples/sec   Loss 2.0097   LearningRate 0.0084   Epoch: 14   Global Step: 176470   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:01:49,412-Speed 3334.96 samples/sec   Loss 2.0208   LearningRate 0.0084   Epoch: 14   Global Step: 176480   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:01:52,470-Speed 3350.50 samples/sec   Loss 1.9916   LearningRate 0.0084   Epoch: 14   Global Step: 176490   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:01:55,600-Speed 3272.36 samples/sec   Loss 1.9895   LearningRate 0.0084   Epoch: 14   Global Step: 176500   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:01:58,684-Speed 3321.02 samples/sec   Loss 2.0851   LearningRate 0.0084   Epoch: 14   Global Step: 176510   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:02:01,818-Speed 3269.05 samples/sec   Loss 2.0279   LearningRate 0.0084   Epoch: 14   Global Step: 176520   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:02:04,943-Speed 3277.84 samples/sec   Loss 1.9207   LearningRate 0.0084   Epoch: 14   Global Step: 176530   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:02:08,070-Speed 3276.05 samples/sec   Loss 2.0489   LearningRate 0.0084   Epoch: 14   Global Step: 176540   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:02:11,161-Speed 3314.10 samples/sec   Loss 2.1190   LearningRate 0.0084   Epoch: 14   Global Step: 176550   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:02:14,297-Speed 3266.39 samples/sec   Loss 2.0784   LearningRate 0.0084   Epoch: 14   Global Step: 176560   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:02:17,481-Speed 3217.12 samples/sec   Loss 2.0274   LearningRate 0.0084   Epoch: 14   Global Step: 176570   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:02:20,570-Speed 3315.88 samples/sec   Loss 1.9857   LearningRate 0.0084   Epoch: 14   Global Step: 176580   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:02:23,743-Speed 3227.61 samples/sec   Loss 2.0089   LearningRate 0.0084   Epoch: 14   Global Step: 176590   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:02:26,926-Speed 3218.52 samples/sec   Loss 2.0082   LearningRate 0.0084   Epoch: 14   Global Step: 176600   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:02:30,034-Speed 3295.70 samples/sec   Loss 2.0520   LearningRate 0.0084   Epoch: 14   Global Step: 176610   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:02:33,118-Speed 3321.05 samples/sec   Loss 2.0100   LearningRate 0.0084   Epoch: 14   Global Step: 176620   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:02:36,288-Speed 3231.17 samples/sec   Loss 2.0230   LearningRate 0.0084   Epoch: 14   Global Step: 176630   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:02:39,354-Speed 3340.81 samples/sec   Loss 2.0341   LearningRate 0.0083   Epoch: 14   Global Step: 176640   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:02:42,560-Speed 3195.79 samples/sec   Loss 2.0501   LearningRate 0.0083   Epoch: 14   Global Step: 176650   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:02:45,647-Speed 3317.70 samples/sec   Loss 2.0236   LearningRate 0.0083   Epoch: 14   Global Step: 176660   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:02:48,792-Speed 3257.22 samples/sec   Loss 2.0381   LearningRate 0.0083   Epoch: 14   Global Step: 176670   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:02:51,990-Speed 3202.79 samples/sec   Loss 2.0469   LearningRate 0.0083   Epoch: 14   Global Step: 176680   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:02:55,110-Speed 3282.86 samples/sec   Loss 1.9762   LearningRate 0.0083   Epoch: 14   Global Step: 176690   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:02:58,170-Speed 3348.16 samples/sec   Loss 2.0794   LearningRate 0.0083   Epoch: 14   Global Step: 176700   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:03:01,283-Speed 3290.45 samples/sec   Loss 1.9928   LearningRate 0.0083   Epoch: 14   Global Step: 176710   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:03:04,411-Speed 3274.43 samples/sec   Loss 2.0361   LearningRate 0.0083   Epoch: 14   Global Step: 176720   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:03:07,499-Speed 3317.18 samples/sec   Loss 2.0344   LearningRate 0.0083   Epoch: 14   Global Step: 176730   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:03:10,573-Speed 3332.01 samples/sec   Loss 2.0719   LearningRate 0.0083   Epoch: 14   Global Step: 176740   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:03:13,668-Speed 3310.65 samples/sec   Loss 2.0135   LearningRate 0.0083   Epoch: 14   Global Step: 176750   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:03:16,704-Speed 3373.46 samples/sec   Loss 2.0525   LearningRate 0.0083   Epoch: 14   Global Step: 176760   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:03:19,778-Speed 3332.45 samples/sec   Loss 2.0211   LearningRate 0.0083   Epoch: 14   Global Step: 176770   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:03:22,837-Speed 3348.70 samples/sec   Loss 2.0146   LearningRate 0.0083   Epoch: 14   Global Step: 176780   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:03:25,944-Speed 3296.62 samples/sec   Loss 2.0408   LearningRate 0.0083   Epoch: 14   Global Step: 176790   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:03:29,037-Speed 3311.49 samples/sec   Loss 2.0170   LearningRate 0.0083   Epoch: 14   Global Step: 176800   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:03:32,147-Speed 3293.61 samples/sec   Loss 2.0039   LearningRate 0.0083   Epoch: 14   Global Step: 176810   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:03:35,205-Speed 3350.18 samples/sec   Loss 2.0540   LearningRate 0.0083   Epoch: 14   Global Step: 176820   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:03:38,283-Speed 3328.11 samples/sec   Loss 2.0012   LearningRate 0.0083   Epoch: 14   Global Step: 176830   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:03:41,388-Speed 3298.81 samples/sec   Loss 2.0583   LearningRate 0.0083   Epoch: 14   Global Step: 176840   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:03:44,499-Speed 3291.84 samples/sec   Loss 2.0631   LearningRate 0.0083   Epoch: 14   Global Step: 176850   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:03:47,641-Speed 3260.48 samples/sec   Loss 2.0566   LearningRate 0.0083   Epoch: 14   Global Step: 176860   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:03:50,758-Speed 3286.77 samples/sec   Loss 2.0662   LearningRate 0.0083   Epoch: 14   Global Step: 176870   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:03:53,892-Speed 3268.30 samples/sec   Loss 2.1008   LearningRate 0.0083   Epoch: 14   Global Step: 176880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:03:56,941-Speed 3359.89 samples/sec   Loss 2.0405   LearningRate 0.0083   Epoch: 14   Global Step: 176890   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:04:00,070-Speed 3273.04 samples/sec   Loss 2.0820   LearningRate 0.0083   Epoch: 14   Global Step: 176900   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:04:03,230-Speed 3242.10 samples/sec   Loss 2.0538   LearningRate 0.0083   Epoch: 14   Global Step: 176910   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:04:06,429-Speed 3201.15 samples/sec   Loss 2.0882   LearningRate 0.0083   Epoch: 14   Global Step: 176920   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:04:09,495-Speed 3341.40 samples/sec   Loss 2.1207   LearningRate 0.0083   Epoch: 14   Global Step: 176930   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:04:12,610-Speed 3288.32 samples/sec   Loss 2.0433   LearningRate 0.0083   Epoch: 14   Global Step: 176940   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:04:15,759-Speed 3253.08 samples/sec   Loss 2.0497   LearningRate 0.0083   Epoch: 14   Global Step: 176950   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:04:18,863-Speed 3300.15 samples/sec   Loss 2.0750   LearningRate 0.0083   Epoch: 14   Global Step: 176960   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:04:21,919-Speed 3351.30 samples/sec   Loss 2.0864   LearningRate 0.0083   Epoch: 14   Global Step: 176970   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:04:25,034-Speed 3288.81 samples/sec   Loss 2.0838   LearningRate 0.0083   Epoch: 14   Global Step: 176980   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:04:28,216-Speed 3219.23 samples/sec   Loss 2.1055   LearningRate 0.0083   Epoch: 14   Global Step: 176990   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:04:31,311-Speed 3310.04 samples/sec   Loss 2.0809   LearningRate 0.0083   Epoch: 14   Global Step: 177000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:04:34,389-Speed 3327.38 samples/sec   Loss 2.1034   LearningRate 0.0083   Epoch: 14   Global Step: 177010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:04:37,469-Speed 3326.24 samples/sec   Loss 2.0499   LearningRate 0.0083   Epoch: 14   Global Step: 177020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:04:40,561-Speed 3312.36 samples/sec   Loss 2.0392   LearningRate 0.0083   Epoch: 14   Global Step: 177030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:04:43,639-Speed 3327.57 samples/sec   Loss 2.0953   LearningRate 0.0083   Epoch: 14   Global Step: 177040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:04:46,729-Speed 3315.34 samples/sec   Loss 2.0691   LearningRate 0.0083   Epoch: 14   Global Step: 177050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:04:49,873-Speed 3257.85 samples/sec   Loss 2.0799   LearningRate 0.0083   Epoch: 14   Global Step: 177060   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:04:52,952-Speed 3326.85 samples/sec   Loss 2.1002   LearningRate 0.0082   Epoch: 14   Global Step: 177070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:04:56,012-Speed 3347.81 samples/sec   Loss 2.1186   LearningRate 0.0082   Epoch: 14   Global Step: 177080   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:04:59,078-Speed 3340.80 samples/sec   Loss 2.0490   LearningRate 0.0082   Epoch: 14   Global Step: 177090   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 17:05:02,213-Speed 3267.73 samples/sec   Loss 2.0198   LearningRate 0.0082   Epoch: 14   Global Step: 177100   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 17:05:05,280-Speed 3339.50 samples/sec   Loss 2.0979   LearningRate 0.0082   Epoch: 14   Global Step: 177110   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 17:05:08,415-Speed 3267.37 samples/sec   Loss 2.0741   LearningRate 0.0082   Epoch: 14   Global Step: 177120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 17:05:11,557-Speed 3260.45 samples/sec   Loss 2.0524   LearningRate 0.0082   Epoch: 14   Global Step: 177130   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 17:05:14,631-Speed 3332.16 samples/sec   Loss 2.0456   LearningRate 0.0082   Epoch: 14   Global Step: 177140   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:05:17,749-Speed 3285.40 samples/sec   Loss 2.0326   LearningRate 0.0082   Epoch: 14   Global Step: 177150   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:05:20,798-Speed 3359.19 samples/sec   Loss 2.1074   LearningRate 0.0082   Epoch: 14   Global Step: 177160   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:05:23,855-Speed 3350.75 samples/sec   Loss 2.0307   LearningRate 0.0082   Epoch: 14   Global Step: 177170   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:05:26,977-Speed 3281.07 samples/sec   Loss 2.1551   LearningRate 0.0082   Epoch: 14   Global Step: 177180   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:05:30,152-Speed 3226.64 samples/sec   Loss 2.0966   LearningRate 0.0082   Epoch: 14   Global Step: 177190   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:05:33,249-Speed 3307.50 samples/sec   Loss 2.0725   LearningRate 0.0082   Epoch: 14   Global Step: 177200   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:05:36,401-Speed 3250.01 samples/sec   Loss 2.0152   LearningRate 0.0082   Epoch: 14   Global Step: 177210   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:05:39,541-Speed 3262.00 samples/sec   Loss 2.0722   LearningRate 0.0082   Epoch: 14   Global Step: 177220   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:05:42,696-Speed 3247.40 samples/sec   Loss 2.0843   LearningRate 0.0082   Epoch: 14   Global Step: 177230   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:05:45,776-Speed 3324.83 samples/sec   Loss 2.0143   LearningRate 0.0082   Epoch: 14   Global Step: 177240   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:05:48,860-Speed 3321.46 samples/sec   Loss 2.0152   LearningRate 0.0082   Epoch: 14   Global Step: 177250   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:05:51,900-Speed 3369.74 samples/sec   Loss 2.0819   LearningRate 0.0082   Epoch: 14   Global Step: 177260   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:05:54,988-Speed 3317.26 samples/sec   Loss 2.0225   LearningRate 0.0082   Epoch: 14   Global Step: 177270   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:05:58,071-Speed 3322.33 samples/sec   Loss 2.0537   LearningRate 0.0082   Epoch: 14   Global Step: 177280   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:06:01,246-Speed 3226.20 samples/sec   Loss 2.0345   LearningRate 0.0082   Epoch: 14   Global Step: 177290   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:06:04,393-Speed 3255.49 samples/sec   Loss 2.0304   LearningRate 0.0082   Epoch: 14   Global Step: 177300   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:06:07,476-Speed 3322.34 samples/sec   Loss 2.0427   LearningRate 0.0082   Epoch: 14   Global Step: 177310   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:06:10,572-Speed 3309.06 samples/sec   Loss 2.1140   LearningRate 0.0082   Epoch: 14   Global Step: 177320   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:06:13,657-Speed 3319.76 samples/sec   Loss 2.0460   LearningRate 0.0082   Epoch: 14   Global Step: 177330   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:06:16,715-Speed 3349.62 samples/sec   Loss 2.0985   LearningRate 0.0082   Epoch: 14   Global Step: 177340   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:06:19,769-Speed 3353.77 samples/sec   Loss 2.0896   LearningRate 0.0082   Epoch: 14   Global Step: 177350   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:06:22,969-Speed 3200.87 samples/sec   Loss 2.0490   LearningRate 0.0082   Epoch: 14   Global Step: 177360   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:06:26,101-Speed 3271.53 samples/sec   Loss 2.1067   LearningRate 0.0082   Epoch: 14   Global Step: 177370   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:06:29,243-Speed 3259.45 samples/sec   Loss 2.1287   LearningRate 0.0082   Epoch: 14   Global Step: 177380   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:06:32,349-Speed 3298.46 samples/sec   Loss 2.0946   LearningRate 0.0082   Epoch: 14   Global Step: 177390   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:06:35,408-Speed 3348.54 samples/sec   Loss 2.0409   LearningRate 0.0082   Epoch: 14   Global Step: 177400   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:06:38,503-Speed 3309.64 samples/sec   Loss 2.0491   LearningRate 0.0082   Epoch: 14   Global Step: 177410   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:06:41,613-Speed 3293.15 samples/sec   Loss 2.0563   LearningRate 0.0082   Epoch: 14   Global Step: 177420   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:06:44,684-Speed 3335.14 samples/sec   Loss 2.0694   LearningRate 0.0082   Epoch: 14   Global Step: 177430   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:06:47,796-Speed 3291.94 samples/sec   Loss 2.0941   LearningRate 0.0082   Epoch: 14   Global Step: 177440   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:06:50,849-Speed 3355.82 samples/sec   Loss 2.1036   LearningRate 0.0082   Epoch: 14   Global Step: 177450   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:06:54,026-Speed 3223.67 samples/sec   Loss 2.0259   LearningRate 0.0082   Epoch: 14   Global Step: 177460   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:06:57,100-Speed 3332.54 samples/sec   Loss 2.1023   LearningRate 0.0082   Epoch: 14   Global Step: 177470   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:07:00,210-Speed 3293.56 samples/sec   Loss 2.1124   LearningRate 0.0082   Epoch: 14   Global Step: 177480   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:07:03,340-Speed 3272.34 samples/sec   Loss 2.1621   LearningRate 0.0082   Epoch: 14   Global Step: 177490   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:07:06,454-Speed 3289.60 samples/sec   Loss 2.0846   LearningRate 0.0082   Epoch: 14   Global Step: 177500   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:07:09,501-Speed 3361.37 samples/sec   Loss 2.0260   LearningRate 0.0081   Epoch: 14   Global Step: 177510   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:07:12,633-Speed 3271.36 samples/sec   Loss 2.0560   LearningRate 0.0081   Epoch: 14   Global Step: 177520   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:07:15,712-Speed 3326.62 samples/sec   Loss 2.0932   LearningRate 0.0081   Epoch: 14   Global Step: 177530   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:07:18,828-Speed 3287.42 samples/sec   Loss 2.1179   LearningRate 0.0081   Epoch: 14   Global Step: 177540   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:07:21,919-Speed 3313.70 samples/sec   Loss 2.1067   LearningRate 0.0081   Epoch: 14   Global Step: 177550   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:07:25,026-Speed 3297.22 samples/sec   Loss 2.0563   LearningRate 0.0081   Epoch: 14   Global Step: 177560   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:07:28,127-Speed 3302.08 samples/sec   Loss 2.1283   LearningRate 0.0081   Epoch: 14   Global Step: 177570   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:07:31,195-Speed 3338.78 samples/sec   Loss 2.0933   LearningRate 0.0081   Epoch: 14   Global Step: 177580   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:07:34,240-Speed 3364.15 samples/sec   Loss 2.0679   LearningRate 0.0081   Epoch: 14   Global Step: 177590   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:07:37,329-Speed 3316.24 samples/sec   Loss 2.0475   LearningRate 0.0081   Epoch: 14   Global Step: 177600   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:07:40,388-Speed 3349.19 samples/sec   Loss 2.0841   LearningRate 0.0081   Epoch: 14   Global Step: 177610   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:07:43,454-Speed 3340.62 samples/sec   Loss 2.0981   LearningRate 0.0081   Epoch: 14   Global Step: 177620   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:07:46,544-Speed 3315.56 samples/sec   Loss 2.0310   LearningRate 0.0081   Epoch: 14   Global Step: 177630   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:07:49,679-Speed 3266.85 samples/sec   Loss 2.1115   LearningRate 0.0081   Epoch: 14   Global Step: 177640   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:07:52,888-Speed 3192.56 samples/sec   Loss 2.1024   LearningRate 0.0081   Epoch: 14   Global Step: 177650   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:07:56,008-Speed 3282.76 samples/sec   Loss 2.0494   LearningRate 0.0081   Epoch: 14   Global Step: 177660   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:07:59,066-Speed 3348.93 samples/sec   Loss 2.1082   LearningRate 0.0081   Epoch: 14   Global Step: 177670   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:08:02,136-Speed 3337.02 samples/sec   Loss 2.0539   LearningRate 0.0081   Epoch: 14   Global Step: 177680   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:08:05,237-Speed 3303.10 samples/sec   Loss 2.0923   LearningRate 0.0081   Epoch: 14   Global Step: 177690   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:08:08,305-Speed 3339.11 samples/sec   Loss 2.0079   LearningRate 0.0081   Epoch: 14   Global Step: 177700   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:08:11,371-Speed 3340.28 samples/sec   Loss 2.0987   LearningRate 0.0081   Epoch: 14   Global Step: 177710   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:08:14,455-Speed 3321.66 samples/sec   Loss 2.0595   LearningRate 0.0081   Epoch: 14   Global Step: 177720   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:08:17,589-Speed 3269.09 samples/sec   Loss 2.1285   LearningRate 0.0081   Epoch: 14   Global Step: 177730   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:08:20,664-Speed 3330.74 samples/sec   Loss 2.0753   LearningRate 0.0081   Epoch: 14   Global Step: 177740   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:08:23,798-Speed 3268.50 samples/sec   Loss 2.1241   LearningRate 0.0081   Epoch: 14   Global Step: 177750   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:08:26,947-Speed 3252.71 samples/sec   Loss 2.0898   LearningRate 0.0081   Epoch: 14   Global Step: 177760   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:08:30,080-Speed 3269.49 samples/sec   Loss 2.0941   LearningRate 0.0081   Epoch: 14   Global Step: 177770   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:08:33,197-Speed 3286.20 samples/sec   Loss 2.0387   LearningRate 0.0081   Epoch: 14   Global Step: 177780   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:08:36,286-Speed 3315.68 samples/sec   Loss 2.0826   LearningRate 0.0081   Epoch: 14   Global Step: 177790   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:08:39,428-Speed 3261.12 samples/sec   Loss 2.0853   LearningRate 0.0081   Epoch: 14   Global Step: 177800   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:08:42,652-Speed 3177.00 samples/sec   Loss 2.0609   LearningRate 0.0081   Epoch: 14   Global Step: 177810   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:08:45,684-Speed 3378.18 samples/sec   Loss 2.0736   LearningRate 0.0081   Epoch: 14   Global Step: 177820   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:08:48,755-Speed 3335.86 samples/sec   Loss 2.1209   LearningRate 0.0081   Epoch: 14   Global Step: 177830   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:08:51,878-Speed 3279.55 samples/sec   Loss 2.0769   LearningRate 0.0081   Epoch: 14   Global Step: 177840   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:08:54,979-Speed 3303.24 samples/sec   Loss 2.1143   LearningRate 0.0081   Epoch: 14   Global Step: 177850   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:08:58,036-Speed 3351.07 samples/sec   Loss 2.1299   LearningRate 0.0081   Epoch: 14   Global Step: 177860   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:09:01,105-Speed 3336.68 samples/sec   Loss 2.1125   LearningRate 0.0081   Epoch: 14   Global Step: 177870   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:09:04,171-Speed 3341.25 samples/sec   Loss 2.1380   LearningRate 0.0081   Epoch: 14   Global Step: 177880   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:09:07,254-Speed 3323.23 samples/sec   Loss 2.1380   LearningRate 0.0081   Epoch: 14   Global Step: 177890   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:09:10,404-Speed 3251.61 samples/sec   Loss 2.1102   LearningRate 0.0081   Epoch: 14   Global Step: 177900   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:09:13,520-Speed 3286.70 samples/sec   Loss 2.1392   LearningRate 0.0081   Epoch: 14   Global Step: 177910   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:09:16,663-Speed 3258.94 samples/sec   Loss 2.0449   LearningRate 0.0081   Epoch: 14   Global Step: 177920   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:09:19,798-Speed 3267.52 samples/sec   Loss 2.1185   LearningRate 0.0081   Epoch: 14   Global Step: 177930   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:09:22,853-Speed 3352.86 samples/sec   Loss 2.1155   LearningRate 0.0080   Epoch: 14   Global Step: 177940   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:09:25,940-Speed 3318.49 samples/sec   Loss 2.1450   LearningRate 0.0080   Epoch: 14   Global Step: 177950   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:09:29,057-Speed 3286.10 samples/sec   Loss 2.0994   LearningRate 0.0080   Epoch: 14   Global Step: 177960   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:09:32,148-Speed 3314.52 samples/sec   Loss 2.1071   LearningRate 0.0080   Epoch: 14   Global Step: 177970   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:09:35,254-Speed 3297.09 samples/sec   Loss 2.0776   LearningRate 0.0080   Epoch: 14   Global Step: 177980   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:09:38,381-Speed 3276.12 samples/sec   Loss 2.0407   LearningRate 0.0080   Epoch: 14   Global Step: 177990   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:09:41,471-Speed 3315.41 samples/sec   Loss 2.0056   LearningRate 0.0080   Epoch: 14   Global Step: 178000   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:09:44,527-Speed 3352.13 samples/sec   Loss 2.1075   LearningRate 0.0080   Epoch: 14   Global Step: 178010   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:09:47,654-Speed 3274.99 samples/sec   Loss 2.1005   LearningRate 0.0080   Epoch: 14   Global Step: 178020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:09:50,777-Speed 3280.06 samples/sec   Loss 2.1439   LearningRate 0.0080   Epoch: 14   Global Step: 178030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:09:53,929-Speed 3250.03 samples/sec   Loss 2.0961   LearningRate 0.0080   Epoch: 14   Global Step: 178040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:09:57,020-Speed 3314.53 samples/sec   Loss 2.0688   LearningRate 0.0080   Epoch: 14   Global Step: 178050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:10:00,150-Speed 3271.59 samples/sec   Loss 2.1025   LearningRate 0.0080   Epoch: 14   Global Step: 178060   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:10:03,251-Speed 3303.80 samples/sec   Loss 2.1217   LearningRate 0.0080   Epoch: 14   Global Step: 178070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:10:06,356-Speed 3299.01 samples/sec   Loss 2.0708   LearningRate 0.0080   Epoch: 14   Global Step: 178080   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:10:09,457-Speed 3303.37 samples/sec   Loss 2.0784   LearningRate 0.0080   Epoch: 14   Global Step: 178090   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:10:12,555-Speed 3306.44 samples/sec   Loss 2.0240   LearningRate 0.0080   Epoch: 14   Global Step: 178100   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:10:15,681-Speed 3276.45 samples/sec   Loss 2.0893   LearningRate 0.0080   Epoch: 14   Global Step: 178110   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:10:18,769-Speed 3317.24 samples/sec   Loss 2.0823   LearningRate 0.0080   Epoch: 14   Global Step: 178120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 17:10:21,838-Speed 3337.82 samples/sec   Loss 2.0417   LearningRate 0.0080   Epoch: 14   Global Step: 178130   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:10:24,920-Speed 3323.23 samples/sec   Loss 2.1077   LearningRate 0.0080   Epoch: 14   Global Step: 178140   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:10:28,020-Speed 3304.23 samples/sec   Loss 2.0889   LearningRate 0.0080   Epoch: 14   Global Step: 178150   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:10:31,179-Speed 3243.02 samples/sec   Loss 2.1223   LearningRate 0.0080   Epoch: 14   Global Step: 178160   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:10:34,258-Speed 3326.19 samples/sec   Loss 2.0364   LearningRate 0.0080   Epoch: 14   Global Step: 178170   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:10:37,378-Speed 3283.38 samples/sec   Loss 2.1269   LearningRate 0.0080   Epoch: 14   Global Step: 178180   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:10:40,469-Speed 3313.45 samples/sec   Loss 2.0777   LearningRate 0.0080   Epoch: 14   Global Step: 178190   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:10:43,607-Speed 3264.78 samples/sec   Loss 2.0968   LearningRate 0.0080   Epoch: 14   Global Step: 178200   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:10:46,714-Speed 3296.94 samples/sec   Loss 2.0822   LearningRate 0.0080   Epoch: 14   Global Step: 178210   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:10:49,855-Speed 3261.76 samples/sec   Loss 2.0379   LearningRate 0.0080   Epoch: 14   Global Step: 178220   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:10:53,008-Speed 3248.48 samples/sec   Loss 2.1768   LearningRate 0.0080   Epoch: 14   Global Step: 178230   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:10:56,072-Speed 3342.37 samples/sec   Loss 2.1163   LearningRate 0.0080   Epoch: 14   Global Step: 178240   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:10:59,185-Speed 3290.32 samples/sec   Loss 2.1267   LearningRate 0.0080   Epoch: 14   Global Step: 178250   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:11:02,276-Speed 3314.88 samples/sec   Loss 2.0659   LearningRate 0.0080   Epoch: 14   Global Step: 178260   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:11:05,389-Speed 3290.24 samples/sec   Loss 2.1069   LearningRate 0.0080   Epoch: 14   Global Step: 178270   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:11:08,511-Speed 3280.62 samples/sec   Loss 2.1034   LearningRate 0.0080   Epoch: 14   Global Step: 178280   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:11:11,632-Speed 3282.46 samples/sec   Loss 2.1141   LearningRate 0.0080   Epoch: 14   Global Step: 178290   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:11:14,757-Speed 3277.70 samples/sec   Loss 2.0659   LearningRate 0.0080   Epoch: 14   Global Step: 178300   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:11:17,847-Speed 3314.30 samples/sec   Loss 2.1022   LearningRate 0.0080   Epoch: 14   Global Step: 178310   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:11:20,964-Speed 3286.53 samples/sec   Loss 2.0969   LearningRate 0.0080   Epoch: 14   Global Step: 178320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:11:24,099-Speed 3267.56 samples/sec   Loss 2.1324   LearningRate 0.0080   Epoch: 14   Global Step: 178330   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:11:27,297-Speed 3202.55 samples/sec   Loss 2.0840   LearningRate 0.0080   Epoch: 14   Global Step: 178340   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:11:30,363-Speed 3340.87 samples/sec   Loss 2.0905   LearningRate 0.0080   Epoch: 14   Global Step: 178350   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:11:33,415-Speed 3356.72 samples/sec   Loss 2.1388   LearningRate 0.0080   Epoch: 14   Global Step: 178360   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:11:36,595-Speed 3221.23 samples/sec   Loss 2.1029   LearningRate 0.0080   Epoch: 14   Global Step: 178370   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:11:39,730-Speed 3266.88 samples/sec   Loss 2.1182   LearningRate 0.0079   Epoch: 14   Global Step: 178380   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:11:42,852-Speed 3281.30 samples/sec   Loss 2.1344   LearningRate 0.0079   Epoch: 14   Global Step: 178390   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:11:45,925-Speed 3332.78 samples/sec   Loss 2.1438   LearningRate 0.0079   Epoch: 14   Global Step: 178400   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:11:49,071-Speed 3256.57 samples/sec   Loss 2.0786   LearningRate 0.0079   Epoch: 14   Global Step: 178410   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:11:52,168-Speed 3308.01 samples/sec   Loss 2.1712   LearningRate 0.0079   Epoch: 14   Global Step: 178420   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:11:55,259-Speed 3313.66 samples/sec   Loss 2.1135   LearningRate 0.0079   Epoch: 14   Global Step: 178430   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:11:58,338-Speed 3326.83 samples/sec   Loss 2.1888   LearningRate 0.0079   Epoch: 14   Global Step: 178440   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:12:01,433-Speed 3309.14 samples/sec   Loss 2.1346   LearningRate 0.0079   Epoch: 14   Global Step: 178450   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:12:04,537-Speed 3300.27 samples/sec   Loss 2.1493   LearningRate 0.0079   Epoch: 14   Global Step: 178460   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:12:07,670-Speed 3269.74 samples/sec   Loss 2.1129   LearningRate 0.0079   Epoch: 14   Global Step: 178470   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:12:10,745-Speed 3330.16 samples/sec   Loss 2.1818   LearningRate 0.0079   Epoch: 14   Global Step: 178480   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:12:13,882-Speed 3266.23 samples/sec   Loss 2.1062   LearningRate 0.0079   Epoch: 14   Global Step: 178490   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:12:17,018-Speed 3266.61 samples/sec   Loss 2.1124   LearningRate 0.0079   Epoch: 14   Global Step: 178500   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:12:20,081-Speed 3343.27 samples/sec   Loss 2.0867   LearningRate 0.0079   Epoch: 14   Global Step: 178510   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:12:23,165-Speed 3321.23 samples/sec   Loss 2.0926   LearningRate 0.0079   Epoch: 14   Global Step: 178520   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:12:26,291-Speed 3277.60 samples/sec   Loss 2.1496   LearningRate 0.0079   Epoch: 14   Global Step: 178530   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:12:29,368-Speed 3328.71 samples/sec   Loss 2.1698   LearningRate 0.0079   Epoch: 14   Global Step: 178540   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:12:32,446-Speed 3327.93 samples/sec   Loss 2.1050   LearningRate 0.0079   Epoch: 14   Global Step: 178550   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:12:35,543-Speed 3306.99 samples/sec   Loss 2.0798   LearningRate 0.0079   Epoch: 14   Global Step: 178560   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:12:38,653-Speed 3294.09 samples/sec   Loss 2.1369   LearningRate 0.0079   Epoch: 14   Global Step: 178570   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:12:41,780-Speed 3275.07 samples/sec   Loss 2.0945   LearningRate 0.0079   Epoch: 14   Global Step: 178580   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:12:44,911-Speed 3272.38 samples/sec   Loss 2.1236   LearningRate 0.0079   Epoch: 14   Global Step: 178590   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:12:48,089-Speed 3223.19 samples/sec   Loss 2.1243   LearningRate 0.0079   Epoch: 14   Global Step: 178600   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:12:51,269-Speed 3220.20 samples/sec   Loss 2.0804   LearningRate 0.0079   Epoch: 14   Global Step: 178610   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:12:54,434-Speed 3237.08 samples/sec   Loss 2.1164   LearningRate 0.0079   Epoch: 14   Global Step: 178620   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:12:57,558-Speed 3278.36 samples/sec   Loss 2.1162   LearningRate 0.0079   Epoch: 14   Global Step: 178630   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:13:00,685-Speed 3275.54 samples/sec   Loss 2.1754   LearningRate 0.0079   Epoch: 14   Global Step: 178640   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:13:03,839-Speed 3247.82 samples/sec   Loss 2.1005   LearningRate 0.0079   Epoch: 14   Global Step: 178650   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:13:06,945-Speed 3297.92 samples/sec   Loss 2.1418   LearningRate 0.0079   Epoch: 14   Global Step: 178660   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:13:10,032-Speed 3318.37 samples/sec   Loss 2.0467   LearningRate 0.0079   Epoch: 14   Global Step: 178670   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:13:13,123-Speed 3313.62 samples/sec   Loss 2.1208   LearningRate 0.0079   Epoch: 14   Global Step: 178680   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:13:16,193-Speed 3336.58 samples/sec   Loss 2.1430   LearningRate 0.0079   Epoch: 14   Global Step: 178690   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:13:19,280-Speed 3319.01 samples/sec   Loss 2.1156   LearningRate 0.0079   Epoch: 14   Global Step: 178700   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:13:22,390-Speed 3292.47 samples/sec   Loss 2.1326   LearningRate 0.0079   Epoch: 14   Global Step: 178710   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:13:25,538-Speed 3254.09 samples/sec   Loss 2.0950   LearningRate 0.0079   Epoch: 14   Global Step: 178720   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:13:28,676-Speed 3265.05 samples/sec   Loss 2.1784   LearningRate 0.0079   Epoch: 14   Global Step: 178730   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:13:31,823-Speed 3254.67 samples/sec   Loss 2.0987   LearningRate 0.0079   Epoch: 14   Global Step: 178740   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:13:34,978-Speed 3246.11 samples/sec   Loss 2.1120   LearningRate 0.0079   Epoch: 14   Global Step: 178750   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:13:38,155-Speed 3224.07 samples/sec   Loss 2.1434   LearningRate 0.0079   Epoch: 14   Global Step: 178760   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:13:41,248-Speed 3311.96 samples/sec   Loss 2.1858   LearningRate 0.0079   Epoch: 14   Global Step: 178770   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:13:44,342-Speed 3310.91 samples/sec   Loss 2.1189   LearningRate 0.0079   Epoch: 14   Global Step: 178780   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:13:47,486-Speed 3256.96 samples/sec   Loss 2.1108   LearningRate 0.0079   Epoch: 14   Global Step: 178790   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:13:50,564-Speed 3328.34 samples/sec   Loss 2.1454   LearningRate 0.0079   Epoch: 14   Global Step: 178800   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:13:53,671-Speed 3296.79 samples/sec   Loss 2.0650   LearningRate 0.0079   Epoch: 14   Global Step: 178810   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:13:56,766-Speed 3309.35 samples/sec   Loss 2.1341   LearningRate 0.0078   Epoch: 14   Global Step: 178820   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:13:59,884-Speed 3285.45 samples/sec   Loss 2.1344   LearningRate 0.0078   Epoch: 14   Global Step: 178830   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:14:03,041-Speed 3243.78 samples/sec   Loss 2.0217   LearningRate 0.0078   Epoch: 14   Global Step: 178840   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:14:06,119-Speed 3328.11 samples/sec   Loss 2.1221   LearningRate 0.0078   Epoch: 14   Global Step: 178850   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:14:09,189-Speed 3337.23 samples/sec   Loss 2.2143   LearningRate 0.0078   Epoch: 14   Global Step: 178860   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:14:12,360-Speed 3230.29 samples/sec   Loss 2.0729   LearningRate 0.0078   Epoch: 14   Global Step: 178870   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:14:15,462-Speed 3301.05 samples/sec   Loss 2.1544   LearningRate 0.0078   Epoch: 14   Global Step: 178880   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:14:18,557-Speed 3311.60 samples/sec   Loss 2.0783   LearningRate 0.0078   Epoch: 14   Global Step: 178890   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:14:21,639-Speed 3323.19 samples/sec   Loss 2.1310   LearningRate 0.0078   Epoch: 14   Global Step: 178900   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:14:24,711-Speed 3334.53 samples/sec   Loss 2.1083   LearningRate 0.0078   Epoch: 14   Global Step: 178910   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:14:27,823-Speed 3292.41 samples/sec   Loss 2.1339   LearningRate 0.0078   Epoch: 14   Global Step: 178920   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:14:30,929-Speed 3297.24 samples/sec   Loss 2.1432   LearningRate 0.0078   Epoch: 14   Global Step: 178930   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:14:34,005-Speed 3330.20 samples/sec   Loss 2.1146   LearningRate 0.0078   Epoch: 14   Global Step: 178940   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:14:37,097-Speed 3312.58 samples/sec   Loss 2.1109   LearningRate 0.0078   Epoch: 14   Global Step: 178950   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:14:40,186-Speed 3315.78 samples/sec   Loss 2.0784   LearningRate 0.0078   Epoch: 14   Global Step: 178960   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:14:43,296-Speed 3293.84 samples/sec   Loss 2.1418   LearningRate 0.0078   Epoch: 14   Global Step: 178970   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:14:46,404-Speed 3295.82 samples/sec   Loss 2.1044   LearningRate 0.0078   Epoch: 14   Global Step: 178980   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:14:49,539-Speed 3267.41 samples/sec   Loss 2.0819   LearningRate 0.0078   Epoch: 14   Global Step: 178990   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:14:52,682-Speed 3258.77 samples/sec   Loss 2.1460   LearningRate 0.0078   Epoch: 14   Global Step: 179000   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:14:55,767-Speed 3320.82 samples/sec   Loss 2.1600   LearningRate 0.0078   Epoch: 14   Global Step: 179010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:14:58,872-Speed 3298.12 samples/sec   Loss 2.1597   LearningRate 0.0078   Epoch: 14   Global Step: 179020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:15:02,007-Speed 3268.08 samples/sec   Loss 2.1169   LearningRate 0.0078   Epoch: 14   Global Step: 179030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:15:05,098-Speed 3313.99 samples/sec   Loss 2.0875   LearningRate 0.0078   Epoch: 14   Global Step: 179040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:15:08,177-Speed 3326.99 samples/sec   Loss 2.0713   LearningRate 0.0078   Epoch: 14   Global Step: 179050   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:15:11,278-Speed 3302.85 samples/sec   Loss 2.1566   LearningRate 0.0078   Epoch: 14   Global Step: 179060   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:15:14,356-Speed 3328.24 samples/sec   Loss 2.1379   LearningRate 0.0078   Epoch: 14   Global Step: 179070   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:15:17,555-Speed 3201.77 samples/sec   Loss 2.0780   LearningRate 0.0078   Epoch: 14   Global Step: 179080   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:15:20,622-Speed 3339.51 samples/sec   Loss 2.0756   LearningRate 0.0078   Epoch: 14   Global Step: 179090   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:15:23,725-Speed 3301.13 samples/sec   Loss 2.1204   LearningRate 0.0078   Epoch: 14   Global Step: 179100   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:15:26,824-Speed 3306.15 samples/sec   Loss 2.1656   LearningRate 0.0078   Epoch: 14   Global Step: 179110   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:15:29,896-Speed 3333.75 samples/sec   Loss 2.1154   LearningRate 0.0078   Epoch: 14   Global Step: 179120   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:15:32,982-Speed 3318.80 samples/sec   Loss 2.1401   LearningRate 0.0078   Epoch: 14   Global Step: 179130   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:15:36,074-Speed 3313.68 samples/sec   Loss 2.1171   LearningRate 0.0078   Epoch: 14   Global Step: 179140   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:15:39,156-Speed 3322.59 samples/sec   Loss 2.1735   LearningRate 0.0078   Epoch: 14   Global Step: 179150   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:15:42,247-Speed 3314.23 samples/sec   Loss 2.0890   LearningRate 0.0078   Epoch: 14   Global Step: 179160   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:15:45,315-Speed 3338.90 samples/sec   Loss 2.1455   LearningRate 0.0078   Epoch: 14   Global Step: 179170   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:15:48,477-Speed 3239.28 samples/sec   Loss 2.1216   LearningRate 0.0078   Epoch: 14   Global Step: 179180   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:15:51,596-Speed 3284.02 samples/sec   Loss 2.1546   LearningRate 0.0078   Epoch: 14   Global Step: 179190   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:15:54,717-Speed 3282.28 samples/sec   Loss 2.1107   LearningRate 0.0078   Epoch: 14   Global Step: 179200   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:15:57,812-Speed 3310.25 samples/sec   Loss 2.0864   LearningRate 0.0078   Epoch: 14   Global Step: 179210   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:16:00,897-Speed 3320.52 samples/sec   Loss 2.1796   LearningRate 0.0078   Epoch: 14   Global Step: 179220   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:16:04,072-Speed 3226.43 samples/sec   Loss 2.1883   LearningRate 0.0078   Epoch: 14   Global Step: 179230   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:16:07,195-Speed 3279.23 samples/sec   Loss 2.1734   LearningRate 0.0078   Epoch: 14   Global Step: 179240   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:16:10,279-Speed 3321.19 samples/sec   Loss 2.1718   LearningRate 0.0078   Epoch: 14   Global Step: 179250   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:16:13,360-Speed 3325.37 samples/sec   Loss 2.1144   LearningRate 0.0078   Epoch: 14   Global Step: 179260   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:16:16,444-Speed 3321.21 samples/sec   Loss 2.0973   LearningRate 0.0077   Epoch: 14   Global Step: 179270   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:16:19,575-Speed 3271.79 samples/sec   Loss 2.1069   LearningRate 0.0077   Epoch: 14   Global Step: 179280   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:16:22,670-Speed 3309.89 samples/sec   Loss 2.1301   LearningRate 0.0077   Epoch: 14   Global Step: 179290   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:16:25,801-Speed 3270.87 samples/sec   Loss 2.1181   LearningRate 0.0077   Epoch: 14   Global Step: 179300   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:16:28,897-Speed 3308.70 samples/sec   Loss 2.0762   LearningRate 0.0077   Epoch: 14   Global Step: 179310   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:16:32,001-Speed 3300.68 samples/sec   Loss 2.0662   LearningRate 0.0077   Epoch: 14   Global Step: 179320   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:16:35,130-Speed 3272.73 samples/sec   Loss 2.1172   LearningRate 0.0077   Epoch: 14   Global Step: 179330   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:16:38,326-Speed 3205.63 samples/sec   Loss 2.1054   LearningRate 0.0077   Epoch: 14   Global Step: 179340   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:16:41,523-Speed 3203.68 samples/sec   Loss 2.1359   LearningRate 0.0077   Epoch: 14   Global Step: 179350   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:16:44,684-Speed 3240.06 samples/sec   Loss 2.1423   LearningRate 0.0077   Epoch: 14   Global Step: 179360   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:16:47,760-Speed 3330.30 samples/sec   Loss 2.1157   LearningRate 0.0077   Epoch: 14   Global Step: 179370   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:16:50,862-Speed 3301.84 samples/sec   Loss 2.1570   LearningRate 0.0077   Epoch: 14   Global Step: 179380   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:16:54,071-Speed 3191.42 samples/sec   Loss 2.1285   LearningRate 0.0077   Epoch: 14   Global Step: 179390   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:16:57,169-Speed 3306.53 samples/sec   Loss 2.2192   LearningRate 0.0077   Epoch: 14   Global Step: 179400   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:17:00,308-Speed 3263.65 samples/sec   Loss 2.1699   LearningRate 0.0077   Epoch: 14   Global Step: 179410   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:17:03,506-Speed 3203.52 samples/sec   Loss 2.0821   LearningRate 0.0077   Epoch: 14   Global Step: 179420   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:17:06,683-Speed 3224.11 samples/sec   Loss 2.2026   LearningRate 0.0077   Epoch: 14   Global Step: 179430   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:17:09,766-Speed 3321.50 samples/sec   Loss 2.1624   LearningRate 0.0077   Epoch: 14   Global Step: 179440   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:17:12,883-Speed 3286.65 samples/sec   Loss 2.0597   LearningRate 0.0077   Epoch: 14   Global Step: 179450   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:17:15,990-Speed 3297.34 samples/sec   Loss 2.0887   LearningRate 0.0077   Epoch: 14   Global Step: 179460   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:17:19,151-Speed 3240.30 samples/sec   Loss 2.1461   LearningRate 0.0077   Epoch: 14   Global Step: 179470   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:17:22,256-Speed 3298.38 samples/sec   Loss 2.1266   LearningRate 0.0077   Epoch: 14   Global Step: 179480   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:17:25,384-Speed 3274.50 samples/sec   Loss 2.1781   LearningRate 0.0077   Epoch: 14   Global Step: 179490   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:17:28,547-Speed 3238.74 samples/sec   Loss 2.0823   LearningRate 0.0077   Epoch: 14   Global Step: 179500   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:17:31,632-Speed 3320.95 samples/sec   Loss 2.1394   LearningRate 0.0077   Epoch: 14   Global Step: 179510   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:17:34,709-Speed 3328.44 samples/sec   Loss 2.0906   LearningRate 0.0077   Epoch: 14   Global Step: 179520   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:17:37,809-Speed 3303.72 samples/sec   Loss 2.1514   LearningRate 0.0077   Epoch: 14   Global Step: 179530   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:17:40,941-Speed 3270.86 samples/sec   Loss 2.1392   LearningRate 0.0077   Epoch: 14   Global Step: 179540   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:17:44,045-Speed 3300.43 samples/sec   Loss 2.1241   LearningRate 0.0077   Epoch: 14   Global Step: 179550   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:17:47,171-Speed 3276.29 samples/sec   Loss 2.1028   LearningRate 0.0077   Epoch: 14   Global Step: 179560   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:17:50,273-Speed 3301.56 samples/sec   Loss 2.1799   LearningRate 0.0077   Epoch: 14   Global Step: 179570   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:17:53,433-Speed 3242.22 samples/sec   Loss 2.1256   LearningRate 0.0077   Epoch: 14   Global Step: 179580   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:17:56,529-Speed 3307.68 samples/sec   Loss 2.0955   LearningRate 0.0077   Epoch: 14   Global Step: 179590   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:17:59,694-Speed 3236.81 samples/sec   Loss 2.1057   LearningRate 0.0077   Epoch: 14   Global Step: 179600   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:18:02,816-Speed 3281.63 samples/sec   Loss 2.1162   LearningRate 0.0077   Epoch: 14   Global Step: 179610   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:18:05,937-Speed 3281.74 samples/sec   Loss 2.1292   LearningRate 0.0077   Epoch: 14   Global Step: 179620   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:18:09,006-Speed 3337.55 samples/sec   Loss 2.1494   LearningRate 0.0077   Epoch: 14   Global Step: 179630   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:18:12,071-Speed 3342.31 samples/sec   Loss 2.1751   LearningRate 0.0077   Epoch: 14   Global Step: 179640   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:18:15,163-Speed 3312.50 samples/sec   Loss 2.0975   LearningRate 0.0077   Epoch: 14   Global Step: 179650   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:18:18,328-Speed 3236.65 samples/sec   Loss 2.1561   LearningRate 0.0077   Epoch: 14   Global Step: 179660   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:18:21,402-Speed 3332.51 samples/sec   Loss 2.1682   LearningRate 0.0077   Epoch: 14   Global Step: 179670   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:18:24,566-Speed 3236.64 samples/sec   Loss 2.1735   LearningRate 0.0077   Epoch: 14   Global Step: 179680   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:18:27,648-Speed 3323.54 samples/sec   Loss 2.1062   LearningRate 0.0077   Epoch: 14   Global Step: 179690   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:18:30,743-Speed 3310.89 samples/sec   Loss 2.1748   LearningRate 0.0077   Epoch: 14   Global Step: 179700   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:18:33,839-Speed 3307.53 samples/sec   Loss 2.1079   LearningRate 0.0077   Epoch: 14   Global Step: 179710   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:18:36,918-Speed 3327.37 samples/sec   Loss 2.1659   LearningRate 0.0076   Epoch: 14   Global Step: 179720   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:18:40,015-Speed 3307.84 samples/sec   Loss 2.1375   LearningRate 0.0076   Epoch: 14   Global Step: 179730   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:18:43,241-Speed 3174.37 samples/sec   Loss 2.0843   LearningRate 0.0076   Epoch: 14   Global Step: 179740   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:18:46,302-Speed 3346.80 samples/sec   Loss 2.1330   LearningRate 0.0076   Epoch: 14   Global Step: 179750   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:18:49,459-Speed 3244.96 samples/sec   Loss 2.1213   LearningRate 0.0076   Epoch: 14   Global Step: 179760   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:18:52,570-Speed 3292.01 samples/sec   Loss 2.1491   LearningRate 0.0076   Epoch: 14   Global Step: 179770   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:18:55,644-Speed 3332.67 samples/sec   Loss 2.2198   LearningRate 0.0076   Epoch: 14   Global Step: 179780   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:18:58,746-Speed 3302.40 samples/sec   Loss 2.1321   LearningRate 0.0076   Epoch: 14   Global Step: 179790   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:19:01,860-Speed 3289.67 samples/sec   Loss 2.1088   LearningRate 0.0076   Epoch: 14   Global Step: 179800   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:19:04,985-Speed 3277.87 samples/sec   Loss 2.1416   LearningRate 0.0076   Epoch: 14   Global Step: 179810   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:19:08,035-Speed 3357.95 samples/sec   Loss 2.1370   LearningRate 0.0076   Epoch: 14   Global Step: 179820   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:19:11,118-Speed 3322.53 samples/sec   Loss 2.1497   LearningRate 0.0076   Epoch: 14   Global Step: 179830   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:19:14,237-Speed 3283.83 samples/sec   Loss 2.1195   LearningRate 0.0076   Epoch: 14   Global Step: 179840   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:19:17,399-Speed 3240.08 samples/sec   Loss 2.1111   LearningRate 0.0076   Epoch: 14   Global Step: 179850   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:19:20,502-Speed 3301.11 samples/sec   Loss 2.1342   LearningRate 0.0076   Epoch: 14   Global Step: 179860   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:19:23,642-Speed 3261.90 samples/sec   Loss 2.1094   LearningRate 0.0076   Epoch: 14   Global Step: 179870   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:19:26,766-Speed 3278.93 samples/sec   Loss 2.1382   LearningRate 0.0076   Epoch: 14   Global Step: 179880   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:19:29,908-Speed 3259.84 samples/sec   Loss 2.1772   LearningRate 0.0076   Epoch: 14   Global Step: 179890   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:19:33,019-Speed 3291.97 samples/sec   Loss 2.1617   LearningRate 0.0076   Epoch: 14   Global Step: 179900   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:19:36,135-Speed 3288.51 samples/sec   Loss 2.1934   LearningRate 0.0076   Epoch: 14   Global Step: 179910   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:19:39,202-Speed 3338.78 samples/sec   Loss 2.1044   LearningRate 0.0076   Epoch: 14   Global Step: 179920   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:19:42,301-Speed 3306.11 samples/sec   Loss 2.1424   LearningRate 0.0076   Epoch: 14   Global Step: 179930   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:19:45,426-Speed 3278.00 samples/sec   Loss 2.1336   LearningRate 0.0076   Epoch: 14   Global Step: 179940   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:19:48,545-Speed 3283.87 samples/sec   Loss 2.1512   LearningRate 0.0076   Epoch: 14   Global Step: 179950   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:19:51,723-Speed 3222.86 samples/sec   Loss 2.2403   LearningRate 0.0076   Epoch: 14   Global Step: 179960   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:19:54,883-Speed 3240.92 samples/sec   Loss 2.1435   LearningRate 0.0076   Epoch: 14   Global Step: 179970   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:19:57,967-Speed 3322.46 samples/sec   Loss 2.1609   LearningRate 0.0076   Epoch: 14   Global Step: 179980   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:20:01,166-Speed 3202.02 samples/sec   Loss 2.1884   LearningRate 0.0076   Epoch: 14   Global Step: 179990   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:20:04,251-Speed 3319.87 samples/sec   Loss 2.2012   LearningRate 0.0076   Epoch: 14   Global Step: 180000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:20:07,364-Speed 3290.33 samples/sec   Loss 2.1178   LearningRate 0.0076   Epoch: 14   Global Step: 180010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:20:10,524-Speed 3242.19 samples/sec   Loss 2.1494   LearningRate 0.0076   Epoch: 14   Global Step: 180020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:20:13,722-Speed 3203.19 samples/sec   Loss 2.1383   LearningRate 0.0076   Epoch: 14   Global Step: 180030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:20:16,825-Speed 3300.57 samples/sec   Loss 2.2087   LearningRate 0.0076   Epoch: 14   Global Step: 180040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:20:19,917-Speed 3313.02 samples/sec   Loss 2.1390   LearningRate 0.0076   Epoch: 14   Global Step: 180050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:20:22,978-Speed 3346.52 samples/sec   Loss 2.1602   LearningRate 0.0076   Epoch: 14   Global Step: 180060   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:20:26,040-Speed 3346.06 samples/sec   Loss 2.0596   LearningRate 0.0076   Epoch: 14   Global Step: 180070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:20:29,129-Speed 3315.56 samples/sec   Loss 2.1479   LearningRate 0.0076   Epoch: 14   Global Step: 180080   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:20:32,242-Speed 3290.13 samples/sec   Loss 2.0893   LearningRate 0.0076   Epoch: 14   Global Step: 180090   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:20:35,335-Speed 3312.46 samples/sec   Loss 2.1698   LearningRate 0.0076   Epoch: 14   Global Step: 180100   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:20:38,421-Speed 3319.22 samples/sec   Loss 2.1432   LearningRate 0.0076   Epoch: 14   Global Step: 180110   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:20:41,601-Speed 3221.29 samples/sec   Loss 2.2162   LearningRate 0.0076   Epoch: 14   Global Step: 180120   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 17:20:44,661-Speed 3347.28 samples/sec   Loss 2.1208   LearningRate 0.0076   Epoch: 14   Global Step: 180130   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:20:47,738-Speed 3329.53 samples/sec   Loss 2.0935   LearningRate 0.0076   Epoch: 14   Global Step: 180140   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:20:50,824-Speed 3319.10 samples/sec   Loss 2.0910   LearningRate 0.0076   Epoch: 14   Global Step: 180150   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:20:53,991-Speed 3234.51 samples/sec   Loss 2.1610   LearningRate 0.0076   Epoch: 14   Global Step: 180160   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:20:57,045-Speed 3353.77 samples/sec   Loss 2.2226   LearningRate 0.0075   Epoch: 14   Global Step: 180170   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:21:00,145-Speed 3304.53 samples/sec   Loss 2.1410   LearningRate 0.0075   Epoch: 14   Global Step: 180180   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:21:03,219-Speed 3331.69 samples/sec   Loss 2.1987   LearningRate 0.0075   Epoch: 14   Global Step: 180190   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:21:06,356-Speed 3266.46 samples/sec   Loss 2.1103   LearningRate 0.0075   Epoch: 14   Global Step: 180200   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:21:09,416-Speed 3347.59 samples/sec   Loss 2.1746   LearningRate 0.0075   Epoch: 14   Global Step: 180210   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:21:12,502-Speed 3318.69 samples/sec   Loss 2.1416   LearningRate 0.0075   Epoch: 14   Global Step: 180220   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:21:15,625-Speed 3280.47 samples/sec   Loss 2.2302   LearningRate 0.0075   Epoch: 14   Global Step: 180230   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:21:18,817-Speed 3208.38 samples/sec   Loss 2.1881   LearningRate 0.0075   Epoch: 14   Global Step: 180240   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:21:21,913-Speed 3308.98 samples/sec   Loss 2.1530   LearningRate 0.0075   Epoch: 14   Global Step: 180250   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:21:25,011-Speed 3306.13 samples/sec   Loss 2.1550   LearningRate 0.0075   Epoch: 14   Global Step: 180260   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:21:28,152-Speed 3261.11 samples/sec   Loss 2.1831   LearningRate 0.0075   Epoch: 14   Global Step: 180270   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:21:31,241-Speed 3315.72 samples/sec   Loss 2.0740   LearningRate 0.0075   Epoch: 14   Global Step: 180280   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:21:34,296-Speed 3353.62 samples/sec   Loss 2.1628   LearningRate 0.0075   Epoch: 14   Global Step: 180290   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:21:37,428-Speed 3270.40 samples/sec   Loss 2.1502   LearningRate 0.0075   Epoch: 14   Global Step: 180300   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:21:40,545-Speed 3286.06 samples/sec   Loss 2.2136   LearningRate 0.0075   Epoch: 14   Global Step: 180310   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:21:43,688-Speed 3259.30 samples/sec   Loss 2.1614   LearningRate 0.0075   Epoch: 14   Global Step: 180320   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:21:46,780-Speed 3313.30 samples/sec   Loss 2.1799   LearningRate 0.0075   Epoch: 14   Global Step: 180330   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:21:49,849-Speed 3337.69 samples/sec   Loss 2.2503   LearningRate 0.0075   Epoch: 14   Global Step: 180340   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:21:52,971-Speed 3280.75 samples/sec   Loss 2.2116   LearningRate 0.0075   Epoch: 14   Global Step: 180350   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:21:56,062-Speed 3314.28 samples/sec   Loss 2.1496   LearningRate 0.0075   Epoch: 14   Global Step: 180360   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:21:59,124-Speed 3344.54 samples/sec   Loss 2.1994   LearningRate 0.0075   Epoch: 14   Global Step: 180370   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:22:02,209-Speed 3319.97 samples/sec   Loss 2.0857   LearningRate 0.0075   Epoch: 14   Global Step: 180380   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:22:05,350-Speed 3261.72 samples/sec   Loss 2.1768   LearningRate 0.0075   Epoch: 14   Global Step: 180390   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:22:08,413-Speed 3344.71 samples/sec   Loss 2.1858   LearningRate 0.0075   Epoch: 14   Global Step: 180400   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:22:11,472-Speed 3348.62 samples/sec   Loss 2.1630   LearningRate 0.0075   Epoch: 14   Global Step: 180410   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:22:14,540-Speed 3338.17 samples/sec   Loss 2.1070   LearningRate 0.0075   Epoch: 14   Global Step: 180420   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:22:17,689-Speed 3253.45 samples/sec   Loss 2.1425   LearningRate 0.0075   Epoch: 14   Global Step: 180430   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:22:20,762-Speed 3333.17 samples/sec   Loss 2.2116   LearningRate 0.0075   Epoch: 14   Global Step: 180440   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:22:23,819-Speed 3350.68 samples/sec   Loss 2.1689   LearningRate 0.0075   Epoch: 14   Global Step: 180450   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:22:26,900-Speed 3325.18 samples/sec   Loss 2.0964   LearningRate 0.0075   Epoch: 14   Global Step: 180460   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:22:30,012-Speed 3290.89 samples/sec   Loss 2.1366   LearningRate 0.0075   Epoch: 14   Global Step: 180470   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:22:33,084-Speed 3334.61 samples/sec   Loss 2.1095   LearningRate 0.0075   Epoch: 14   Global Step: 180480   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:22:36,237-Speed 3248.50 samples/sec   Loss 2.2039   LearningRate 0.0075   Epoch: 14   Global Step: 180490   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:22:39,372-Speed 3268.05 samples/sec   Loss 2.1576   LearningRate 0.0075   Epoch: 14   Global Step: 180500   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:22:42,448-Speed 3329.88 samples/sec   Loss 2.1125   LearningRate 0.0075   Epoch: 14   Global Step: 180510   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:22:45,517-Speed 3338.31 samples/sec   Loss 2.1640   LearningRate 0.0075   Epoch: 14   Global Step: 180520   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:22:48,659-Speed 3259.71 samples/sec   Loss 2.1340   LearningRate 0.0075   Epoch: 14   Global Step: 180530   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:22:51,828-Speed 3232.51 samples/sec   Loss 2.1208   LearningRate 0.0075   Epoch: 14   Global Step: 180540   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:22:54,975-Speed 3254.49 samples/sec   Loss 2.1252   LearningRate 0.0075   Epoch: 14   Global Step: 180550   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:22:58,028-Speed 3355.95 samples/sec   Loss 2.1688   LearningRate 0.0075   Epoch: 14   Global Step: 180560   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:23:01,116-Speed 3316.10 samples/sec   Loss 2.0849   LearningRate 0.0075   Epoch: 14   Global Step: 180570   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:23:04,284-Speed 3233.57 samples/sec   Loss 2.0987   LearningRate 0.0075   Epoch: 14   Global Step: 180580   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:23:07,374-Speed 3315.09 samples/sec   Loss 2.1983   LearningRate 0.0075   Epoch: 14   Global Step: 180590   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:23:10,468-Speed 3310.92 samples/sec   Loss 2.0801   LearningRate 0.0075   Epoch: 14   Global Step: 180600   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:23:13,657-Speed 3211.32 samples/sec   Loss 2.2334   LearningRate 0.0075   Epoch: 14   Global Step: 180610   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:23:16,776-Speed 3284.12 samples/sec   Loss 2.1451   LearningRate 0.0074   Epoch: 14   Global Step: 180620   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:23:19,833-Speed 3351.36 samples/sec   Loss 2.1693   LearningRate 0.0074   Epoch: 14   Global Step: 180630   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:23:22,900-Speed 3340.40 samples/sec   Loss 2.1949   LearningRate 0.0074   Epoch: 14   Global Step: 180640   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:23:26,064-Speed 3237.33 samples/sec   Loss 2.1204   LearningRate 0.0074   Epoch: 14   Global Step: 180650   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:23:29,208-Speed 3258.04 samples/sec   Loss 2.2553   LearningRate 0.0074   Epoch: 14   Global Step: 180660   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:23:32,306-Speed 3305.91 samples/sec   Loss 2.1249   LearningRate 0.0074   Epoch: 14   Global Step: 180670   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:23:35,396-Speed 3315.72 samples/sec   Loss 2.1872   LearningRate 0.0074   Epoch: 14   Global Step: 180680   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:23:38,478-Speed 3323.50 samples/sec   Loss 2.2221   LearningRate 0.0074   Epoch: 14   Global Step: 180690   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:23:41,590-Speed 3291.19 samples/sec   Loss 2.1473   LearningRate 0.0074   Epoch: 14   Global Step: 180700   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:23:44,679-Speed 3316.25 samples/sec   Loss 2.1488   LearningRate 0.0074   Epoch: 14   Global Step: 180710   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:23:47,910-Speed 3170.26 samples/sec   Loss 2.1052   LearningRate 0.0074   Epoch: 14   Global Step: 180720   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:23:50,997-Speed 3318.00 samples/sec   Loss 2.1627   LearningRate 0.0074   Epoch: 14   Global Step: 180730   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:23:54,166-Speed 3231.72 samples/sec   Loss 2.1411   LearningRate 0.0074   Epoch: 14   Global Step: 180740   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:23:57,261-Speed 3309.97 samples/sec   Loss 2.1293   LearningRate 0.0074   Epoch: 14   Global Step: 180750   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:24:00,371-Speed 3293.82 samples/sec   Loss 2.1084   LearningRate 0.0074   Epoch: 14   Global Step: 180760   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:24:03,519-Speed 3253.93 samples/sec   Loss 2.2010   LearningRate 0.0074   Epoch: 14   Global Step: 180770   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:24:06,631-Speed 3291.90 samples/sec   Loss 2.1706   LearningRate 0.0074   Epoch: 14   Global Step: 180780   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:24:09,714-Speed 3322.39 samples/sec   Loss 2.1890   LearningRate 0.0074   Epoch: 14   Global Step: 180790   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:24:12,822-Speed 3295.41 samples/sec   Loss 2.1958   LearningRate 0.0074   Epoch: 14   Global Step: 180800   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:24:15,930-Speed 3296.68 samples/sec   Loss 2.2049   LearningRate 0.0074   Epoch: 14   Global Step: 180810   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:24:19,008-Speed 3326.96 samples/sec   Loss 2.1693   LearningRate 0.0074   Epoch: 14   Global Step: 180820   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:24:22,108-Speed 3304.60 samples/sec   Loss 2.2091   LearningRate 0.0074   Epoch: 14   Global Step: 180830   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:24:25,175-Speed 3339.59 samples/sec   Loss 2.1844   LearningRate 0.0074   Epoch: 14   Global Step: 180840   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:24:28,329-Speed 3248.69 samples/sec   Loss 2.1661   LearningRate 0.0074   Epoch: 14   Global Step: 180850   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:24:31,465-Speed 3265.90 samples/sec   Loss 2.1828   LearningRate 0.0074   Epoch: 14   Global Step: 180860   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:24:34,560-Speed 3309.10 samples/sec   Loss 2.1983   LearningRate 0.0074   Epoch: 14   Global Step: 180870   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:24:37,701-Speed 3260.95 samples/sec   Loss 2.1109   LearningRate 0.0074   Epoch: 14   Global Step: 180880   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:24:40,800-Speed 3305.49 samples/sec   Loss 2.0977   LearningRate 0.0074   Epoch: 14   Global Step: 180890   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:24:43,866-Speed 3340.48 samples/sec   Loss 2.1225   LearningRate 0.0074   Epoch: 14   Global Step: 180900   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:24:46,984-Speed 3286.08 samples/sec   Loss 2.1222   LearningRate 0.0074   Epoch: 14   Global Step: 180910   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:24:50,077-Speed 3311.00 samples/sec   Loss 2.2213   LearningRate 0.0074   Epoch: 14   Global Step: 180920   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:24:53,157-Speed 3325.80 samples/sec   Loss 2.1351   LearningRate 0.0074   Epoch: 14   Global Step: 180930   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:24:56,268-Speed 3292.82 samples/sec   Loss 2.1402   LearningRate 0.0074   Epoch: 14   Global Step: 180940   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:24:59,413-Speed 3257.13 samples/sec   Loss 2.1139   LearningRate 0.0074   Epoch: 14   Global Step: 180950   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:25:02,519-Speed 3297.91 samples/sec   Loss 2.1744   LearningRate 0.0074   Epoch: 14   Global Step: 180960   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:25:05,638-Speed 3283.71 samples/sec   Loss 2.1500   LearningRate 0.0074   Epoch: 14   Global Step: 180970   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:25:08,769-Speed 3271.66 samples/sec   Loss 2.1616   LearningRate 0.0074   Epoch: 14   Global Step: 180980   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:25:11,863-Speed 3310.66 samples/sec   Loss 2.1280   LearningRate 0.0074   Epoch: 14   Global Step: 180990   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:25:15,034-Speed 3230.73 samples/sec   Loss 2.1280   LearningRate 0.0074   Epoch: 14   Global Step: 181000   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:25:18,233-Speed 3201.19 samples/sec   Loss 2.0821   LearningRate 0.0074   Epoch: 14   Global Step: 181010   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:25:21,311-Speed 3328.75 samples/sec   Loss 2.1298   LearningRate 0.0074   Epoch: 14   Global Step: 181020   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:25:24,437-Speed 3276.06 samples/sec   Loss 2.1269   LearningRate 0.0074   Epoch: 14   Global Step: 181030   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:25:27,576-Speed 3263.87 samples/sec   Loss 2.1877   LearningRate 0.0074   Epoch: 14   Global Step: 181040   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:25:30,701-Speed 3277.29 samples/sec   Loss 2.1572   LearningRate 0.0074   Epoch: 14   Global Step: 181050   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:25:33,779-Speed 3328.76 samples/sec   Loss 2.2047   LearningRate 0.0074   Epoch: 14   Global Step: 181060   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:25:36,886-Speed 3296.35 samples/sec   Loss 2.1719   LearningRate 0.0074   Epoch: 14   Global Step: 181070   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:25:39,962-Speed 3329.61 samples/sec   Loss 2.1622   LearningRate 0.0073   Epoch: 14   Global Step: 181080   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:25:43,077-Speed 3288.12 samples/sec   Loss 2.1983   LearningRate 0.0073   Epoch: 14   Global Step: 181090   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:25:46,186-Speed 3295.71 samples/sec   Loss 2.2250   LearningRate 0.0073   Epoch: 14   Global Step: 181100   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:25:49,287-Speed 3302.76 samples/sec   Loss 2.1425   LearningRate 0.0073   Epoch: 14   Global Step: 181110   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:25:52,393-Speed 3297.92 samples/sec   Loss 2.1339   LearningRate 0.0073   Epoch: 14   Global Step: 181120   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:25:55,509-Speed 3286.82 samples/sec   Loss 2.2001   LearningRate 0.0073   Epoch: 14   Global Step: 181130   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:25:58,616-Speed 3296.91 samples/sec   Loss 2.1809   LearningRate 0.0073   Epoch: 14   Global Step: 181140   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:26:01,703-Speed 3318.36 samples/sec   Loss 2.1512   LearningRate 0.0073   Epoch: 14   Global Step: 181150   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:26:04,823-Speed 3283.17 samples/sec   Loss 2.1840   LearningRate 0.0073   Epoch: 14   Global Step: 181160   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:26:07,917-Speed 3310.74 samples/sec   Loss 2.1103   LearningRate 0.0073   Epoch: 14   Global Step: 181170   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:26:10,988-Speed 3335.56 samples/sec   Loss 2.1907   LearningRate 0.0073   Epoch: 14   Global Step: 181180   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:26:14,111-Speed 3279.90 samples/sec   Loss 2.1872   LearningRate 0.0073   Epoch: 14   Global Step: 181190   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:26:17,201-Speed 3315.35 samples/sec   Loss 2.2030   LearningRate 0.0073   Epoch: 14   Global Step: 181200   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:26:20,284-Speed 3322.89 samples/sec   Loss 2.1549   LearningRate 0.0073   Epoch: 14   Global Step: 181210   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:26:23,413-Speed 3273.13 samples/sec   Loss 2.1467   LearningRate 0.0073   Epoch: 14   Global Step: 181220   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:26:26,515-Speed 3301.94 samples/sec   Loss 2.2068   LearningRate 0.0073   Epoch: 14   Global Step: 181230   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:26:29,594-Speed 3327.07 samples/sec   Loss 2.1572   LearningRate 0.0073   Epoch: 14   Global Step: 181240   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:26:32,695-Speed 3303.52 samples/sec   Loss 2.1356   LearningRate 0.0073   Epoch: 14   Global Step: 181250   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:26:35,812-Speed 3285.94 samples/sec   Loss 2.1820   LearningRate 0.0073   Epoch: 14   Global Step: 181260   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:26:38,942-Speed 3272.38 samples/sec   Loss 2.1988   LearningRate 0.0073   Epoch: 14   Global Step: 181270   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:26:42,040-Speed 3306.00 samples/sec   Loss 2.1802   LearningRate 0.0073   Epoch: 14   Global Step: 181280   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:26:45,129-Speed 3316.89 samples/sec   Loss 2.2771   LearningRate 0.0073   Epoch: 14   Global Step: 181290   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:26:48,222-Speed 3310.68 samples/sec   Loss 2.1288   LearningRate 0.0073   Epoch: 14   Global Step: 181300   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:26:51,327-Speed 3298.87 samples/sec   Loss 2.1731   LearningRate 0.0073   Epoch: 14   Global Step: 181310   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:26:54,424-Speed 3307.68 samples/sec   Loss 2.1060   LearningRate 0.0073   Epoch: 14   Global Step: 181320   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:26:57,503-Speed 3327.63 samples/sec   Loss 2.1744   LearningRate 0.0073   Epoch: 14   Global Step: 181330   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:27:00,684-Speed 3219.18 samples/sec   Loss 2.1374   LearningRate 0.0073   Epoch: 14   Global Step: 181340   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:27:03,800-Speed 3287.42 samples/sec   Loss 2.2215   LearningRate 0.0073   Epoch: 14   Global Step: 181350   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:27:06,916-Speed 3287.36 samples/sec   Loss 2.1934   LearningRate 0.0073   Epoch: 14   Global Step: 181360   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:27:09,987-Speed 3335.56 samples/sec   Loss 2.1585   LearningRate 0.0073   Epoch: 14   Global Step: 181370   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:27:13,111-Speed 3278.68 samples/sec   Loss 2.1445   LearningRate 0.0073   Epoch: 14   Global Step: 181380   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:27:16,202-Speed 3313.92 samples/sec   Loss 2.1382   LearningRate 0.0073   Epoch: 14   Global Step: 181390   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:27:19,284-Speed 3323.91 samples/sec   Loss 2.1607   LearningRate 0.0073   Epoch: 14   Global Step: 181400   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:27:22,390-Speed 3297.55 samples/sec   Loss 2.1472   LearningRate 0.0073   Epoch: 14   Global Step: 181410   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:27:25,516-Speed 3277.87 samples/sec   Loss 2.2065   LearningRate 0.0073   Epoch: 14   Global Step: 181420   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:27:28,657-Speed 3260.93 samples/sec   Loss 2.1519   LearningRate 0.0073   Epoch: 14   Global Step: 181430   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:27:31,800-Speed 3259.22 samples/sec   Loss 2.2122   LearningRate 0.0073   Epoch: 14   Global Step: 181440   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:27:34,938-Speed 3263.90 samples/sec   Loss 2.1758   LearningRate 0.0073   Epoch: 14   Global Step: 181450   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:27:38,115-Speed 3224.37 samples/sec   Loss 2.1852   LearningRate 0.0073   Epoch: 14   Global Step: 181460   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:27:41,310-Speed 3206.71 samples/sec   Loss 2.1909   LearningRate 0.0073   Epoch: 14   Global Step: 181470   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:27:44,414-Speed 3299.96 samples/sec   Loss 2.1391   LearningRate 0.0073   Epoch: 14   Global Step: 181480   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:27:47,572-Speed 3243.60 samples/sec   Loss 2.2388   LearningRate 0.0073   Epoch: 14   Global Step: 181490   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:27:50,673-Speed 3303.00 samples/sec   Loss 2.1318   LearningRate 0.0073   Epoch: 14   Global Step: 181500   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:27:53,773-Speed 3304.08 samples/sec   Loss 2.1218   LearningRate 0.0073   Epoch: 14   Global Step: 181510   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:27:56,871-Speed 3306.99 samples/sec   Loss 2.1813   LearningRate 0.0073   Epoch: 14   Global Step: 181520   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:27:59,976-Speed 3297.72 samples/sec   Loss 2.1752   LearningRate 0.0073   Epoch: 14   Global Step: 181530   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:28:03,067-Speed 3314.62 samples/sec   Loss 2.1594   LearningRate 0.0072   Epoch: 14   Global Step: 181540   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:28:06,140-Speed 3332.93 samples/sec   Loss 2.1564   LearningRate 0.0072   Epoch: 14   Global Step: 181550   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:28:09,229-Speed 3316.45 samples/sec   Loss 2.1102   LearningRate 0.0072   Epoch: 14   Global Step: 181560   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:28:12,371-Speed 3259.29 samples/sec   Loss 2.1976   LearningRate 0.0072   Epoch: 14   Global Step: 181570   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:28:15,497-Speed 3277.12 samples/sec   Loss 2.1034   LearningRate 0.0072   Epoch: 14   Global Step: 181580   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:28:18,592-Speed 3309.54 samples/sec   Loss 2.2115   LearningRate 0.0072   Epoch: 14   Global Step: 181590   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:28:21,659-Speed 3339.86 samples/sec   Loss 2.1351   LearningRate 0.0072   Epoch: 14   Global Step: 181600   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:28:24,809-Speed 3251.75 samples/sec   Loss 2.1750   LearningRate 0.0072   Epoch: 14   Global Step: 181610   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:28:27,936-Speed 3275.56 samples/sec   Loss 2.1848   LearningRate 0.0072   Epoch: 14   Global Step: 181620   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:28:31,035-Speed 3304.95 samples/sec   Loss 2.1694   LearningRate 0.0072   Epoch: 14   Global Step: 181630   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:28:34,169-Speed 3268.55 samples/sec   Loss 2.1389   LearningRate 0.0072   Epoch: 14   Global Step: 181640   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:28:37,284-Speed 3288.83 samples/sec   Loss 2.1850   LearningRate 0.0072   Epoch: 14   Global Step: 181650   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:28:40,383-Speed 3305.51 samples/sec   Loss 2.1607   LearningRate 0.0072   Epoch: 14   Global Step: 181660   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:28:43,487-Speed 3299.17 samples/sec   Loss 2.2094   LearningRate 0.0072   Epoch: 14   Global Step: 181670   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:28:46,589-Speed 3302.28 samples/sec   Loss 2.1772   LearningRate 0.0072   Epoch: 14   Global Step: 181680   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:28:49,694-Speed 3299.33 samples/sec   Loss 2.1891   LearningRate 0.0072   Epoch: 14   Global Step: 181690   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:28:52,811-Speed 3286.71 samples/sec   Loss 2.1369   LearningRate 0.0072   Epoch: 14   Global Step: 181700   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:28:55,891-Speed 3325.99 samples/sec   Loss 2.0924   LearningRate 0.0072   Epoch: 14   Global Step: 181710   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:28:58,967-Speed 3330.10 samples/sec   Loss 2.1420   LearningRate 0.0072   Epoch: 14   Global Step: 181720   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:29:02,096-Speed 3273.93 samples/sec   Loss 2.1772   LearningRate 0.0072   Epoch: 14   Global Step: 181730   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:29:05,236-Speed 3262.00 samples/sec   Loss 2.1834   LearningRate 0.0072   Epoch: 14   Global Step: 181740   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:29:08,313-Speed 3328.45 samples/sec   Loss 2.2038   LearningRate 0.0072   Epoch: 14   Global Step: 181750   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:29:11,423-Speed 3294.51 samples/sec   Loss 2.2076   LearningRate 0.0072   Epoch: 14   Global Step: 181760   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:29:14,502-Speed 3326.15 samples/sec   Loss 2.2046   LearningRate 0.0072   Epoch: 14   Global Step: 181770   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:29:18,203-Speed 2767.65 samples/sec   Loss 2.1769   LearningRate 0.0072   Epoch: 14   Global Step: 181780   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:29:21,260-Speed 3350.17 samples/sec   Loss 2.1930   LearningRate 0.0072   Epoch: 14   Global Step: 181790   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:29:24,313-Speed 3355.39 samples/sec   Loss 2.1837   LearningRate 0.0072   Epoch: 14   Global Step: 181800   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:29:27,364-Speed 3357.29 samples/sec   Loss 2.1396   LearningRate 0.0072   Epoch: 14   Global Step: 181810   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:29:30,408-Speed 3365.44 samples/sec   Loss 2.2140   LearningRate 0.0072   Epoch: 14   Global Step: 181820   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:29:33,525-Speed 3286.59 samples/sec   Loss 2.2239   LearningRate 0.0072   Epoch: 14   Global Step: 181830   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:29:36,594-Speed 3337.84 samples/sec   Loss 2.2225   LearningRate 0.0072   Epoch: 14   Global Step: 181840   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:29:39,743-Speed 3252.38 samples/sec   Loss 2.1915   LearningRate 0.0072   Epoch: 14   Global Step: 181850   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:29:42,810-Speed 3340.02 samples/sec   Loss 2.1764   LearningRate 0.0072   Epoch: 14   Global Step: 181860   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:29:45,921-Speed 3292.38 samples/sec   Loss 2.1878   LearningRate 0.0072   Epoch: 14   Global Step: 181870   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:29:49,002-Speed 3324.62 samples/sec   Loss 2.1816   LearningRate 0.0072   Epoch: 14   Global Step: 181880   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:29:52,123-Speed 3281.49 samples/sec   Loss 2.1940   LearningRate 0.0072   Epoch: 14   Global Step: 181890   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:29:55,224-Speed 3303.73 samples/sec   Loss 2.1882   LearningRate 0.0072   Epoch: 14   Global Step: 181900   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:29:58,259-Speed 3375.67 samples/sec   Loss 2.1654   LearningRate 0.0072   Epoch: 14   Global Step: 181910   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:30:01,384-Speed 3277.34 samples/sec   Loss 2.1334   LearningRate 0.0072   Epoch: 14   Global Step: 181920   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:30:04,572-Speed 3213.00 samples/sec   Loss 2.2433   LearningRate 0.0072   Epoch: 14   Global Step: 181930   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:30:07,706-Speed 3269.44 samples/sec   Loss 2.2156   LearningRate 0.0072   Epoch: 14   Global Step: 181940   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:30:10,795-Speed 3315.88 samples/sec   Loss 2.1428   LearningRate 0.0072   Epoch: 14   Global Step: 181950   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:30:13,900-Speed 3298.10 samples/sec   Loss 2.2650   LearningRate 0.0072   Epoch: 14   Global Step: 181960   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:30:17,034-Speed 3268.38 samples/sec   Loss 2.1518   LearningRate 0.0072   Epoch: 14   Global Step: 181970   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:30:20,168-Speed 3268.85 samples/sec   Loss 2.0888   LearningRate 0.0072   Epoch: 14   Global Step: 181980   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:30:23,271-Speed 3301.00 samples/sec   Loss 2.2273   LearningRate 0.0072   Epoch: 14   Global Step: 181990   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:30:26,365-Speed 3311.18 samples/sec   Loss 2.1505   LearningRate 0.0071   Epoch: 14   Global Step: 182000   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:30:29,529-Speed 3236.86 samples/sec   Loss 2.2405   LearningRate 0.0071   Epoch: 14   Global Step: 182010   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:30:32,599-Speed 3336.81 samples/sec   Loss 2.1607   LearningRate 0.0071   Epoch: 14   Global Step: 182020   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:30:35,750-Speed 3250.74 samples/sec   Loss 2.2322   LearningRate 0.0071   Epoch: 14   Global Step: 182030   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:30:38,891-Speed 3261.81 samples/sec   Loss 2.2004   LearningRate 0.0071   Epoch: 14   Global Step: 182040   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:30:41,981-Speed 3315.01 samples/sec   Loss 2.1597   LearningRate 0.0071   Epoch: 14   Global Step: 182050   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:30:45,038-Speed 3350.95 samples/sec   Loss 2.1799   LearningRate 0.0071   Epoch: 14   Global Step: 182060   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:30:48,102-Speed 3342.55 samples/sec   Loss 2.2367   LearningRate 0.0071   Epoch: 14   Global Step: 182070   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:30:51,162-Speed 3347.33 samples/sec   Loss 2.1527   LearningRate 0.0071   Epoch: 14   Global Step: 182080   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:30:54,860-Speed 2770.13 samples/sec   Loss 2.2048   LearningRate 0.0071   Epoch: 14   Global Step: 182090   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:30:58,542-Speed 2781.99 samples/sec   Loss 2.1551   LearningRate 0.0071   Epoch: 14   Global Step: 182100   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:31:02,965-Speed 2315.91 samples/sec   Loss 2.1398   LearningRate 0.0071   Epoch: 14   Global Step: 182110   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:31:06,628-Speed 2796.11 samples/sec   Loss 2.1597   LearningRate 0.0071   Epoch: 14   Global Step: 182120   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:31:09,676-Speed 3361.13 samples/sec   Loss 2.1182   LearningRate 0.0071   Epoch: 14   Global Step: 182130   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:31:12,773-Speed 3307.08 samples/sec   Loss 2.1950   LearningRate 0.0071   Epoch: 14   Global Step: 182140   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:31:15,858-Speed 3320.47 samples/sec   Loss 2.2522   LearningRate 0.0071   Epoch: 14   Global Step: 182150   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:31:18,954-Speed 3308.07 samples/sec   Loss 2.1165   LearningRate 0.0071   Epoch: 14   Global Step: 182160   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:31:22,025-Speed 3336.32 samples/sec   Loss 2.1161   LearningRate 0.0071   Epoch: 14   Global Step: 182170   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:31:25,179-Speed 3247.52 samples/sec   Loss 2.1570   LearningRate 0.0071   Epoch: 14   Global Step: 182180   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:31:28,287-Speed 3296.25 samples/sec   Loss 2.1512   LearningRate 0.0071   Epoch: 14   Global Step: 182190   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:31:31,406-Speed 3283.71 samples/sec   Loss 2.1837   LearningRate 0.0071   Epoch: 14   Global Step: 182200   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:31:34,491-Speed 3320.86 samples/sec   Loss 2.1411   LearningRate 0.0071   Epoch: 14   Global Step: 182210   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:31:37,560-Speed 3337.13 samples/sec   Loss 2.1843   LearningRate 0.0071   Epoch: 14   Global Step: 182220   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:31:40,711-Speed 3250.37 samples/sec   Loss 2.2535   LearningRate 0.0071   Epoch: 14   Global Step: 182230   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:31:43,819-Speed 3296.30 samples/sec   Loss 2.1472   LearningRate 0.0071   Epoch: 14   Global Step: 182240   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:31:46,915-Speed 3308.21 samples/sec   Loss 2.1626   LearningRate 0.0071   Epoch: 14   Global Step: 182250   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:31:50,024-Speed 3295.61 samples/sec   Loss 2.2029   LearningRate 0.0071   Epoch: 14   Global Step: 182260   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:31:53,173-Speed 3252.42 samples/sec   Loss 2.2018   LearningRate 0.0071   Epoch: 14   Global Step: 182270   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:31:56,288-Speed 3287.98 samples/sec   Loss 2.2250   LearningRate 0.0071   Epoch: 14   Global Step: 182280   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:31:59,426-Speed 3264.37 samples/sec   Loss 2.1381   LearningRate 0.0071   Epoch: 14   Global Step: 182290   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:32:02,500-Speed 3332.37 samples/sec   Loss 2.1617   LearningRate 0.0071   Epoch: 14   Global Step: 182300   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:32:05,571-Speed 3336.19 samples/sec   Loss 2.1564   LearningRate 0.0071   Epoch: 14   Global Step: 182310   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:32:08,706-Speed 3267.51 samples/sec   Loss 2.1481   LearningRate 0.0071   Epoch: 14   Global Step: 182320   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:32:11,827-Speed 3281.64 samples/sec   Loss 2.1339   LearningRate 0.0071   Epoch: 14   Global Step: 182330   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:32:14,907-Speed 3325.34 samples/sec   Loss 2.1653   LearningRate 0.0071   Epoch: 14   Global Step: 182340   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:32:18,090-Speed 3218.18 samples/sec   Loss 2.2244   LearningRate 0.0071   Epoch: 14   Global Step: 182350   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:32:21,126-Speed 3374.11 samples/sec   Loss 2.1324   LearningRate 0.0071   Epoch: 14   Global Step: 182360   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:32:24,213-Speed 3318.11 samples/sec   Loss 2.1505   LearningRate 0.0071   Epoch: 14   Global Step: 182370   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:32:27,358-Speed 3256.69 samples/sec   Loss 2.2207   LearningRate 0.0071   Epoch: 14   Global Step: 182380   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:32:30,495-Speed 3265.37 samples/sec   Loss 2.1875   LearningRate 0.0071   Epoch: 14   Global Step: 182390   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:32:33,551-Speed 3352.50 samples/sec   Loss 2.2281   LearningRate 0.0071   Epoch: 14   Global Step: 182400   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:32:36,685-Speed 3268.74 samples/sec   Loss 2.1771   LearningRate 0.0071   Epoch: 14   Global Step: 182410   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:32:39,757-Speed 3334.69 samples/sec   Loss 2.1854   LearningRate 0.0071   Epoch: 14   Global Step: 182420   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:32:42,942-Speed 3216.20 samples/sec   Loss 2.1816   LearningRate 0.0071   Epoch: 14   Global Step: 182430   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:32:46,027-Speed 3320.33 samples/sec   Loss 2.2089   LearningRate 0.0071   Epoch: 14   Global Step: 182440   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:32:49,171-Speed 3257.70 samples/sec   Loss 2.1885   LearningRate 0.0071   Epoch: 14   Global Step: 182450   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:32:52,253-Speed 3323.37 samples/sec   Loss 2.1397   LearningRate 0.0070   Epoch: 14   Global Step: 182460   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:32:55,337-Speed 3321.79 samples/sec   Loss 2.1833   LearningRate 0.0070   Epoch: 14   Global Step: 182470   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:32:58,424-Speed 3317.95 samples/sec   Loss 2.2077   LearningRate 0.0070   Epoch: 14   Global Step: 182480   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:33:01,589-Speed 3236.67 samples/sec   Loss 2.2045   LearningRate 0.0070   Epoch: 14   Global Step: 182490   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:33:04,711-Speed 3280.43 samples/sec   Loss 2.1841   LearningRate 0.0070   Epoch: 14   Global Step: 182500   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:33:07,812-Speed 3303.70 samples/sec   Loss 2.1512   LearningRate 0.0070   Epoch: 14   Global Step: 182510   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:33:10,892-Speed 3325.30 samples/sec   Loss 2.2619   LearningRate 0.0070   Epoch: 14   Global Step: 182520   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:33:14,004-Speed 3291.07 samples/sec   Loss 2.1597   LearningRate 0.0070   Epoch: 14   Global Step: 182530   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:33:17,088-Speed 3322.38 samples/sec   Loss 2.2296   LearningRate 0.0070   Epoch: 14   Global Step: 182540   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:33:20,197-Speed 3294.52 samples/sec   Loss 2.2626   LearningRate 0.0070   Epoch: 14   Global Step: 182550   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:33:23,261-Speed 3343.28 samples/sec   Loss 2.1833   LearningRate 0.0070   Epoch: 14   Global Step: 182560   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:33:26,356-Speed 3309.39 samples/sec   Loss 2.1731   LearningRate 0.0070   Epoch: 14   Global Step: 182570   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:33:29,418-Speed 3345.69 samples/sec   Loss 2.2198   LearningRate 0.0070   Epoch: 14   Global Step: 182580   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:33:32,528-Speed 3293.29 samples/sec   Loss 2.1762   LearningRate 0.0070   Epoch: 14   Global Step: 182590   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:33:35,608-Speed 3325.70 samples/sec   Loss 2.2233   LearningRate 0.0070   Epoch: 14   Global Step: 182600   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:33:38,771-Speed 3238.74 samples/sec   Loss 2.2430   LearningRate 0.0070   Epoch: 14   Global Step: 182610   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:33:41,870-Speed 3304.78 samples/sec   Loss 2.1677   LearningRate 0.0070   Epoch: 14   Global Step: 182620   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:33:44,953-Speed 3322.37 samples/sec   Loss 2.1782   LearningRate 0.0070   Epoch: 14   Global Step: 182630   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:33:48,087-Speed 3268.82 samples/sec   Loss 2.2018   LearningRate 0.0070   Epoch: 14   Global Step: 182640   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:33:51,174-Speed 3318.47 samples/sec   Loss 2.2018   LearningRate 0.0070   Epoch: 14   Global Step: 182650   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:33:54,290-Speed 3286.62 samples/sec   Loss 2.1766   LearningRate 0.0070   Epoch: 14   Global Step: 182660   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:33:57,359-Speed 3337.83 samples/sec   Loss 2.1822   LearningRate 0.0070   Epoch: 14   Global Step: 182670   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:34:00,484-Speed 3278.39 samples/sec   Loss 2.1693   LearningRate 0.0070   Epoch: 14   Global Step: 182680   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:34:03,576-Speed 3312.95 samples/sec   Loss 2.2575   LearningRate 0.0070   Epoch: 14   Global Step: 182690   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:34:06,627-Speed 3357.23 samples/sec   Loss 2.1605   LearningRate 0.0070   Epoch: 14   Global Step: 182700   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:34:09,688-Speed 3345.78 samples/sec   Loss 2.1482   LearningRate 0.0070   Epoch: 14   Global Step: 182710   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:34:12,855-Speed 3234.93 samples/sec   Loss 2.1775   LearningRate 0.0070   Epoch: 14   Global Step: 182720   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:34:15,978-Speed 3280.01 samples/sec   Loss 2.2026   LearningRate 0.0070   Epoch: 14   Global Step: 182730   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:34:19,098-Speed 3282.43 samples/sec   Loss 2.2400   LearningRate 0.0070   Epoch: 14   Global Step: 182740   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:34:22,154-Speed 3352.07 samples/sec   Loss 2.2356   LearningRate 0.0070   Epoch: 14   Global Step: 182750   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:34:25,224-Speed 3336.75 samples/sec   Loss 2.1826   LearningRate 0.0070   Epoch: 14   Global Step: 182760   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:34:28,275-Speed 3357.89 samples/sec   Loss 2.2005   LearningRate 0.0070   Epoch: 14   Global Step: 182770   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:34:31,340-Speed 3341.45 samples/sec   Loss 2.1698   LearningRate 0.0070   Epoch: 14   Global Step: 182780   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:34:34,466-Speed 3276.60 samples/sec   Loss 2.2755   LearningRate 0.0070   Epoch: 14   Global Step: 182790   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:34:37,657-Speed 3209.92 samples/sec   Loss 2.1776   LearningRate 0.0070   Epoch: 14   Global Step: 182800   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:34:40,797-Speed 3262.93 samples/sec   Loss 2.1616   LearningRate 0.0070   Epoch: 14   Global Step: 182810   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:34:43,953-Speed 3245.17 samples/sec   Loss 2.1926   LearningRate 0.0070   Epoch: 14   Global Step: 182820   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:34:47,037-Speed 3321.17 samples/sec   Loss 2.2168   LearningRate 0.0070   Epoch: 14   Global Step: 182830   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:34:50,186-Speed 3252.98 samples/sec   Loss 2.1849   LearningRate 0.0070   Epoch: 14   Global Step: 182840   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:34:53,357-Speed 3230.39 samples/sec   Loss 2.2269   LearningRate 0.0070   Epoch: 14   Global Step: 182850   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:34:56,480-Speed 3279.75 samples/sec   Loss 2.1642   LearningRate 0.0070   Epoch: 14   Global Step: 182860   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:34:59,645-Speed 3236.71 samples/sec   Loss 2.2505   LearningRate 0.0070   Epoch: 14   Global Step: 182870   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:35:02,817-Speed 3229.11 samples/sec   Loss 2.1320   LearningRate 0.0070   Epoch: 14   Global Step: 182880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:35:05,958-Speed 3261.25 samples/sec   Loss 2.1209   LearningRate 0.0070   Epoch: 14   Global Step: 182890   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:35:09,035-Speed 3329.35 samples/sec   Loss 2.1628   LearningRate 0.0070   Epoch: 14   Global Step: 182900   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:35:12,250-Speed 3185.50 samples/sec   Loss 2.2062   LearningRate 0.0070   Epoch: 14   Global Step: 182910   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:35:15,402-Speed 3250.36 samples/sec   Loss 2.2184   LearningRate 0.0070   Epoch: 14   Global Step: 182920   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:35:18,488-Speed 3318.80 samples/sec   Loss 2.2152   LearningRate 0.0069   Epoch: 14   Global Step: 182930   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:35:21,557-Speed 3338.25 samples/sec   Loss 2.1926   LearningRate 0.0069   Epoch: 14   Global Step: 182940   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:35:24,630-Speed 3332.85 samples/sec   Loss 2.1082   LearningRate 0.0069   Epoch: 14   Global Step: 182950   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:35:27,727-Speed 3306.71 samples/sec   Loss 2.2087   LearningRate 0.0069   Epoch: 14   Global Step: 182960   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:35:30,938-Speed 3191.00 samples/sec   Loss 2.2095   LearningRate 0.0069   Epoch: 14   Global Step: 182970   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:35:34,041-Speed 3300.27 samples/sec   Loss 2.1617   LearningRate 0.0069   Epoch: 14   Global Step: 182980   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:35:37,249-Speed 3193.56 samples/sec   Loss 2.2042   LearningRate 0.0069   Epoch: 14   Global Step: 182990   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:35:40,346-Speed 3307.58 samples/sec   Loss 2.2372   LearningRate 0.0069   Epoch: 14   Global Step: 183000   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:35:43,438-Speed 3312.38 samples/sec   Loss 2.1454   LearningRate 0.0069   Epoch: 14   Global Step: 183010   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:35:46,531-Speed 3312.06 samples/sec   Loss 2.2539   LearningRate 0.0069   Epoch: 14   Global Step: 183020   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:35:49,629-Speed 3306.74 samples/sec   Loss 2.1682   LearningRate 0.0069   Epoch: 14   Global Step: 183030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:35:52,759-Speed 3271.86 samples/sec   Loss 2.1720   LearningRate 0.0069   Epoch: 14   Global Step: 183040   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:35:55,856-Speed 3308.04 samples/sec   Loss 2.1532   LearningRate 0.0069   Epoch: 14   Global Step: 183050   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:35:58,957-Speed 3302.99 samples/sec   Loss 2.1873   LearningRate 0.0069   Epoch: 14   Global Step: 183060   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:36:02,113-Speed 3245.95 samples/sec   Loss 2.1828   LearningRate 0.0069   Epoch: 14   Global Step: 183070   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:36:05,290-Speed 3223.05 samples/sec   Loss 2.2936   LearningRate 0.0069   Epoch: 14   Global Step: 183080   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:36:08,432-Speed 3260.40 samples/sec   Loss 2.1921   LearningRate 0.0069   Epoch: 14   Global Step: 183090   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:36:11,552-Speed 3283.14 samples/sec   Loss 2.2515   LearningRate 0.0069   Epoch: 14   Global Step: 183100   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:36:14,683-Speed 3271.35 samples/sec   Loss 2.1204   LearningRate 0.0069   Epoch: 14   Global Step: 183110   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:36:17,837-Speed 3248.17 samples/sec   Loss 2.1613   LearningRate 0.0069   Epoch: 14   Global Step: 183120   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:36:20,925-Speed 3317.93 samples/sec   Loss 2.1691   LearningRate 0.0069   Epoch: 14   Global Step: 183130   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:36:24,095-Speed 3230.82 samples/sec   Loss 2.2427   LearningRate 0.0069   Epoch: 14   Global Step: 183140   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:36:27,291-Speed 3204.74 samples/sec   Loss 2.1670   LearningRate 0.0069   Epoch: 14   Global Step: 183150   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:36:30,512-Speed 3180.82 samples/sec   Loss 2.1789   LearningRate 0.0069   Epoch: 14   Global Step: 183160   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:36:33,630-Speed 3285.54 samples/sec   Loss 2.1833   LearningRate 0.0069   Epoch: 14   Global Step: 183170   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:36:36,733-Speed 3300.42 samples/sec   Loss 2.1915   LearningRate 0.0069   Epoch: 14   Global Step: 183180   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:36:39,941-Speed 3192.90 samples/sec   Loss 2.1776   LearningRate 0.0069   Epoch: 14   Global Step: 183190   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:36:43,070-Speed 3273.57 samples/sec   Loss 2.1863   LearningRate 0.0069   Epoch: 14   Global Step: 183200   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:36:46,147-Speed 3329.41 samples/sec   Loss 2.2118   LearningRate 0.0069   Epoch: 14   Global Step: 183210   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:36:49,278-Speed 3271.53 samples/sec   Loss 2.1697   LearningRate 0.0069   Epoch: 14   Global Step: 183220   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:36:52,372-Speed 3310.65 samples/sec   Loss 2.1651   LearningRate 0.0069   Epoch: 14   Global Step: 183230   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:36:55,506-Speed 3268.61 samples/sec   Loss 2.2408   LearningRate 0.0069   Epoch: 14   Global Step: 183240   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:36:58,624-Speed 3285.72 samples/sec   Loss 2.1765   LearningRate 0.0069   Epoch: 14   Global Step: 183250   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:37:01,693-Speed 3337.19 samples/sec   Loss 2.2542   LearningRate 0.0069   Epoch: 14   Global Step: 183260   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:37:04,875-Speed 3219.60 samples/sec   Loss 2.1900   LearningRate 0.0069   Epoch: 14   Global Step: 183270   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:37:08,014-Speed 3262.44 samples/sec   Loss 2.2548   LearningRate 0.0069   Epoch: 14   Global Step: 183280   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:37:11,081-Speed 3340.09 samples/sec   Loss 2.2512   LearningRate 0.0069   Epoch: 14   Global Step: 183290   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:37:14,178-Speed 3307.87 samples/sec   Loss 2.1569   LearningRate 0.0069   Epoch: 14   Global Step: 183300   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:37:17,300-Speed 3281.48 samples/sec   Loss 2.2019   LearningRate 0.0069   Epoch: 14   Global Step: 183310   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:37:20,367-Speed 3338.67 samples/sec   Loss 2.1376   LearningRate 0.0069   Epoch: 14   Global Step: 183320   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:37:23,513-Speed 3256.36 samples/sec   Loss 2.2308   LearningRate 0.0069   Epoch: 14   Global Step: 183330   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:37:26,604-Speed 3314.45 samples/sec   Loss 2.1990   LearningRate 0.0069   Epoch: 14   Global Step: 183340   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:37:29,792-Speed 3212.82 samples/sec   Loss 2.1848   LearningRate 0.0069   Epoch: 14   Global Step: 183350   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:37:32,880-Speed 3316.52 samples/sec   Loss 2.1972   LearningRate 0.0069   Epoch: 14   Global Step: 183360   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:37:36,061-Speed 3221.00 samples/sec   Loss 2.2579   LearningRate 0.0069   Epoch: 14   Global Step: 183370   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:37:39,226-Speed 3235.40 samples/sec   Loss 2.1933   LearningRate 0.0069   Epoch: 14   Global Step: 183380   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:37:42,350-Speed 3279.70 samples/sec   Loss 2.1959   LearningRate 0.0069   Epoch: 14   Global Step: 183390   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:37:45,427-Speed 3329.84 samples/sec   Loss 2.1739   LearningRate 0.0069   Epoch: 14   Global Step: 183400   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:37:48,502-Speed 3330.49 samples/sec   Loss 2.1714   LearningRate 0.0068   Epoch: 14   Global Step: 183410   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:37:51,612-Speed 3294.27 samples/sec   Loss 2.1704   LearningRate 0.0068   Epoch: 14   Global Step: 183420   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:37:54,716-Speed 3300.11 samples/sec   Loss 2.2204   LearningRate 0.0068   Epoch: 14   Global Step: 183430   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:37:57,802-Speed 3319.10 samples/sec   Loss 2.1836   LearningRate 0.0068   Epoch: 14   Global Step: 183440   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:38:00,917-Speed 3288.21 samples/sec   Loss 2.1257   LearningRate 0.0068   Epoch: 14   Global Step: 183450   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:38:04,080-Speed 3238.53 samples/sec   Loss 2.1810   LearningRate 0.0068   Epoch: 14   Global Step: 183460   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:38:07,240-Speed 3241.33 samples/sec   Loss 2.1676   LearningRate 0.0068   Epoch: 14   Global Step: 183470   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:38:10,296-Speed 3352.42 samples/sec   Loss 2.1659   LearningRate 0.0068   Epoch: 14   Global Step: 183480   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-04-27 17:38:13,410-Speed 3289.45 samples/sec   Loss 2.2079   LearningRate 0.0068   Epoch: 14   Global Step: 183490   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:38:16,544-Speed 3268.63 samples/sec   Loss 2.2187   LearningRate 0.0068   Epoch: 14   Global Step: 183500   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:38:19,636-Speed 3313.33 samples/sec   Loss 2.2072   LearningRate 0.0068   Epoch: 14   Global Step: 183510   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:38:22,714-Speed 3327.97 samples/sec   Loss 2.2014   LearningRate 0.0068   Epoch: 14   Global Step: 183520   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:38:25,831-Speed 3285.82 samples/sec   Loss 2.1764   LearningRate 0.0068   Epoch: 14   Global Step: 183530   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:38:28,976-Speed 3257.56 samples/sec   Loss 2.1743   LearningRate 0.0068   Epoch: 14   Global Step: 183540   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:38:32,100-Speed 3278.58 samples/sec   Loss 2.2376   LearningRate 0.0068   Epoch: 14   Global Step: 183550   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:38:35,191-Speed 3312.94 samples/sec   Loss 2.1811   LearningRate 0.0068   Epoch: 14   Global Step: 183560   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:38:38,352-Speed 3240.80 samples/sec   Loss 2.1890   LearningRate 0.0068   Epoch: 14   Global Step: 183570   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:38:41,442-Speed 3315.20 samples/sec   Loss 2.2275   LearningRate 0.0068   Epoch: 14   Global Step: 183580   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:38:44,491-Speed 3360.04 samples/sec   Loss 2.2218   LearningRate 0.0068   Epoch: 14   Global Step: 183590   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:38:47,559-Speed 3338.74 samples/sec   Loss 2.2068   LearningRate 0.0068   Epoch: 14   Global Step: 183600   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:38:50,655-Speed 3308.38 samples/sec   Loss 2.2431   LearningRate 0.0068   Epoch: 14   Global Step: 183610   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:38:53,756-Speed 3303.17 samples/sec   Loss 2.2152   LearningRate 0.0068   Epoch: 14   Global Step: 183620   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:38:56,806-Speed 3358.75 samples/sec   Loss 2.1505   LearningRate 0.0068   Epoch: 14   Global Step: 183630   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:38:59,865-Speed 3348.24 samples/sec   Loss 2.1801   LearningRate 0.0068   Epoch: 14   Global Step: 183640   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:39:03,012-Speed 3255.29 samples/sec   Loss 2.2338   LearningRate 0.0068   Epoch: 14   Global Step: 183650   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:39:06,095-Speed 3321.43 samples/sec   Loss 2.2411   LearningRate 0.0068   Epoch: 14   Global Step: 183660   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:39:09,235-Speed 3262.77 samples/sec   Loss 2.1903   LearningRate 0.0068   Epoch: 14   Global Step: 183670   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:39:12,383-Speed 3254.17 samples/sec   Loss 2.1662   LearningRate 0.0068   Epoch: 14   Global Step: 183680   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:39:15,495-Speed 3291.81 samples/sec   Loss 2.1815   LearningRate 0.0068   Epoch: 14   Global Step: 183690   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:39:18,650-Speed 3246.10 samples/sec   Loss 2.2147   LearningRate 0.0068   Epoch: 14   Global Step: 183700   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:39:21,723-Speed 3333.51 samples/sec   Loss 2.1738   LearningRate 0.0068   Epoch: 14   Global Step: 183710   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:39:24,876-Speed 3248.64 samples/sec   Loss 2.2099   LearningRate 0.0068   Epoch: 14   Global Step: 183720   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:39:27,997-Speed 3282.59 samples/sec   Loss 2.3130   LearningRate 0.0068   Epoch: 14   Global Step: 183730   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:39:31,147-Speed 3251.22 samples/sec   Loss 2.2278   LearningRate 0.0068   Epoch: 14   Global Step: 183740   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:39:34,205-Speed 3349.73 samples/sec   Loss 2.2006   LearningRate 0.0068   Epoch: 14   Global Step: 183750   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:39:37,300-Speed 3309.25 samples/sec   Loss 2.1399   LearningRate 0.0068   Epoch: 14   Global Step: 183760   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:39:40,369-Speed 3338.15 samples/sec   Loss 2.1570   LearningRate 0.0068   Epoch: 14   Global Step: 183770   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:39:43,467-Speed 3306.40 samples/sec   Loss 2.2756   LearningRate 0.0068   Epoch: 14   Global Step: 183780   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:39:46,552-Speed 3320.29 samples/sec   Loss 2.1968   LearningRate 0.0068   Epoch: 14   Global Step: 183790   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:39:49,686-Speed 3268.06 samples/sec   Loss 2.1431   LearningRate 0.0068   Epoch: 14   Global Step: 183800   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:39:52,804-Speed 3285.78 samples/sec   Loss 2.1473   LearningRate 0.0068   Epoch: 14   Global Step: 183810   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:39:55,850-Speed 3363.03 samples/sec   Loss 2.2037   LearningRate 0.0068   Epoch: 14   Global Step: 183820   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:39:59,008-Speed 3243.34 samples/sec   Loss 2.2102   LearningRate 0.0068   Epoch: 14   Global Step: 183830   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:40:02,170-Speed 3239.97 samples/sec   Loss 2.0925   LearningRate 0.0068   Epoch: 14   Global Step: 183840   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:40:05,309-Speed 3262.67 samples/sec   Loss 2.1755   LearningRate 0.0068   Epoch: 14   Global Step: 183850   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:40:08,426-Speed 3287.28 samples/sec   Loss 2.2661   LearningRate 0.0068   Epoch: 14   Global Step: 183860   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:40:11,505-Speed 3326.90 samples/sec   Loss 2.2265   LearningRate 0.0068   Epoch: 14   Global Step: 183870   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:40:14,610-Speed 3298.12 samples/sec   Loss 2.1706   LearningRate 0.0067   Epoch: 14   Global Step: 183880   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:40:17,701-Speed 3314.09 samples/sec   Loss 2.1714   LearningRate 0.0067   Epoch: 14   Global Step: 183890   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:40:20,789-Speed 3317.08 samples/sec   Loss 2.1255   LearningRate 0.0067   Epoch: 14   Global Step: 183900   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:40:23,940-Speed 3250.39 samples/sec   Loss 2.2136   LearningRate 0.0067   Epoch: 14   Global Step: 183910   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:40:27,032-Speed 3313.35 samples/sec   Loss 2.2368   LearningRate 0.0067   Epoch: 14   Global Step: 183920   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:40:30,259-Speed 3174.38 samples/sec   Loss 2.2147   LearningRate 0.0067   Epoch: 14   Global Step: 183930   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:40:33,416-Speed 3244.22 samples/sec   Loss 2.1978   LearningRate 0.0067   Epoch: 14   Global Step: 183940   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:40:36,485-Speed 3338.23 samples/sec   Loss 2.1329   LearningRate 0.0067   Epoch: 14   Global Step: 183950   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:40:39,686-Speed 3199.91 samples/sec   Loss 2.1336   LearningRate 0.0067   Epoch: 14   Global Step: 183960   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:40:42,800-Speed 3289.46 samples/sec   Loss 2.1458   LearningRate 0.0067   Epoch: 14   Global Step: 183970   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:40:45,853-Speed 3355.25 samples/sec   Loss 2.1641   LearningRate 0.0067   Epoch: 14   Global Step: 183980   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:40:48,997-Speed 3258.07 samples/sec   Loss 2.2215   LearningRate 0.0067   Epoch: 14   Global Step: 183990   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:40:52,127-Speed 3272.12 samples/sec   Loss 2.1989   LearningRate 0.0067   Epoch: 14   Global Step: 184000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:40:55,221-Speed 3310.38 samples/sec   Loss 2.1814   LearningRate 0.0067   Epoch: 14   Global Step: 184010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:40:58,328-Speed 3297.90 samples/sec   Loss 2.2027   LearningRate 0.0067   Epoch: 14   Global Step: 184020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:41:01,427-Speed 3305.33 samples/sec   Loss 2.1697   LearningRate 0.0067   Epoch: 14   Global Step: 184030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:41:04,588-Speed 3240.09 samples/sec   Loss 2.2410   LearningRate 0.0067   Epoch: 14   Global Step: 184040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:41:07,780-Speed 3209.05 samples/sec   Loss 2.2163   LearningRate 0.0067   Epoch: 14   Global Step: 184050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:41:10,851-Speed 3335.27 samples/sec   Loss 2.2101   LearningRate 0.0067   Epoch: 14   Global Step: 184060   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:41:13,955-Speed 3300.56 samples/sec   Loss 2.1089   LearningRate 0.0067   Epoch: 14   Global Step: 184070   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:41:17,120-Speed 3236.48 samples/sec   Loss 2.1278   LearningRate 0.0067   Epoch: 14   Global Step: 184080   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:41:20,253-Speed 3269.68 samples/sec   Loss 2.1230   LearningRate 0.0067   Epoch: 14   Global Step: 184090   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:41:23,430-Speed 3223.98 samples/sec   Loss 2.2110   LearningRate 0.0067   Epoch: 14   Global Step: 184100   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:41:26,566-Speed 3267.06 samples/sec   Loss 2.2083   LearningRate 0.0067   Epoch: 14   Global Step: 184110   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:41:29,654-Speed 3315.96 samples/sec   Loss 2.1905   LearningRate 0.0067   Epoch: 14   Global Step: 184120   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:41:32,728-Speed 3333.35 samples/sec   Loss 2.2101   LearningRate 0.0067   Epoch: 14   Global Step: 184130   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:41:35,788-Speed 3347.24 samples/sec   Loss 2.2009   LearningRate 0.0067   Epoch: 14   Global Step: 184140   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:41:38,851-Speed 3344.01 samples/sec   Loss 2.2401   LearningRate 0.0067   Epoch: 14   Global Step: 184150   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:41:41,967-Speed 3287.35 samples/sec   Loss 2.1406   LearningRate 0.0067   Epoch: 14   Global Step: 184160   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:41:45,056-Speed 3315.91 samples/sec   Loss 2.2344   LearningRate 0.0067   Epoch: 14   Global Step: 184170   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:41:48,174-Speed 3284.64 samples/sec   Loss 2.2474   LearningRate 0.0067   Epoch: 14   Global Step: 184180   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:41:51,307-Speed 3269.96 samples/sec   Loss 2.1945   LearningRate 0.0067   Epoch: 14   Global Step: 184190   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:41:54,518-Speed 3190.22 samples/sec   Loss 2.1929   LearningRate 0.0067   Epoch: 14   Global Step: 184200   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:41:57,618-Speed 3303.94 samples/sec   Loss 2.2879   LearningRate 0.0067   Epoch: 14   Global Step: 184210   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:42:00,686-Speed 3339.34 samples/sec   Loss 2.2190   LearningRate 0.0067   Epoch: 14   Global Step: 184220   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:42:03,799-Speed 3290.69 samples/sec   Loss 2.2081   LearningRate 0.0067   Epoch: 14   Global Step: 184230   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:42:06,961-Speed 3238.93 samples/sec   Loss 2.1751   LearningRate 0.0067   Epoch: 14   Global Step: 184240   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:42:10,062-Speed 3302.97 samples/sec   Loss 2.2182   LearningRate 0.0067   Epoch: 14   Global Step: 184250   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:42:13,190-Speed 3274.83 samples/sec   Loss 2.2356   LearningRate 0.0067   Epoch: 14   Global Step: 184260   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:42:16,327-Speed 3265.24 samples/sec   Loss 2.1613   LearningRate 0.0067   Epoch: 14   Global Step: 184270   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:42:19,399-Speed 3334.93 samples/sec   Loss 2.2046   LearningRate 0.0067   Epoch: 14   Global Step: 184280   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:42:22,514-Speed 3288.26 samples/sec   Loss 2.1782   LearningRate 0.0067   Epoch: 14   Global Step: 184290   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:42:25,573-Speed 3349.21 samples/sec   Loss 2.1443   LearningRate 0.0067   Epoch: 14   Global Step: 184300   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:42:28,707-Speed 3267.62 samples/sec   Loss 2.2006   LearningRate 0.0067   Epoch: 14   Global Step: 184310   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:42:31,777-Speed 3336.41 samples/sec   Loss 2.2106   LearningRate 0.0067   Epoch: 14   Global Step: 184320   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:42:34,850-Speed 3334.29 samples/sec   Loss 2.1232   LearningRate 0.0067   Epoch: 14   Global Step: 184330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:42:37,966-Speed 3286.76 samples/sec   Loss 2.1865   LearningRate 0.0067   Epoch: 14   Global Step: 184340   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:42:41,045-Speed 3327.08 samples/sec   Loss 2.1212   LearningRate 0.0067   Epoch: 14   Global Step: 184350   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:42:44,162-Speed 3286.18 samples/sec   Loss 2.1826   LearningRate 0.0066   Epoch: 14   Global Step: 184360   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:42:47,262-Speed 3304.43 samples/sec   Loss 2.2371   LearningRate 0.0066   Epoch: 14   Global Step: 184370   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:42:50,335-Speed 3333.47 samples/sec   Loss 2.1041   LearningRate 0.0066   Epoch: 14   Global Step: 184380   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-04-27 17:42:53,431-Speed 3308.86 samples/sec   Loss 2.1855   LearningRate 0.0066   Epoch: 14   Global Step: 184390   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:42:56,538-Speed 3296.18 samples/sec   Loss 2.1572   LearningRate 0.0066   Epoch: 14   Global Step: 184400   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:42:59,638-Speed 3304.01 samples/sec   Loss 2.2423   LearningRate 0.0066   Epoch: 14   Global Step: 184410   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:43:02,798-Speed 3242.27 samples/sec   Loss 2.2575   LearningRate 0.0066   Epoch: 14   Global Step: 184420   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:43:05,864-Speed 3340.54 samples/sec   Loss 2.1988   LearningRate 0.0066   Epoch: 14   Global Step: 184430   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:43:08,955-Speed 3314.32 samples/sec   Loss 2.2569   LearningRate 0.0066   Epoch: 14   Global Step: 184440   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:43:12,021-Speed 3340.48 samples/sec   Loss 2.2067   LearningRate 0.0066   Epoch: 14   Global Step: 184450   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:43:15,132-Speed 3292.84 samples/sec   Loss 2.1794   LearningRate 0.0066   Epoch: 14   Global Step: 184460   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:43:18,261-Speed 3274.09 samples/sec   Loss 2.1796   LearningRate 0.0066   Epoch: 14   Global Step: 184470   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:43:21,321-Speed 3347.59 samples/sec   Loss 2.1280   LearningRate 0.0066   Epoch: 14   Global Step: 184480   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:43:24,493-Speed 3229.45 samples/sec   Loss 2.0993   LearningRate 0.0066   Epoch: 14   Global Step: 184490   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:43:27,633-Speed 3261.31 samples/sec   Loss 2.1918   LearningRate 0.0066   Epoch: 14   Global Step: 184500   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:43:30,717-Speed 3321.88 samples/sec   Loss 2.2432   LearningRate 0.0066   Epoch: 14   Global Step: 184510   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:43:33,808-Speed 3314.15 samples/sec   Loss 2.1403   LearningRate 0.0066   Epoch: 14   Global Step: 184520   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:43:36,902-Speed 3309.94 samples/sec   Loss 2.2023   LearningRate 0.0066   Epoch: 14   Global Step: 184530   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:43:40,043-Speed 3261.57 samples/sec   Loss 2.1916   LearningRate 0.0066   Epoch: 14   Global Step: 184540   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:43:43,178-Speed 3267.23 samples/sec   Loss 2.2499   LearningRate 0.0066   Epoch: 14   Global Step: 184550   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:43:46,253-Speed 3331.66 samples/sec   Loss 2.1578   LearningRate 0.0066   Epoch: 14   Global Step: 184560   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:43:49,316-Speed 3343.37 samples/sec   Loss 2.1588   LearningRate 0.0066   Epoch: 14   Global Step: 184570   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:43:52,379-Speed 3343.96 samples/sec   Loss 2.1858   LearningRate 0.0066   Epoch: 14   Global Step: 184580   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:43:55,481-Speed 3302.94 samples/sec   Loss 2.2565   LearningRate 0.0066   Epoch: 14   Global Step: 184590   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:43:58,591-Speed 3293.35 samples/sec   Loss 2.2261   LearningRate 0.0066   Epoch: 14   Global Step: 184600   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:44:01,793-Speed 3199.67 samples/sec   Loss 2.1316   LearningRate 0.0066   Epoch: 14   Global Step: 184610   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:44:04,915-Speed 3280.28 samples/sec   Loss 2.1483   LearningRate 0.0066   Epoch: 14   Global Step: 184620   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:44:07,980-Speed 3342.16 samples/sec   Loss 2.1546   LearningRate 0.0066   Epoch: 14   Global Step: 184630   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:44:11,061-Speed 3325.36 samples/sec   Loss 2.1893   LearningRate 0.0066   Epoch: 14   Global Step: 184640   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:44:14,193-Speed 3269.90 samples/sec   Loss 2.2594   LearningRate 0.0066   Epoch: 14   Global Step: 184650   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:44:17,356-Speed 3238.84 samples/sec   Loss 2.1572   LearningRate 0.0066   Epoch: 14   Global Step: 184660   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:44:20,405-Speed 3359.88 samples/sec   Loss 2.1628   LearningRate 0.0066   Epoch: 14   Global Step: 184670   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:44:23,540-Speed 3266.89 samples/sec   Loss 2.1562   LearningRate 0.0066   Epoch: 14   Global Step: 184680   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:44:26,706-Speed 3235.84 samples/sec   Loss 2.1633   LearningRate 0.0066   Epoch: 14   Global Step: 184690   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:44:29,827-Speed 3282.18 samples/sec   Loss 2.2720   LearningRate 0.0066   Epoch: 14   Global Step: 184700   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:44:32,903-Speed 3329.63 samples/sec   Loss 2.2004   LearningRate 0.0066   Epoch: 14   Global Step: 184710   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:44:36,076-Speed 3229.03 samples/sec   Loss 2.2355   LearningRate 0.0066   Epoch: 14   Global Step: 184720   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:44:39,193-Speed 3285.44 samples/sec   Loss 2.2174   LearningRate 0.0066   Epoch: 14   Global Step: 184730   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:44:42,304-Speed 3293.12 samples/sec   Loss 2.2172   LearningRate 0.0066   Epoch: 14   Global Step: 184740   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:44:45,413-Speed 3295.27 samples/sec   Loss 2.1920   LearningRate 0.0066   Epoch: 14   Global Step: 184750   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-04-27 17:44:48,551-Speed 3263.74 samples/sec   Loss 2.1646   LearningRate 0.0066   Epoch: 14   Global Step: 184760   Fp16 Grad Scale: 8192   Required: 6 hours
Training: 2022-04-27 17:44:51,772-Speed 3180.58 samples/sec   Loss 2.1200   LearningRate 0.0066   Epoch: 14   Global Step: 184770   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:44:54,916-Speed 3257.21 samples/sec   Loss 2.1336   LearningRate 0.0066   Epoch: 14   Global Step: 184780   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:44:58,025-Speed 3295.34 samples/sec   Loss 2.2120   LearningRate 0.0066   Epoch: 14   Global Step: 184790   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:45:01,198-Speed 3228.12 samples/sec   Loss 2.2383   LearningRate 0.0066   Epoch: 14   Global Step: 184800   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:45:04,457-Speed 3143.40 samples/sec   Loss 2.2196   LearningRate 0.0066   Epoch: 14   Global Step: 184810   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:45:07,584-Speed 3275.44 samples/sec   Loss 2.2102   LearningRate 0.0066   Epoch: 14   Global Step: 184820   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:45:10,655-Speed 3335.97 samples/sec   Loss 2.2254   LearningRate 0.0066   Epoch: 14   Global Step: 184830   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:45:13,853-Speed 3202.11 samples/sec   Loss 2.1558   LearningRate 0.0066   Epoch: 14   Global Step: 184840   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:45:16,976-Speed 3280.31 samples/sec   Loss 2.1988   LearningRate 0.0065   Epoch: 14   Global Step: 184850   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:45:20,074-Speed 3306.90 samples/sec   Loss 2.2007   LearningRate 0.0065   Epoch: 14   Global Step: 184860   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:45:23,198-Speed 3278.29 samples/sec   Loss 2.2205   LearningRate 0.0065   Epoch: 14   Global Step: 184870   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:45:26,298-Speed 3304.86 samples/sec   Loss 2.1839   LearningRate 0.0065   Epoch: 14   Global Step: 184880   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:45:29,442-Speed 3257.52 samples/sec   Loss 2.2253   LearningRate 0.0065   Epoch: 14   Global Step: 184890   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:45:32,567-Speed 3277.97 samples/sec   Loss 2.0979   LearningRate 0.0065   Epoch: 14   Global Step: 184900   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:45:35,716-Speed 3252.55 samples/sec   Loss 2.2167   LearningRate 0.0065   Epoch: 14   Global Step: 184910   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:45:38,877-Speed 3240.77 samples/sec   Loss 2.2330   LearningRate 0.0065   Epoch: 14   Global Step: 184920   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:45:42,001-Speed 3278.94 samples/sec   Loss 2.1889   LearningRate 0.0065   Epoch: 14   Global Step: 184930   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:45:45,085-Speed 3322.51 samples/sec   Loss 2.1203   LearningRate 0.0065   Epoch: 14   Global Step: 184940   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:45:48,262-Speed 3223.88 samples/sec   Loss 2.2709   LearningRate 0.0065   Epoch: 14   Global Step: 184950   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:45:51,349-Speed 3318.57 samples/sec   Loss 2.1781   LearningRate 0.0065   Epoch: 14   Global Step: 184960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:45:54,583-Speed 3167.46 samples/sec   Loss 2.1708   LearningRate 0.0065   Epoch: 14   Global Step: 184970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:45:57,672-Speed 3316.08 samples/sec   Loss 2.2215   LearningRate 0.0065   Epoch: 14   Global Step: 184980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:46:00,762-Speed 3314.29 samples/sec   Loss 2.1737   LearningRate 0.0065   Epoch: 14   Global Step: 184990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:46:03,929-Speed 3234.70 samples/sec   Loss 2.1801   LearningRate 0.0065   Epoch: 14   Global Step: 185000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:46:07,079-Speed 3251.71 samples/sec   Loss 2.2161   LearningRate 0.0065   Epoch: 14   Global Step: 185010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:46:10,188-Speed 3294.36 samples/sec   Loss 2.1136   LearningRate 0.0065   Epoch: 14   Global Step: 185020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:46:13,288-Speed 3304.10 samples/sec   Loss 2.2640   LearningRate 0.0065   Epoch: 14   Global Step: 185030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:46:16,387-Speed 3305.58 samples/sec   Loss 2.1882   LearningRate 0.0065   Epoch: 14   Global Step: 185040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:46:19,534-Speed 3255.35 samples/sec   Loss 2.2305   LearningRate 0.0065   Epoch: 14   Global Step: 185050   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:46:22,635-Speed 3303.20 samples/sec   Loss 2.2727   LearningRate 0.0065   Epoch: 14   Global Step: 185060   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 17:46:25,750-Speed 3288.32 samples/sec   Loss 2.1891   LearningRate 0.0065   Epoch: 14   Global Step: 185070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:46:28,875-Speed 3277.83 samples/sec   Loss 2.2166   LearningRate 0.0065   Epoch: 14   Global Step: 185080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:46:31,980-Speed 3299.06 samples/sec   Loss 2.2112   LearningRate 0.0065   Epoch: 14   Global Step: 185090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:46:35,125-Speed 3256.58 samples/sec   Loss 2.1633   LearningRate 0.0065   Epoch: 14   Global Step: 185100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:46:38,221-Speed 3309.03 samples/sec   Loss 2.1539   LearningRate 0.0065   Epoch: 14   Global Step: 185110   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:46:41,286-Speed 3341.87 samples/sec   Loss 2.1245   LearningRate 0.0065   Epoch: 14   Global Step: 185120   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:46:44,357-Speed 3335.97 samples/sec   Loss 2.2252   LearningRate 0.0065   Epoch: 14   Global Step: 185130   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:46:47,470-Speed 3290.44 samples/sec   Loss 2.1422   LearningRate 0.0065   Epoch: 14   Global Step: 185140   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:46:50,582-Speed 3291.21 samples/sec   Loss 2.2553   LearningRate 0.0065   Epoch: 14   Global Step: 185150   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:46:53,696-Speed 3289.14 samples/sec   Loss 2.1636   LearningRate 0.0065   Epoch: 14   Global Step: 185160   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:46:56,801-Speed 3299.23 samples/sec   Loss 2.2177   LearningRate 0.0065   Epoch: 14   Global Step: 185170   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:46:59,886-Speed 3319.97 samples/sec   Loss 2.2178   LearningRate 0.0065   Epoch: 14   Global Step: 185180   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:47:02,985-Speed 3305.14 samples/sec   Loss 2.1869   LearningRate 0.0065   Epoch: 14   Global Step: 185190   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:47:06,138-Speed 3249.04 samples/sec   Loss 2.1656   LearningRate 0.0065   Epoch: 14   Global Step: 185200   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:47:09,272-Speed 3269.01 samples/sec   Loss 2.2386   LearningRate 0.0065   Epoch: 14   Global Step: 185210   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:47:12,373-Speed 3302.80 samples/sec   Loss 2.2486   LearningRate 0.0065   Epoch: 14   Global Step: 185220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:47:15,521-Speed 3254.06 samples/sec   Loss 2.1888   LearningRate 0.0065   Epoch: 14   Global Step: 185230   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:47:18,633-Speed 3291.40 samples/sec   Loss 2.2256   LearningRate 0.0065   Epoch: 14   Global Step: 185240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:47:21,744-Speed 3292.06 samples/sec   Loss 2.1532   LearningRate 0.0065   Epoch: 14   Global Step: 185250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:47:24,919-Speed 3226.91 samples/sec   Loss 2.2358   LearningRate 0.0065   Epoch: 14   Global Step: 185260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:47:28,102-Speed 3217.76 samples/sec   Loss 2.1786   LearningRate 0.0065   Epoch: 14   Global Step: 185270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:47:31,211-Speed 3294.86 samples/sec   Loss 2.1626   LearningRate 0.0065   Epoch: 14   Global Step: 185280   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:47:34,318-Speed 3297.21 samples/sec   Loss 2.1130   LearningRate 0.0065   Epoch: 14   Global Step: 185290   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:47:37,437-Speed 3284.55 samples/sec   Loss 2.1628   LearningRate 0.0065   Epoch: 14   Global Step: 185300   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:47:40,595-Speed 3242.92 samples/sec   Loss 2.1643   LearningRate 0.0065   Epoch: 14   Global Step: 185310   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:47:43,739-Speed 3258.37 samples/sec   Loss 2.2185   LearningRate 0.0065   Epoch: 14   Global Step: 185320   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:47:46,881-Speed 3259.46 samples/sec   Loss 2.2075   LearningRate 0.0064   Epoch: 14   Global Step: 185330   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:47:50,074-Speed 3207.98 samples/sec   Loss 2.2482   LearningRate 0.0064   Epoch: 14   Global Step: 185340   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:47:53,200-Speed 3276.62 samples/sec   Loss 2.1796   LearningRate 0.0064   Epoch: 14   Global Step: 185350   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:47:56,272-Speed 3334.25 samples/sec   Loss 2.1561   LearningRate 0.0064   Epoch: 14   Global Step: 185360   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:47:59,358-Speed 3319.54 samples/sec   Loss 2.1187   LearningRate 0.0064   Epoch: 14   Global Step: 185370   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:48:02,535-Speed 3224.47 samples/sec   Loss 2.1887   LearningRate 0.0064   Epoch: 14   Global Step: 185380   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:48:05,639-Speed 3299.25 samples/sec   Loss 2.1308   LearningRate 0.0064   Epoch: 14   Global Step: 185390   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:48:08,720-Speed 3325.70 samples/sec   Loss 2.1920   LearningRate 0.0064   Epoch: 14   Global Step: 185400   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:48:11,872-Speed 3249.78 samples/sec   Loss 2.2095   LearningRate 0.0064   Epoch: 14   Global Step: 185410   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:48:15,016-Speed 3257.76 samples/sec   Loss 2.1611   LearningRate 0.0064   Epoch: 14   Global Step: 185420   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:48:18,114-Speed 3306.44 samples/sec   Loss 2.1912   LearningRate 0.0064   Epoch: 14   Global Step: 185430   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:48:21,223-Speed 3294.40 samples/sec   Loss 2.2271   LearningRate 0.0064   Epoch: 14   Global Step: 185440   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:48:24,315-Speed 3313.09 samples/sec   Loss 2.1742   LearningRate 0.0064   Epoch: 14   Global Step: 185450   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:48:27,517-Speed 3198.77 samples/sec   Loss 2.2334   LearningRate 0.0064   Epoch: 14   Global Step: 185460   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:48:30,618-Speed 3302.88 samples/sec   Loss 2.1903   LearningRate 0.0064   Epoch: 14   Global Step: 185470   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:48:33,720-Speed 3302.52 samples/sec   Loss 2.1711   LearningRate 0.0064   Epoch: 14   Global Step: 185480   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:48:36,866-Speed 3255.32 samples/sec   Loss 2.2429   LearningRate 0.0064   Epoch: 14   Global Step: 185490   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:48:40,028-Speed 3239.94 samples/sec   Loss 2.2274   LearningRate 0.0064   Epoch: 14   Global Step: 185500   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:48:43,147-Speed 3284.14 samples/sec   Loss 2.1729   LearningRate 0.0064   Epoch: 14   Global Step: 185510   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:48:46,225-Speed 3328.17 samples/sec   Loss 2.1592   LearningRate 0.0064   Epoch: 14   Global Step: 185520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:48:49,332-Speed 3297.34 samples/sec   Loss 2.2006   LearningRate 0.0064   Epoch: 14   Global Step: 185530   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:48:52,508-Speed 3224.57 samples/sec   Loss 2.1994   LearningRate 0.0064   Epoch: 14   Global Step: 185540   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:48:55,633-Speed 3278.69 samples/sec   Loss 2.1687   LearningRate 0.0064   Epoch: 14   Global Step: 185550   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:48:58,746-Speed 3290.57 samples/sec   Loss 2.1959   LearningRate 0.0064   Epoch: 14   Global Step: 185560   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:49:01,952-Speed 3194.64 samples/sec   Loss 2.2033   LearningRate 0.0064   Epoch: 14   Global Step: 185570   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:49:05,025-Speed 3332.78 samples/sec   Loss 2.1817   LearningRate 0.0064   Epoch: 14   Global Step: 185580   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:49:08,108-Speed 3323.22 samples/sec   Loss 2.1337   LearningRate 0.0064   Epoch: 14   Global Step: 185590   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:49:11,215-Speed 3295.94 samples/sec   Loss 2.1870   LearningRate 0.0064   Epoch: 14   Global Step: 185600   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:49:14,356-Speed 3261.30 samples/sec   Loss 2.1995   LearningRate 0.0064   Epoch: 14   Global Step: 185610   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:49:17,518-Speed 3240.13 samples/sec   Loss 2.1976   LearningRate 0.0064   Epoch: 14   Global Step: 185620   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:49:20,629-Speed 3292.46 samples/sec   Loss 2.2436   LearningRate 0.0064   Epoch: 14   Global Step: 185630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:49:23,724-Speed 3309.36 samples/sec   Loss 2.1085   LearningRate 0.0064   Epoch: 14   Global Step: 185640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:49:26,825-Speed 3303.70 samples/sec   Loss 2.2468   LearningRate 0.0064   Epoch: 14   Global Step: 185650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:49:29,930-Speed 3299.01 samples/sec   Loss 2.1970   LearningRate 0.0064   Epoch: 14   Global Step: 185660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:49:33,024-Speed 3309.95 samples/sec   Loss 2.1489   LearningRate 0.0064   Epoch: 14   Global Step: 185670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:49:36,153-Speed 3273.94 samples/sec   Loss 2.2308   LearningRate 0.0064   Epoch: 14   Global Step: 185680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:49:39,290-Speed 3265.21 samples/sec   Loss 2.1804   LearningRate 0.0064   Epoch: 14   Global Step: 185690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:49:42,429-Speed 3263.29 samples/sec   Loss 2.1063   LearningRate 0.0064   Epoch: 14   Global Step: 185700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:49:45,535-Speed 3298.40 samples/sec   Loss 2.2377   LearningRate 0.0064   Epoch: 14   Global Step: 185710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:49:48,654-Speed 3283.62 samples/sec   Loss 2.2515   LearningRate 0.0064   Epoch: 14   Global Step: 185720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:49:51,811-Speed 3245.21 samples/sec   Loss 2.1789   LearningRate 0.0064   Epoch: 14   Global Step: 185730   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 17:49:54,905-Speed 3310.05 samples/sec   Loss 2.1582   LearningRate 0.0064   Epoch: 14   Global Step: 185740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:49:58,011-Speed 3298.00 samples/sec   Loss 2.1754   LearningRate 0.0064   Epoch: 14   Global Step: 185750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:50:01,140-Speed 3273.65 samples/sec   Loss 2.1951   LearningRate 0.0064   Epoch: 14   Global Step: 185760   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:50:04,267-Speed 3275.70 samples/sec   Loss 2.1945   LearningRate 0.0064   Epoch: 14   Global Step: 185770   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:50:07,414-Speed 3255.35 samples/sec   Loss 2.1865   LearningRate 0.0064   Epoch: 14   Global Step: 185780   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:50:10,519-Speed 3298.46 samples/sec   Loss 2.1312   LearningRate 0.0064   Epoch: 14   Global Step: 185790   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:50:13,639-Speed 3283.65 samples/sec   Loss 2.1518   LearningRate 0.0064   Epoch: 14   Global Step: 185800   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:50:16,744-Speed 3299.14 samples/sec   Loss 2.1673   LearningRate 0.0064   Epoch: 14   Global Step: 185810   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:50:19,825-Speed 3324.19 samples/sec   Loss 2.1979   LearningRate 0.0064   Epoch: 14   Global Step: 185820   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:50:22,905-Speed 3326.34 samples/sec   Loss 2.2280   LearningRate 0.0063   Epoch: 14   Global Step: 185830   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:50:26,000-Speed 3308.90 samples/sec   Loss 2.2249   LearningRate 0.0063   Epoch: 14   Global Step: 185840   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:50:29,070-Speed 3336.40 samples/sec   Loss 2.2040   LearningRate 0.0063   Epoch: 14   Global Step: 185850   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:50:32,125-Speed 3353.87 samples/sec   Loss 2.2078   LearningRate 0.0063   Epoch: 14   Global Step: 185860   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:50:35,292-Speed 3233.78 samples/sec   Loss 2.2484   LearningRate 0.0063   Epoch: 14   Global Step: 185870   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:50:38,364-Speed 3334.28 samples/sec   Loss 2.1114   LearningRate 0.0063   Epoch: 14   Global Step: 185880   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:50:41,524-Speed 3241.03 samples/sec   Loss 2.1920   LearningRate 0.0063   Epoch: 14   Global Step: 185890   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:50:44,608-Speed 3322.54 samples/sec   Loss 2.2266   LearningRate 0.0063   Epoch: 14   Global Step: 185900   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:50:47,705-Speed 3307.20 samples/sec   Loss 2.1834   LearningRate 0.0063   Epoch: 14   Global Step: 185910   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:50:50,888-Speed 3217.57 samples/sec   Loss 2.1585   LearningRate 0.0063   Epoch: 14   Global Step: 185920   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:50:54,015-Speed 3275.76 samples/sec   Loss 2.0996   LearningRate 0.0063   Epoch: 14   Global Step: 185930   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:50:57,085-Speed 3336.35 samples/sec   Loss 2.1578   LearningRate 0.0063   Epoch: 14   Global Step: 185940   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:51:00,212-Speed 3276.48 samples/sec   Loss 2.1821   LearningRate 0.0063   Epoch: 14   Global Step: 185950   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:51:03,398-Speed 3215.09 samples/sec   Loss 2.1986   LearningRate 0.0063   Epoch: 14   Global Step: 185960   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:51:06,551-Speed 3248.44 samples/sec   Loss 2.1194   LearningRate 0.0063   Epoch: 14   Global Step: 185970   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:51:09,652-Speed 3302.84 samples/sec   Loss 2.1945   LearningRate 0.0063   Epoch: 14   Global Step: 185980   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:51:12,780-Speed 3274.66 samples/sec   Loss 2.1902   LearningRate 0.0063   Epoch: 14   Global Step: 185990   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:51:15,909-Speed 3273.55 samples/sec   Loss 2.1773   LearningRate 0.0063   Epoch: 14   Global Step: 186000   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:51:19,006-Speed 3308.26 samples/sec   Loss 2.2523   LearningRate 0.0063   Epoch: 14   Global Step: 186010   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:51:22,104-Speed 3305.57 samples/sec   Loss 2.2124   LearningRate 0.0063   Epoch: 14   Global Step: 186020   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:51:25,186-Speed 3323.66 samples/sec   Loss 2.2204   LearningRate 0.0063   Epoch: 14   Global Step: 186030   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:51:28,293-Speed 3296.82 samples/sec   Loss 2.1442   LearningRate 0.0063   Epoch: 14   Global Step: 186040   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:51:31,374-Speed 3324.86 samples/sec   Loss 2.1860   LearningRate 0.0063   Epoch: 14   Global Step: 186050   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:51:34,467-Speed 3312.19 samples/sec   Loss 2.1941   LearningRate 0.0063   Epoch: 14   Global Step: 186060   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:51:37,559-Speed 3311.98 samples/sec   Loss 2.1733   LearningRate 0.0063   Epoch: 14   Global Step: 186070   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:51:40,704-Speed 3256.88 samples/sec   Loss 2.2061   LearningRate 0.0063   Epoch: 14   Global Step: 186080   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:51:43,794-Speed 3315.72 samples/sec   Loss 2.1367   LearningRate 0.0063   Epoch: 14   Global Step: 186090   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:51:46,883-Speed 3315.43 samples/sec   Loss 2.2436   LearningRate 0.0063   Epoch: 14   Global Step: 186100   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:51:49,969-Speed 3319.62 samples/sec   Loss 2.1613   LearningRate 0.0063   Epoch: 14   Global Step: 186110   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:51:53,080-Speed 3292.91 samples/sec   Loss 2.1114   LearningRate 0.0063   Epoch: 14   Global Step: 186120   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:51:56,196-Speed 3287.30 samples/sec   Loss 2.2174   LearningRate 0.0063   Epoch: 14   Global Step: 186130   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:51:59,268-Speed 3334.59 samples/sec   Loss 2.1952   LearningRate 0.0063   Epoch: 14   Global Step: 186140   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:52:02,366-Speed 3306.39 samples/sec   Loss 2.1803   LearningRate 0.0063   Epoch: 14   Global Step: 186150   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:52:05,478-Speed 3291.36 samples/sec   Loss 2.1802   LearningRate 0.0063   Epoch: 14   Global Step: 186160   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:52:08,581-Speed 3301.40 samples/sec   Loss 2.2070   LearningRate 0.0063   Epoch: 14   Global Step: 186170   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:52:11,695-Speed 3289.26 samples/sec   Loss 2.1357   LearningRate 0.0063   Epoch: 14   Global Step: 186180   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:52:14,836-Speed 3260.39 samples/sec   Loss 2.1425   LearningRate 0.0063   Epoch: 14   Global Step: 186190   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:52:17,975-Speed 3263.21 samples/sec   Loss 2.2017   LearningRate 0.0063   Epoch: 14   Global Step: 186200   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:52:21,042-Speed 3340.52 samples/sec   Loss 2.1978   LearningRate 0.0063   Epoch: 14   Global Step: 186210   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:52:24,215-Speed 3227.73 samples/sec   Loss 2.1833   LearningRate 0.0063   Epoch: 14   Global Step: 186220   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:52:27,335-Speed 3283.52 samples/sec   Loss 2.1921   LearningRate 0.0063   Epoch: 14   Global Step: 186230   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:52:30,525-Speed 3210.31 samples/sec   Loss 2.1905   LearningRate 0.0063   Epoch: 14   Global Step: 186240   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:52:33,611-Speed 3320.22 samples/sec   Loss 2.1380   LearningRate 0.0063   Epoch: 14   Global Step: 186250   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:52:36,741-Speed 3272.42 samples/sec   Loss 2.1577   LearningRate 0.0063   Epoch: 14   Global Step: 186260   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:52:39,871-Speed 3272.01 samples/sec   Loss 2.1961   LearningRate 0.0063   Epoch: 14   Global Step: 186270   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:52:42,981-Speed 3293.66 samples/sec   Loss 2.0792   LearningRate 0.0063   Epoch: 14   Global Step: 186280   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:52:46,042-Speed 3346.45 samples/sec   Loss 2.2293   LearningRate 0.0063   Epoch: 14   Global Step: 186290   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:52:49,108-Speed 3340.80 samples/sec   Loss 2.1764   LearningRate 0.0063   Epoch: 14   Global Step: 186300   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:52:52,478-Speed 3039.75 samples/sec   Loss 2.1778   LearningRate 0.0063   Epoch: 14   Global Step: 186310   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:53:25,320-Speed 311.81 samples/sec   Loss 1.9082   LearningRate 0.0062   Epoch: 15   Global Step: 186320   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:53:28,725-Speed 3008.47 samples/sec   Loss 1.5385   LearningRate 0.0062   Epoch: 15   Global Step: 186330   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:53:31,912-Speed 3213.73 samples/sec   Loss 1.4778   LearningRate 0.0062   Epoch: 15   Global Step: 186340   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:53:34,997-Speed 3320.55 samples/sec   Loss 1.5979   LearningRate 0.0062   Epoch: 15   Global Step: 186350   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:53:38,182-Speed 3216.18 samples/sec   Loss 1.5778   LearningRate 0.0062   Epoch: 15   Global Step: 186360   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:53:41,257-Speed 3330.99 samples/sec   Loss 1.5635   LearningRate 0.0062   Epoch: 15   Global Step: 186370   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:53:44,381-Speed 3279.64 samples/sec   Loss 1.5261   LearningRate 0.0062   Epoch: 15   Global Step: 186380   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:53:47,598-Speed 3183.88 samples/sec   Loss 1.5827   LearningRate 0.0062   Epoch: 15   Global Step: 186390   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:53:50,680-Speed 3324.16 samples/sec   Loss 1.4979   LearningRate 0.0062   Epoch: 15   Global Step: 186400   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:53:53,787-Speed 3296.97 samples/sec   Loss 1.5849   LearningRate 0.0062   Epoch: 15   Global Step: 186410   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:53:56,869-Speed 3323.07 samples/sec   Loss 1.5419   LearningRate 0.0062   Epoch: 15   Global Step: 186420   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:54:00,096-Speed 3174.62 samples/sec   Loss 1.5539   LearningRate 0.0062   Epoch: 15   Global Step: 186430   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:54:03,327-Speed 3169.68 samples/sec   Loss 1.5540   LearningRate 0.0062   Epoch: 15   Global Step: 186440   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:54:06,631-Speed 3100.44 samples/sec   Loss 1.5229   LearningRate 0.0062   Epoch: 15   Global Step: 186450   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:54:09,926-Speed 3108.52 samples/sec   Loss 1.6084   LearningRate 0.0062   Epoch: 15   Global Step: 186460   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:54:13,019-Speed 3311.72 samples/sec   Loss 1.5457   LearningRate 0.0062   Epoch: 15   Global Step: 186470   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:54:16,124-Speed 3299.09 samples/sec   Loss 1.5452   LearningRate 0.0062   Epoch: 15   Global Step: 186480   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:54:19,219-Speed 3309.79 samples/sec   Loss 1.5404   LearningRate 0.0062   Epoch: 15   Global Step: 186490   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:54:22,319-Speed 3304.37 samples/sec   Loss 1.5666   LearningRate 0.0062   Epoch: 15   Global Step: 186500   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:54:25,444-Speed 3278.25 samples/sec   Loss 1.5374   LearningRate 0.0062   Epoch: 15   Global Step: 186510   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:54:28,573-Speed 3273.19 samples/sec   Loss 1.5853   LearningRate 0.0062   Epoch: 15   Global Step: 186520   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:54:31,664-Speed 3313.88 samples/sec   Loss 1.5919   LearningRate 0.0062   Epoch: 15   Global Step: 186530   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:54:34,740-Speed 3329.70 samples/sec   Loss 1.5658   LearningRate 0.0062   Epoch: 15   Global Step: 186540   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:54:37,844-Speed 3301.11 samples/sec   Loss 1.5225   LearningRate 0.0062   Epoch: 15   Global Step: 186550   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:54:40,983-Speed 3262.36 samples/sec   Loss 1.5348   LearningRate 0.0062   Epoch: 15   Global Step: 186560   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:54:44,041-Speed 3349.64 samples/sec   Loss 1.5657   LearningRate 0.0062   Epoch: 15   Global Step: 186570   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:54:47,129-Speed 3317.78 samples/sec   Loss 1.5525   LearningRate 0.0062   Epoch: 15   Global Step: 186580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:54:50,263-Speed 3267.77 samples/sec   Loss 1.5634   LearningRate 0.0062   Epoch: 15   Global Step: 186590   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:54:53,371-Speed 3295.79 samples/sec   Loss 1.5164   LearningRate 0.0062   Epoch: 15   Global Step: 186600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:54:56,462-Speed 3314.54 samples/sec   Loss 1.6248   LearningRate 0.0062   Epoch: 15   Global Step: 186610   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:54:59,566-Speed 3299.46 samples/sec   Loss 1.5790   LearningRate 0.0062   Epoch: 15   Global Step: 186620   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:55:02,757-Speed 3210.51 samples/sec   Loss 1.5937   LearningRate 0.0062   Epoch: 15   Global Step: 186630   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:55:05,860-Speed 3301.40 samples/sec   Loss 1.5372   LearningRate 0.0062   Epoch: 15   Global Step: 186640   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:55:08,916-Speed 3351.81 samples/sec   Loss 1.5665   LearningRate 0.0062   Epoch: 15   Global Step: 186650   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:55:12,034-Speed 3285.16 samples/sec   Loss 1.6260   LearningRate 0.0062   Epoch: 15   Global Step: 186660   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:55:15,202-Speed 3232.91 samples/sec   Loss 1.6371   LearningRate 0.0062   Epoch: 15   Global Step: 186670   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:55:18,324-Speed 3281.25 samples/sec   Loss 1.5620   LearningRate 0.0062   Epoch: 15   Global Step: 186680   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:55:21,378-Speed 3353.79 samples/sec   Loss 1.5552   LearningRate 0.0062   Epoch: 15   Global Step: 186690   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:55:24,550-Speed 3229.14 samples/sec   Loss 1.5984   LearningRate 0.0062   Epoch: 15   Global Step: 186700   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:55:27,685-Speed 3267.31 samples/sec   Loss 1.6126   LearningRate 0.0062   Epoch: 15   Global Step: 186710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:55:30,823-Speed 3265.11 samples/sec   Loss 1.6182   LearningRate 0.0062   Epoch: 15   Global Step: 186720   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:55:34,655-Speed 2672.44 samples/sec   Loss 1.5964   LearningRate 0.0062   Epoch: 15   Global Step: 186730   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:55:37,778-Speed 3280.09 samples/sec   Loss 1.5992   LearningRate 0.0062   Epoch: 15   Global Step: 186740   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:55:40,899-Speed 3281.71 samples/sec   Loss 1.5561   LearningRate 0.0062   Epoch: 15   Global Step: 186750   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:55:44,026-Speed 3276.34 samples/sec   Loss 1.5877   LearningRate 0.0062   Epoch: 15   Global Step: 186760   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:55:47,120-Speed 3309.81 samples/sec   Loss 1.5524   LearningRate 0.0062   Epoch: 15   Global Step: 186770   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:55:50,227-Speed 3297.50 samples/sec   Loss 1.5722   LearningRate 0.0062   Epoch: 15   Global Step: 186780   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:55:53,321-Speed 3310.42 samples/sec   Loss 1.5437   LearningRate 0.0062   Epoch: 15   Global Step: 186790   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:55:56,402-Speed 3324.92 samples/sec   Loss 1.5902   LearningRate 0.0062   Epoch: 15   Global Step: 186800   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:55:59,527-Speed 3276.93 samples/sec   Loss 1.6010   LearningRate 0.0062   Epoch: 15   Global Step: 186810   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:56:02,671-Speed 3258.81 samples/sec   Loss 1.6005   LearningRate 0.0061   Epoch: 15   Global Step: 186820   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:56:05,781-Speed 3293.83 samples/sec   Loss 1.5801   LearningRate 0.0061   Epoch: 15   Global Step: 186830   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:56:08,867-Speed 3318.95 samples/sec   Loss 1.5726   LearningRate 0.0061   Epoch: 15   Global Step: 186840   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:56:11,949-Speed 3323.63 samples/sec   Loss 1.5685   LearningRate 0.0061   Epoch: 15   Global Step: 186850   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:56:15,064-Speed 3288.49 samples/sec   Loss 1.5703   LearningRate 0.0061   Epoch: 15   Global Step: 186860   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:56:18,229-Speed 3235.84 samples/sec   Loss 1.5348   LearningRate 0.0061   Epoch: 15   Global Step: 186870   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:56:21,330-Speed 3302.98 samples/sec   Loss 1.5793   LearningRate 0.0061   Epoch: 15   Global Step: 186880   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:56:24,454-Speed 3278.92 samples/sec   Loss 1.5858   LearningRate 0.0061   Epoch: 15   Global Step: 186890   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:56:27,549-Speed 3309.43 samples/sec   Loss 1.6092   LearningRate 0.0061   Epoch: 15   Global Step: 186900   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:56:30,652-Speed 3301.55 samples/sec   Loss 1.5831   LearningRate 0.0061   Epoch: 15   Global Step: 186910   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:56:33,763-Speed 3292.57 samples/sec   Loss 1.6123   LearningRate 0.0061   Epoch: 15   Global Step: 186920   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:56:36,831-Speed 3338.30 samples/sec   Loss 1.6561   LearningRate 0.0061   Epoch: 15   Global Step: 186930   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:56:39,931-Speed 3304.75 samples/sec   Loss 1.5582   LearningRate 0.0061   Epoch: 15   Global Step: 186940   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:56:43,051-Speed 3282.74 samples/sec   Loss 1.5736   LearningRate 0.0061   Epoch: 15   Global Step: 186950   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:56:46,108-Speed 3351.30 samples/sec   Loss 1.5846   LearningRate 0.0061   Epoch: 15   Global Step: 186960   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:56:49,233-Speed 3277.25 samples/sec   Loss 1.5797   LearningRate 0.0061   Epoch: 15   Global Step: 186970   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:56:52,401-Speed 3233.79 samples/sec   Loss 1.5504   LearningRate 0.0061   Epoch: 15   Global Step: 186980   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:56:55,546-Speed 3256.10 samples/sec   Loss 1.5815   LearningRate 0.0061   Epoch: 15   Global Step: 186990   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:56:58,626-Speed 3325.49 samples/sec   Loss 1.6027   LearningRate 0.0061   Epoch: 15   Global Step: 187000   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:57:01,731-Speed 3299.37 samples/sec   Loss 1.5886   LearningRate 0.0061   Epoch: 15   Global Step: 187010   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:57:04,842-Speed 3292.29 samples/sec   Loss 1.5496   LearningRate 0.0061   Epoch: 15   Global Step: 187020   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:57:07,956-Speed 3289.39 samples/sec   Loss 1.5791   LearningRate 0.0061   Epoch: 15   Global Step: 187030   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:57:11,080-Speed 3278.37 samples/sec   Loss 1.6075   LearningRate 0.0061   Epoch: 15   Global Step: 187040   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:57:14,199-Speed 3285.50 samples/sec   Loss 1.6063   LearningRate 0.0061   Epoch: 15   Global Step: 187050   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:57:17,358-Speed 3242.57 samples/sec   Loss 1.5664   LearningRate 0.0061   Epoch: 15   Global Step: 187060   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:57:20,482-Speed 3278.07 samples/sec   Loss 1.6166   LearningRate 0.0061   Epoch: 15   Global Step: 187070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:57:23,595-Speed 3290.61 samples/sec   Loss 1.6414   LearningRate 0.0061   Epoch: 15   Global Step: 187080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:57:26,713-Speed 3285.66 samples/sec   Loss 1.6316   LearningRate 0.0061   Epoch: 15   Global Step: 187090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:57:29,827-Speed 3288.53 samples/sec   Loss 1.6202   LearningRate 0.0061   Epoch: 15   Global Step: 187100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:57:32,900-Speed 3334.01 samples/sec   Loss 1.6102   LearningRate 0.0061   Epoch: 15   Global Step: 187110   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:57:36,071-Speed 3230.46 samples/sec   Loss 1.5894   LearningRate 0.0061   Epoch: 15   Global Step: 187120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:57:39,146-Speed 3330.18 samples/sec   Loss 1.5692   LearningRate 0.0061   Epoch: 15   Global Step: 187130   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:57:42,258-Speed 3291.49 samples/sec   Loss 1.5919   LearningRate 0.0061   Epoch: 15   Global Step: 187140   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:57:45,321-Speed 3344.71 samples/sec   Loss 1.5928   LearningRate 0.0061   Epoch: 15   Global Step: 187150   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:57:48,413-Speed 3313.02 samples/sec   Loss 1.5758   LearningRate 0.0061   Epoch: 15   Global Step: 187160   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 17:57:51,506-Speed 3311.83 samples/sec   Loss 1.6463   LearningRate 0.0061   Epoch: 15   Global Step: 187170   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:57:54,652-Speed 3255.67 samples/sec   Loss 1.6057   LearningRate 0.0061   Epoch: 15   Global Step: 187180   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:57:57,717-Speed 3341.70 samples/sec   Loss 1.6303   LearningRate 0.0061   Epoch: 15   Global Step: 187190   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:58:00,815-Speed 3306.50 samples/sec   Loss 1.5624   LearningRate 0.0061   Epoch: 15   Global Step: 187200   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:58:03,973-Speed 3244.34 samples/sec   Loss 1.5583   LearningRate 0.0061   Epoch: 15   Global Step: 187210   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:58:07,095-Speed 3280.78 samples/sec   Loss 1.6271   LearningRate 0.0061   Epoch: 15   Global Step: 187220   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:58:10,173-Speed 3327.78 samples/sec   Loss 1.5713   LearningRate 0.0061   Epoch: 15   Global Step: 187230   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:58:13,331-Speed 3244.26 samples/sec   Loss 1.5481   LearningRate 0.0061   Epoch: 15   Global Step: 187240   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:58:16,417-Speed 3319.46 samples/sec   Loss 1.6224   LearningRate 0.0061   Epoch: 15   Global Step: 187250   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:58:19,511-Speed 3310.35 samples/sec   Loss 1.6651   LearningRate 0.0061   Epoch: 15   Global Step: 187260   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:58:22,569-Speed 3349.43 samples/sec   Loss 1.6018   LearningRate 0.0061   Epoch: 15   Global Step: 187270   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:58:25,645-Speed 3330.34 samples/sec   Loss 1.6244   LearningRate 0.0061   Epoch: 15   Global Step: 187280   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:58:28,709-Speed 3342.70 samples/sec   Loss 1.6286   LearningRate 0.0061   Epoch: 15   Global Step: 187290   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:58:31,817-Speed 3296.35 samples/sec   Loss 1.5836   LearningRate 0.0061   Epoch: 15   Global Step: 187300   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:58:34,876-Speed 3348.43 samples/sec   Loss 1.5879   LearningRate 0.0061   Epoch: 15   Global Step: 187310   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:58:38,055-Speed 3221.90 samples/sec   Loss 1.5690   LearningRate 0.0060   Epoch: 15   Global Step: 187320   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:58:41,144-Speed 3315.83 samples/sec   Loss 1.5925   LearningRate 0.0060   Epoch: 15   Global Step: 187330   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:58:44,250-Speed 3298.99 samples/sec   Loss 1.5994   LearningRate 0.0060   Epoch: 15   Global Step: 187340   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:58:47,410-Speed 3241.61 samples/sec   Loss 1.6047   LearningRate 0.0060   Epoch: 15   Global Step: 187350   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:58:50,508-Speed 3305.77 samples/sec   Loss 1.6312   LearningRate 0.0060   Epoch: 15   Global Step: 187360   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 17:58:53,610-Speed 3301.99 samples/sec   Loss 1.5944   LearningRate 0.0060   Epoch: 15   Global Step: 187370   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:58:56,686-Speed 3330.17 samples/sec   Loss 1.5989   LearningRate 0.0060   Epoch: 15   Global Step: 187380   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:58:59,860-Speed 3227.15 samples/sec   Loss 1.6471   LearningRate 0.0060   Epoch: 15   Global Step: 187390   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:59:02,979-Speed 3284.44 samples/sec   Loss 1.5727   LearningRate 0.0060   Epoch: 15   Global Step: 187400   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:59:06,090-Speed 3292.87 samples/sec   Loss 1.5885   LearningRate 0.0060   Epoch: 15   Global Step: 187410   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:59:09,179-Speed 3316.11 samples/sec   Loss 1.6374   LearningRate 0.0060   Epoch: 15   Global Step: 187420   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:59:12,237-Speed 3350.85 samples/sec   Loss 1.6593   LearningRate 0.0060   Epoch: 15   Global Step: 187430   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:59:15,315-Speed 3327.78 samples/sec   Loss 1.6258   LearningRate 0.0060   Epoch: 15   Global Step: 187440   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:59:18,428-Speed 3291.00 samples/sec   Loss 1.6035   LearningRate 0.0060   Epoch: 15   Global Step: 187450   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:59:21,537-Speed 3294.49 samples/sec   Loss 1.6341   LearningRate 0.0060   Epoch: 15   Global Step: 187460   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:59:24,637-Speed 3304.00 samples/sec   Loss 1.5836   LearningRate 0.0060   Epoch: 15   Global Step: 187470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:59:27,722-Speed 3320.31 samples/sec   Loss 1.6423   LearningRate 0.0060   Epoch: 15   Global Step: 187480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:59:30,854-Speed 3270.22 samples/sec   Loss 1.6144   LearningRate 0.0060   Epoch: 15   Global Step: 187490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:59:33,951-Speed 3308.04 samples/sec   Loss 1.6418   LearningRate 0.0060   Epoch: 15   Global Step: 187500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:59:37,113-Speed 3240.09 samples/sec   Loss 1.5749   LearningRate 0.0060   Epoch: 15   Global Step: 187510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:59:40,250-Speed 3264.58 samples/sec   Loss 1.5832   LearningRate 0.0060   Epoch: 15   Global Step: 187520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:59:43,335-Speed 3320.54 samples/sec   Loss 1.6501   LearningRate 0.0060   Epoch: 15   Global Step: 187530   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:59:46,402-Speed 3340.47 samples/sec   Loss 1.5697   LearningRate 0.0060   Epoch: 15   Global Step: 187540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 17:59:49,468-Speed 3340.65 samples/sec   Loss 1.6430   LearningRate 0.0060   Epoch: 15   Global Step: 187550   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:59:52,549-Speed 3324.34 samples/sec   Loss 1.5844   LearningRate 0.0060   Epoch: 15   Global Step: 187560   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:59:55,632-Speed 3323.13 samples/sec   Loss 1.6107   LearningRate 0.0060   Epoch: 15   Global Step: 187570   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 17:59:58,687-Speed 3352.81 samples/sec   Loss 1.6502   LearningRate 0.0060   Epoch: 15   Global Step: 187580   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:00:01,812-Speed 3277.03 samples/sec   Loss 1.6216   LearningRate 0.0060   Epoch: 15   Global Step: 187590   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:00:04,872-Speed 3347.44 samples/sec   Loss 1.5963   LearningRate 0.0060   Epoch: 15   Global Step: 187600   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:00:07,971-Speed 3305.97 samples/sec   Loss 1.6244   LearningRate 0.0060   Epoch: 15   Global Step: 187610   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:00:11,059-Speed 3316.92 samples/sec   Loss 1.6103   LearningRate 0.0060   Epoch: 15   Global Step: 187620   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:00:14,200-Speed 3261.84 samples/sec   Loss 1.6029   LearningRate 0.0060   Epoch: 15   Global Step: 187630   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:00:17,324-Speed 3278.40 samples/sec   Loss 1.5705   LearningRate 0.0060   Epoch: 15   Global Step: 187640   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:00:20,421-Speed 3307.42 samples/sec   Loss 1.6038   LearningRate 0.0060   Epoch: 15   Global Step: 187650   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:00:23,501-Speed 3326.49 samples/sec   Loss 1.5838   LearningRate 0.0060   Epoch: 15   Global Step: 187660   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:00:26,651-Speed 3251.07 samples/sec   Loss 1.5707   LearningRate 0.0060   Epoch: 15   Global Step: 187670   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:00:29,766-Speed 3288.33 samples/sec   Loss 1.6544   LearningRate 0.0060   Epoch: 15   Global Step: 187680   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:00:32,834-Speed 3339.11 samples/sec   Loss 1.6122   LearningRate 0.0060   Epoch: 15   Global Step: 187690   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:00:35,952-Speed 3285.08 samples/sec   Loss 1.6381   LearningRate 0.0060   Epoch: 15   Global Step: 187700   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:00:39,089-Speed 3265.66 samples/sec   Loss 1.6391   LearningRate 0.0060   Epoch: 15   Global Step: 187710   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:00:42,170-Speed 3324.15 samples/sec   Loss 1.6065   LearningRate 0.0060   Epoch: 15   Global Step: 187720   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:00:45,274-Speed 3300.02 samples/sec   Loss 1.6245   LearningRate 0.0060   Epoch: 15   Global Step: 187730   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:00:48,420-Speed 3255.62 samples/sec   Loss 1.6475   LearningRate 0.0060   Epoch: 15   Global Step: 187740   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:00:51,506-Speed 3319.06 samples/sec   Loss 1.6187   LearningRate 0.0060   Epoch: 15   Global Step: 187750   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:00:54,563-Speed 3350.96 samples/sec   Loss 1.6161   LearningRate 0.0060   Epoch: 15   Global Step: 187760   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:00:57,633-Speed 3340.68 samples/sec   Loss 1.5622   LearningRate 0.0060   Epoch: 15   Global Step: 187770   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:01:00,736-Speed 3300.72 samples/sec   Loss 1.6169   LearningRate 0.0060   Epoch: 15   Global Step: 187780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:01:03,823-Speed 3317.93 samples/sec   Loss 1.6057   LearningRate 0.0060   Epoch: 15   Global Step: 187790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:01:06,943-Speed 3282.80 samples/sec   Loss 1.6805   LearningRate 0.0060   Epoch: 15   Global Step: 187800   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:01:09,991-Speed 3360.88 samples/sec   Loss 1.6578   LearningRate 0.0060   Epoch: 15   Global Step: 187810   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:01:13,055-Speed 3343.77 samples/sec   Loss 1.6213   LearningRate 0.0060   Epoch: 15   Global Step: 187820   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:01:16,103-Speed 3360.50 samples/sec   Loss 1.5963   LearningRate 0.0059   Epoch: 15   Global Step: 187830   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:01:19,173-Speed 3336.45 samples/sec   Loss 1.6978   LearningRate 0.0059   Epoch: 15   Global Step: 187840   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:01:22,251-Speed 3328.25 samples/sec   Loss 1.6303   LearningRate 0.0059   Epoch: 15   Global Step: 187850   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:01:25,446-Speed 3205.83 samples/sec   Loss 1.6032   LearningRate 0.0059   Epoch: 15   Global Step: 187860   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:01:28,658-Speed 3189.06 samples/sec   Loss 1.6241   LearningRate 0.0059   Epoch: 15   Global Step: 187870   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:01:31,787-Speed 3273.59 samples/sec   Loss 1.6707   LearningRate 0.0059   Epoch: 15   Global Step: 187880   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:01:34,927-Speed 3261.57 samples/sec   Loss 1.6373   LearningRate 0.0059   Epoch: 15   Global Step: 187890   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:01:38,076-Speed 3253.69 samples/sec   Loss 1.6506   LearningRate 0.0059   Epoch: 15   Global Step: 187900   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:01:41,182-Speed 3297.22 samples/sec   Loss 1.6631   LearningRate 0.0059   Epoch: 15   Global Step: 187910   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:01:44,251-Speed 3338.45 samples/sec   Loss 1.6119   LearningRate 0.0059   Epoch: 15   Global Step: 187920   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:01:47,310-Speed 3348.05 samples/sec   Loss 1.6806   LearningRate 0.0059   Epoch: 15   Global Step: 187930   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:01:50,400-Speed 3315.20 samples/sec   Loss 1.6721   LearningRate 0.0059   Epoch: 15   Global Step: 187940   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:01:53,555-Speed 3246.22 samples/sec   Loss 1.6434   LearningRate 0.0059   Epoch: 15   Global Step: 187950   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:01:56,678-Speed 3279.82 samples/sec   Loss 1.6213   LearningRate 0.0059   Epoch: 15   Global Step: 187960   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:01:59,791-Speed 3291.15 samples/sec   Loss 1.6706   LearningRate 0.0059   Epoch: 15   Global Step: 187970   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:02:02,949-Speed 3243.37 samples/sec   Loss 1.6791   LearningRate 0.0059   Epoch: 15   Global Step: 187980   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:02:06,144-Speed 3205.91 samples/sec   Loss 1.6211   LearningRate 0.0059   Epoch: 15   Global Step: 187990   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:02:09,224-Speed 3325.10 samples/sec   Loss 1.5932   LearningRate 0.0059   Epoch: 15   Global Step: 188000   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:02:12,341-Speed 3286.25 samples/sec   Loss 1.5813   LearningRate 0.0059   Epoch: 15   Global Step: 188010   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:02:15,576-Speed 3167.04 samples/sec   Loss 1.6216   LearningRate 0.0059   Epoch: 15   Global Step: 188020   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:02:18,629-Speed 3354.78 samples/sec   Loss 1.6124   LearningRate 0.0059   Epoch: 15   Global Step: 188030   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:02:21,689-Speed 3347.44 samples/sec   Loss 1.5972   LearningRate 0.0059   Epoch: 15   Global Step: 188040   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:02:24,788-Speed 3305.90 samples/sec   Loss 1.6914   LearningRate 0.0059   Epoch: 15   Global Step: 188050   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:02:27,852-Speed 3342.52 samples/sec   Loss 1.6892   LearningRate 0.0059   Epoch: 15   Global Step: 188060   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:02:30,962-Speed 3294.33 samples/sec   Loss 1.5786   LearningRate 0.0059   Epoch: 15   Global Step: 188070   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:02:34,057-Speed 3309.42 samples/sec   Loss 1.6177   LearningRate 0.0059   Epoch: 15   Global Step: 188080   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:02:37,141-Speed 3321.55 samples/sec   Loss 1.6479   LearningRate 0.0059   Epoch: 15   Global Step: 188090   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:02:40,236-Speed 3308.89 samples/sec   Loss 1.6413   LearningRate 0.0059   Epoch: 15   Global Step: 188100   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:02:43,349-Speed 3291.00 samples/sec   Loss 1.6510   LearningRate 0.0059   Epoch: 15   Global Step: 188110   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:02:46,409-Speed 3348.78 samples/sec   Loss 1.6421   LearningRate 0.0059   Epoch: 15   Global Step: 188120   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:02:49,475-Speed 3340.51 samples/sec   Loss 1.6960   LearningRate 0.0059   Epoch: 15   Global Step: 188130   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:02:52,555-Speed 3326.05 samples/sec   Loss 1.6070   LearningRate 0.0059   Epoch: 15   Global Step: 188140   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:02:55,667-Speed 3292.04 samples/sec   Loss 1.6099   LearningRate 0.0059   Epoch: 15   Global Step: 188150   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:02:58,754-Speed 3318.20 samples/sec   Loss 1.5885   LearningRate 0.0059   Epoch: 15   Global Step: 188160   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:03:01,898-Speed 3257.71 samples/sec   Loss 1.6861   LearningRate 0.0059   Epoch: 15   Global Step: 188170   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:03:04,958-Speed 3347.48 samples/sec   Loss 1.6476   LearningRate 0.0059   Epoch: 15   Global Step: 188180   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:03:08,067-Speed 3295.27 samples/sec   Loss 1.6881   LearningRate 0.0059   Epoch: 15   Global Step: 188190   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:03:11,169-Speed 3301.92 samples/sec   Loss 1.6399   LearningRate 0.0059   Epoch: 15   Global Step: 188200   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:03:14,286-Speed 3285.79 samples/sec   Loss 1.6185   LearningRate 0.0059   Epoch: 15   Global Step: 188210   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:03:17,420-Speed 3269.24 samples/sec   Loss 1.6181   LearningRate 0.0059   Epoch: 15   Global Step: 188220   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:03:20,498-Speed 3327.64 samples/sec   Loss 1.6254   LearningRate 0.0059   Epoch: 15   Global Step: 188230   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:03:23,573-Speed 3331.17 samples/sec   Loss 1.6381   LearningRate 0.0059   Epoch: 15   Global Step: 188240   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:03:26,737-Speed 3237.36 samples/sec   Loss 1.6662   LearningRate 0.0059   Epoch: 15   Global Step: 188250   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:03:29,824-Speed 3318.13 samples/sec   Loss 1.6181   LearningRate 0.0059   Epoch: 15   Global Step: 188260   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:03:32,940-Speed 3287.71 samples/sec   Loss 1.6247   LearningRate 0.0059   Epoch: 15   Global Step: 188270   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:03:36,044-Speed 3299.93 samples/sec   Loss 1.6248   LearningRate 0.0059   Epoch: 15   Global Step: 188280   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:03:39,283-Speed 3162.97 samples/sec   Loss 1.6893   LearningRate 0.0059   Epoch: 15   Global Step: 188290   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:03:42,372-Speed 3316.24 samples/sec   Loss 1.6897   LearningRate 0.0059   Epoch: 15   Global Step: 188300   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:03:45,463-Speed 3313.45 samples/sec   Loss 1.7161   LearningRate 0.0059   Epoch: 15   Global Step: 188310   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:03:48,559-Speed 3308.58 samples/sec   Loss 1.6613   LearningRate 0.0059   Epoch: 15   Global Step: 188320   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:03:51,689-Speed 3273.10 samples/sec   Loss 1.6817   LearningRate 0.0059   Epoch: 15   Global Step: 188330   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:03:54,817-Speed 3275.03 samples/sec   Loss 1.6616   LearningRate 0.0058   Epoch: 15   Global Step: 188340   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:03:57,890-Speed 3333.21 samples/sec   Loss 1.6016   LearningRate 0.0058   Epoch: 15   Global Step: 188350   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:04:00,981-Speed 3313.68 samples/sec   Loss 1.6869   LearningRate 0.0058   Epoch: 15   Global Step: 188360   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:04:04,100-Speed 3284.56 samples/sec   Loss 1.6665   LearningRate 0.0058   Epoch: 15   Global Step: 188370   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:04:07,202-Speed 3301.85 samples/sec   Loss 1.6434   LearningRate 0.0058   Epoch: 15   Global Step: 188380   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:04:10,282-Speed 3325.60 samples/sec   Loss 1.6461   LearningRate 0.0058   Epoch: 15   Global Step: 188390   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:04:13,428-Speed 3256.66 samples/sec   Loss 1.6415   LearningRate 0.0058   Epoch: 15   Global Step: 188400   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:04:16,558-Speed 3272.42 samples/sec   Loss 1.7215   LearningRate 0.0058   Epoch: 15   Global Step: 188410   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:04:19,696-Speed 3264.93 samples/sec   Loss 1.6370   LearningRate 0.0058   Epoch: 15   Global Step: 188420   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:04:22,770-Speed 3331.98 samples/sec   Loss 1.6919   LearningRate 0.0058   Epoch: 15   Global Step: 188430   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:04:25,841-Speed 3335.00 samples/sec   Loss 1.6509   LearningRate 0.0058   Epoch: 15   Global Step: 188440   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:04:29,006-Speed 3236.52 samples/sec   Loss 1.6703   LearningRate 0.0058   Epoch: 15   Global Step: 188450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:04:32,084-Speed 3328.31 samples/sec   Loss 1.5892   LearningRate 0.0058   Epoch: 15   Global Step: 188460   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:04:35,163-Speed 3327.20 samples/sec   Loss 1.6652   LearningRate 0.0058   Epoch: 15   Global Step: 188470   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:04:38,256-Speed 3311.35 samples/sec   Loss 1.6529   LearningRate 0.0058   Epoch: 15   Global Step: 188480   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:04:41,406-Speed 3251.97 samples/sec   Loss 1.6245   LearningRate 0.0058   Epoch: 15   Global Step: 188490   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:04:44,474-Speed 3339.30 samples/sec   Loss 1.6519   LearningRate 0.0058   Epoch: 15   Global Step: 188500   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:04:47,576-Speed 3301.75 samples/sec   Loss 1.6243   LearningRate 0.0058   Epoch: 15   Global Step: 188510   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:04:50,721-Speed 3257.37 samples/sec   Loss 1.6251   LearningRate 0.0058   Epoch: 15   Global Step: 188520   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:04:53,942-Speed 3180.17 samples/sec   Loss 1.7106   LearningRate 0.0058   Epoch: 15   Global Step: 188530   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:04:57,066-Speed 3278.84 samples/sec   Loss 1.6818   LearningRate 0.0058   Epoch: 15   Global Step: 188540   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:05:00,142-Speed 3330.20 samples/sec   Loss 1.6477   LearningRate 0.0058   Epoch: 15   Global Step: 188550   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:05:03,307-Speed 3236.08 samples/sec   Loss 1.6287   LearningRate 0.0058   Epoch: 15   Global Step: 188560   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:05:06,449-Speed 3260.44 samples/sec   Loss 1.6294   LearningRate 0.0058   Epoch: 15   Global Step: 188570   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:05:09,509-Speed 3347.21 samples/sec   Loss 1.6782   LearningRate 0.0058   Epoch: 15   Global Step: 188580   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:05:12,591-Speed 3323.60 samples/sec   Loss 1.7033   LearningRate 0.0058   Epoch: 15   Global Step: 188590   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:05:15,700-Speed 3294.72 samples/sec   Loss 1.6170   LearningRate 0.0058   Epoch: 15   Global Step: 188600   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:05:18,855-Speed 3246.64 samples/sec   Loss 1.6528   LearningRate 0.0058   Epoch: 15   Global Step: 188610   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:05:21,904-Speed 3359.98 samples/sec   Loss 1.6497   LearningRate 0.0058   Epoch: 15   Global Step: 188620   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:05:25,011-Speed 3297.45 samples/sec   Loss 1.6813   LearningRate 0.0058   Epoch: 15   Global Step: 188630   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:05:28,127-Speed 3287.20 samples/sec   Loss 1.6384   LearningRate 0.0058   Epoch: 15   Global Step: 188640   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:05:31,268-Speed 3261.10 samples/sec   Loss 1.6368   LearningRate 0.0058   Epoch: 15   Global Step: 188650   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:05:34,325-Speed 3350.07 samples/sec   Loss 1.6238   LearningRate 0.0058   Epoch: 15   Global Step: 188660   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:05:37,444-Speed 3284.18 samples/sec   Loss 1.6267   LearningRate 0.0058   Epoch: 15   Global Step: 188670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:05:40,502-Speed 3349.60 samples/sec   Loss 1.6614   LearningRate 0.0058   Epoch: 15   Global Step: 188680   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:05:43,607-Speed 3299.80 samples/sec   Loss 1.6429   LearningRate 0.0058   Epoch: 15   Global Step: 188690   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:05:46,705-Speed 3306.30 samples/sec   Loss 1.6394   LearningRate 0.0058   Epoch: 15   Global Step: 188700   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:05:49,854-Speed 3252.94 samples/sec   Loss 1.6631   LearningRate 0.0058   Epoch: 15   Global Step: 188710   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:05:52,940-Speed 3318.72 samples/sec   Loss 1.7188   LearningRate 0.0058   Epoch: 15   Global Step: 188720   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:05:56,001-Speed 3346.41 samples/sec   Loss 1.6654   LearningRate 0.0058   Epoch: 15   Global Step: 188730   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:05:59,075-Speed 3332.26 samples/sec   Loss 1.6805   LearningRate 0.0058   Epoch: 15   Global Step: 188740   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:06:02,220-Speed 3257.37 samples/sec   Loss 1.7008   LearningRate 0.0058   Epoch: 15   Global Step: 188750   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:06:05,293-Speed 3333.05 samples/sec   Loss 1.6912   LearningRate 0.0058   Epoch: 15   Global Step: 188760   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:06:08,398-Speed 3299.18 samples/sec   Loss 1.6517   LearningRate 0.0058   Epoch: 15   Global Step: 188770   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:06:11,467-Speed 3337.12 samples/sec   Loss 1.6077   LearningRate 0.0058   Epoch: 15   Global Step: 188780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:06:14,536-Speed 3337.71 samples/sec   Loss 1.6757   LearningRate 0.0058   Epoch: 15   Global Step: 188790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:06:17,645-Speed 3294.62 samples/sec   Loss 1.7161   LearningRate 0.0058   Epoch: 15   Global Step: 188800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:06:20,699-Speed 3353.92 samples/sec   Loss 1.6293   LearningRate 0.0058   Epoch: 15   Global Step: 188810   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:06:23,770-Speed 3335.68 samples/sec   Loss 1.6917   LearningRate 0.0058   Epoch: 15   Global Step: 188820   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:06:26,809-Speed 3370.22 samples/sec   Loss 1.5846   LearningRate 0.0058   Epoch: 15   Global Step: 188830   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:06:29,886-Speed 3329.76 samples/sec   Loss 1.6946   LearningRate 0.0058   Epoch: 15   Global Step: 188840   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:06:32,955-Speed 3337.21 samples/sec   Loss 1.6930   LearningRate 0.0058   Epoch: 15   Global Step: 188850   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:06:36,074-Speed 3284.77 samples/sec   Loss 1.6964   LearningRate 0.0057   Epoch: 15   Global Step: 188860   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:06:39,148-Speed 3332.59 samples/sec   Loss 1.6881   LearningRate 0.0057   Epoch: 15   Global Step: 188870   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:06:42,248-Speed 3303.97 samples/sec   Loss 1.6561   LearningRate 0.0057   Epoch: 15   Global Step: 188880   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:06:45,295-Speed 3361.68 samples/sec   Loss 1.6731   LearningRate 0.0057   Epoch: 15   Global Step: 188890   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:06:48,453-Speed 3243.49 samples/sec   Loss 1.6765   LearningRate 0.0057   Epoch: 15   Global Step: 188900   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:06:51,565-Speed 3292.02 samples/sec   Loss 1.7011   LearningRate 0.0057   Epoch: 15   Global Step: 188910   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:06:54,646-Speed 3324.48 samples/sec   Loss 1.7274   LearningRate 0.0057   Epoch: 15   Global Step: 188920   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:06:57,700-Speed 3354.39 samples/sec   Loss 1.7490   LearningRate 0.0057   Epoch: 15   Global Step: 188930   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:07:00,817-Speed 3286.17 samples/sec   Loss 1.7169   LearningRate 0.0057   Epoch: 15   Global Step: 188940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:07:03,900-Speed 3322.24 samples/sec   Loss 1.6748   LearningRate 0.0057   Epoch: 15   Global Step: 188950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:07:07,019-Speed 3284.25 samples/sec   Loss 1.6823   LearningRate 0.0057   Epoch: 15   Global Step: 188960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:07:10,080-Speed 3346.93 samples/sec   Loss 1.6708   LearningRate 0.0057   Epoch: 15   Global Step: 188970   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:07:13,153-Speed 3332.60 samples/sec   Loss 1.6639   LearningRate 0.0057   Epoch: 15   Global Step: 188980   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:07:16,243-Speed 3315.51 samples/sec   Loss 1.6847   LearningRate 0.0057   Epoch: 15   Global Step: 188990   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:07:19,335-Speed 3312.26 samples/sec   Loss 1.6819   LearningRate 0.0057   Epoch: 15   Global Step: 189000   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:07:22,456-Speed 3282.35 samples/sec   Loss 1.6573   LearningRate 0.0057   Epoch: 15   Global Step: 189010   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:07:25,576-Speed 3283.03 samples/sec   Loss 1.7037   LearningRate 0.0057   Epoch: 15   Global Step: 189020   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:07:28,682-Speed 3298.40 samples/sec   Loss 1.6570   LearningRate 0.0057   Epoch: 15   Global Step: 189030   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:07:31,801-Speed 3283.90 samples/sec   Loss 1.6482   LearningRate 0.0057   Epoch: 15   Global Step: 189040   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:07:34,876-Speed 3330.74 samples/sec   Loss 1.6452   LearningRate 0.0057   Epoch: 15   Global Step: 189050   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:07:37,970-Speed 3310.95 samples/sec   Loss 1.6831   LearningRate 0.0057   Epoch: 15   Global Step: 189060   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:07:41,109-Speed 3263.19 samples/sec   Loss 1.6835   LearningRate 0.0057   Epoch: 15   Global Step: 189070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:07:44,232-Speed 3280.18 samples/sec   Loss 1.6567   LearningRate 0.0057   Epoch: 15   Global Step: 189080   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:07:47,287-Speed 3353.71 samples/sec   Loss 1.7417   LearningRate 0.0057   Epoch: 15   Global Step: 189090   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:07:50,413-Speed 3276.26 samples/sec   Loss 1.6845   LearningRate 0.0057   Epoch: 15   Global Step: 189100   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:07:53,510-Speed 3307.49 samples/sec   Loss 1.6561   LearningRate 0.0057   Epoch: 15   Global Step: 189110   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:07:56,643-Speed 3270.06 samples/sec   Loss 1.7033   LearningRate 0.0057   Epoch: 15   Global Step: 189120   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:07:59,728-Speed 3319.49 samples/sec   Loss 1.6513   LearningRate 0.0057   Epoch: 15   Global Step: 189130   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:08:02,854-Speed 3277.63 samples/sec   Loss 1.7001   LearningRate 0.0057   Epoch: 15   Global Step: 189140   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:08:05,931-Speed 3328.83 samples/sec   Loss 1.6479   LearningRate 0.0057   Epoch: 15   Global Step: 189150   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:08:08,980-Speed 3359.11 samples/sec   Loss 1.6405   LearningRate 0.0057   Epoch: 15   Global Step: 189160   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:08:12,165-Speed 3216.42 samples/sec   Loss 1.6729   LearningRate 0.0057   Epoch: 15   Global Step: 189170   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:08:15,212-Speed 3361.34 samples/sec   Loss 1.6437   LearningRate 0.0057   Epoch: 15   Global Step: 189180   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:08:18,333-Speed 3282.64 samples/sec   Loss 1.6862   LearningRate 0.0057   Epoch: 15   Global Step: 189190   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:08:21,419-Speed 3318.85 samples/sec   Loss 1.6780   LearningRate 0.0057   Epoch: 15   Global Step: 189200   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:08:24,510-Speed 3313.23 samples/sec   Loss 1.6475   LearningRate 0.0057   Epoch: 15   Global Step: 189210   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:08:27,633-Speed 3280.91 samples/sec   Loss 1.6505   LearningRate 0.0057   Epoch: 15   Global Step: 189220   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:08:30,723-Speed 3314.75 samples/sec   Loss 1.7635   LearningRate 0.0057   Epoch: 15   Global Step: 189230   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:08:33,809-Speed 3318.59 samples/sec   Loss 1.6331   LearningRate 0.0057   Epoch: 15   Global Step: 189240   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:08:36,910-Speed 3303.46 samples/sec   Loss 1.6562   LearningRate 0.0057   Epoch: 15   Global Step: 189250   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:08:40,043-Speed 3269.09 samples/sec   Loss 1.6704   LearningRate 0.0057   Epoch: 15   Global Step: 189260   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:08:43,212-Speed 3232.95 samples/sec   Loss 1.6571   LearningRate 0.0057   Epoch: 15   Global Step: 189270   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:08:46,298-Speed 3319.85 samples/sec   Loss 1.6483   LearningRate 0.0057   Epoch: 15   Global Step: 189280   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:08:49,531-Speed 3167.98 samples/sec   Loss 1.6634   LearningRate 0.0057   Epoch: 15   Global Step: 189290   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:08:52,630-Speed 3304.89 samples/sec   Loss 1.6767   LearningRate 0.0057   Epoch: 15   Global Step: 189300   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:08:55,727-Speed 3307.99 samples/sec   Loss 1.6968   LearningRate 0.0057   Epoch: 15   Global Step: 189310   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:08:58,785-Speed 3349.27 samples/sec   Loss 1.6381   LearningRate 0.0057   Epoch: 15   Global Step: 189320   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:09:01,887-Speed 3301.55 samples/sec   Loss 1.7250   LearningRate 0.0057   Epoch: 15   Global Step: 189330   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:09:05,039-Speed 3250.47 samples/sec   Loss 1.6994   LearningRate 0.0057   Epoch: 15   Global Step: 189340   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:09:08,112-Speed 3333.30 samples/sec   Loss 1.6666   LearningRate 0.0057   Epoch: 15   Global Step: 189350   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:09:11,193-Speed 3324.69 samples/sec   Loss 1.7144   LearningRate 0.0057   Epoch: 15   Global Step: 189360   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:09:14,307-Speed 3288.49 samples/sec   Loss 1.6564   LearningRate 0.0057   Epoch: 15   Global Step: 189370   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:09:17,380-Speed 3333.72 samples/sec   Loss 1.6802   LearningRate 0.0056   Epoch: 15   Global Step: 189380   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:09:20,523-Speed 3259.81 samples/sec   Loss 1.6445   LearningRate 0.0056   Epoch: 15   Global Step: 189390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:09:23,597-Speed 3331.43 samples/sec   Loss 1.6318   LearningRate 0.0056   Epoch: 15   Global Step: 189400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:09:26,680-Speed 3323.18 samples/sec   Loss 1.6692   LearningRate 0.0056   Epoch: 15   Global Step: 189410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:09:29,764-Speed 3320.42 samples/sec   Loss 1.6932   LearningRate 0.0056   Epoch: 15   Global Step: 189420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:09:32,819-Speed 3354.29 samples/sec   Loss 1.6222   LearningRate 0.0056   Epoch: 15   Global Step: 189430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:09:35,889-Speed 3336.55 samples/sec   Loss 1.7544   LearningRate 0.0056   Epoch: 15   Global Step: 189440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:09:38,969-Speed 3325.34 samples/sec   Loss 1.7084   LearningRate 0.0056   Epoch: 15   Global Step: 189450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:09:42,175-Speed 3194.56 samples/sec   Loss 1.6687   LearningRate 0.0056   Epoch: 15   Global Step: 189460   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:09:45,293-Speed 3285.40 samples/sec   Loss 1.6332   LearningRate 0.0056   Epoch: 15   Global Step: 189470   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:09:48,434-Speed 3261.09 samples/sec   Loss 1.7205   LearningRate 0.0056   Epoch: 15   Global Step: 189480   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:09:51,541-Speed 3296.67 samples/sec   Loss 1.6706   LearningRate 0.0056   Epoch: 15   Global Step: 189490   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:09:54,734-Speed 3208.22 samples/sec   Loss 1.6567   LearningRate 0.0056   Epoch: 15   Global Step: 189500   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:09:57,790-Speed 3351.89 samples/sec   Loss 1.6764   LearningRate 0.0056   Epoch: 15   Global Step: 189510   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:10:00,885-Speed 3308.91 samples/sec   Loss 1.7209   LearningRate 0.0056   Epoch: 15   Global Step: 189520   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:10:03,976-Speed 3314.45 samples/sec   Loss 1.6753   LearningRate 0.0056   Epoch: 15   Global Step: 189530   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:10:07,109-Speed 3269.39 samples/sec   Loss 1.7258   LearningRate 0.0056   Epoch: 15   Global Step: 189540   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:10:10,176-Speed 3339.69 samples/sec   Loss 1.6470   LearningRate 0.0056   Epoch: 15   Global Step: 189550   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:10:13,281-Speed 3299.25 samples/sec   Loss 1.6643   LearningRate 0.0056   Epoch: 15   Global Step: 189560   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:10:16,403-Speed 3280.72 samples/sec   Loss 1.7231   LearningRate 0.0056   Epoch: 15   Global Step: 189570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:10:19,513-Speed 3294.38 samples/sec   Loss 1.6885   LearningRate 0.0056   Epoch: 15   Global Step: 189580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:10:22,565-Speed 3355.49 samples/sec   Loss 1.7302   LearningRate 0.0056   Epoch: 15   Global Step: 189590   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:10:25,645-Speed 3325.55 samples/sec   Loss 1.7179   LearningRate 0.0056   Epoch: 15   Global Step: 189600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:10:28,765-Speed 3283.31 samples/sec   Loss 1.6694   LearningRate 0.0056   Epoch: 15   Global Step: 189610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:10:31,826-Speed 3346.68 samples/sec   Loss 1.7027   LearningRate 0.0056   Epoch: 15   Global Step: 189620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:10:34,904-Speed 3327.88 samples/sec   Loss 1.7167   LearningRate 0.0056   Epoch: 15   Global Step: 189630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:10:37,989-Speed 3320.00 samples/sec   Loss 1.6126   LearningRate 0.0056   Epoch: 15   Global Step: 189640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:10:41,087-Speed 3306.87 samples/sec   Loss 1.6995   LearningRate 0.0056   Epoch: 15   Global Step: 189650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:10:44,173-Speed 3318.78 samples/sec   Loss 1.7220   LearningRate 0.0056   Epoch: 15   Global Step: 189660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-04-27 18:10:47,245-Speed 3334.68 samples/sec   Loss 1.6765   LearningRate 0.0056   Epoch: 15   Global Step: 189670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:10:50,318-Speed 3333.94 samples/sec   Loss 1.7278   LearningRate 0.0056   Epoch: 15   Global Step: 189680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:10:53,402-Speed 3320.46 samples/sec   Loss 1.6723   LearningRate 0.0056   Epoch: 15   Global Step: 189690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:10:56,507-Speed 3299.21 samples/sec   Loss 1.6669   LearningRate 0.0056   Epoch: 15   Global Step: 189700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:10:59,619-Speed 3291.52 samples/sec   Loss 1.7033   LearningRate 0.0056   Epoch: 15   Global Step: 189710   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:11:02,758-Speed 3263.64 samples/sec   Loss 1.6949   LearningRate 0.0056   Epoch: 15   Global Step: 189720   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:11:05,911-Speed 3248.33 samples/sec   Loss 1.6630   LearningRate 0.0056   Epoch: 15   Global Step: 189730   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:11:08,979-Speed 3338.73 samples/sec   Loss 1.7134   LearningRate 0.0056   Epoch: 15   Global Step: 189740   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:11:12,050-Speed 3335.84 samples/sec   Loss 1.6735   LearningRate 0.0056   Epoch: 15   Global Step: 189750   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:11:15,124-Speed 3332.50 samples/sec   Loss 1.6551   LearningRate 0.0056   Epoch: 15   Global Step: 189760   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:11:18,232-Speed 3295.58 samples/sec   Loss 1.7559   LearningRate 0.0056   Epoch: 15   Global Step: 189770   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:11:21,323-Speed 3313.59 samples/sec   Loss 1.7472   LearningRate 0.0056   Epoch: 15   Global Step: 189780   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:11:24,439-Speed 3287.04 samples/sec   Loss 1.6546   LearningRate 0.0056   Epoch: 15   Global Step: 189790   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:11:27,565-Speed 3277.28 samples/sec   Loss 1.7278   LearningRate 0.0056   Epoch: 15   Global Step: 189800   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:11:30,675-Speed 3293.78 samples/sec   Loss 1.7449   LearningRate 0.0056   Epoch: 15   Global Step: 189810   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:11:33,746-Speed 3334.62 samples/sec   Loss 1.7335   LearningRate 0.0056   Epoch: 15   Global Step: 189820   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:11:36,916-Speed 3231.59 samples/sec   Loss 1.6302   LearningRate 0.0056   Epoch: 15   Global Step: 189830   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:11:40,022-Speed 3297.90 samples/sec   Loss 1.7112   LearningRate 0.0056   Epoch: 15   Global Step: 189840   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:11:43,087-Speed 3342.43 samples/sec   Loss 1.7223   LearningRate 0.0056   Epoch: 15   Global Step: 189850   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:11:46,249-Speed 3239.42 samples/sec   Loss 1.7034   LearningRate 0.0056   Epoch: 15   Global Step: 189860   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:11:49,329-Speed 3325.32 samples/sec   Loss 1.6687   LearningRate 0.0056   Epoch: 15   Global Step: 189870   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:11:52,436-Speed 3296.98 samples/sec   Loss 1.7622   LearningRate 0.0056   Epoch: 15   Global Step: 189880   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:11:55,551-Speed 3288.06 samples/sec   Loss 1.6449   LearningRate 0.0056   Epoch: 15   Global Step: 189890   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:11:58,675-Speed 3278.30 samples/sec   Loss 1.6754   LearningRate 0.0055   Epoch: 15   Global Step: 189900   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:12:01,822-Speed 3255.88 samples/sec   Loss 1.7331   LearningRate 0.0055   Epoch: 15   Global Step: 189910   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:12:04,949-Speed 3274.85 samples/sec   Loss 1.7313   LearningRate 0.0055   Epoch: 15   Global Step: 189920   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:12:08,015-Speed 3341.12 samples/sec   Loss 1.7484   LearningRate 0.0055   Epoch: 15   Global Step: 189930   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:12:11,124-Speed 3294.66 samples/sec   Loss 1.7217   LearningRate 0.0055   Epoch: 15   Global Step: 189940   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:12:14,240-Speed 3288.21 samples/sec   Loss 1.7723   LearningRate 0.0055   Epoch: 15   Global Step: 189950   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:12:17,322-Speed 3323.36 samples/sec   Loss 1.7970   LearningRate 0.0055   Epoch: 15   Global Step: 189960   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:12:20,404-Speed 3323.70 samples/sec   Loss 1.7551   LearningRate 0.0055   Epoch: 15   Global Step: 189970   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:12:23,482-Speed 3328.27 samples/sec   Loss 1.7292   LearningRate 0.0055   Epoch: 15   Global Step: 189980   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:12:26,586-Speed 3300.23 samples/sec   Loss 1.6555   LearningRate 0.0055   Epoch: 15   Global Step: 189990   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:12:29,709-Speed 3279.69 samples/sec   Loss 1.7029   LearningRate 0.0055   Epoch: 15   Global Step: 190000   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:12:32,811-Speed 3301.48 samples/sec   Loss 1.6750   LearningRate 0.0055   Epoch: 15   Global Step: 190010   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:12:35,935-Speed 3279.27 samples/sec   Loss 1.7132   LearningRate 0.0055   Epoch: 15   Global Step: 190020   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:12:39,024-Speed 3315.56 samples/sec   Loss 1.7076   LearningRate 0.0055   Epoch: 15   Global Step: 190030   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:12:42,136-Speed 3291.71 samples/sec   Loss 1.6692   LearningRate 0.0055   Epoch: 15   Global Step: 190040   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:12:45,216-Speed 3325.93 samples/sec   Loss 1.7231   LearningRate 0.0055   Epoch: 15   Global Step: 190050   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:12:48,292-Speed 3330.42 samples/sec   Loss 1.6471   LearningRate 0.0055   Epoch: 15   Global Step: 190060   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:12:51,363-Speed 3335.04 samples/sec   Loss 1.7512   LearningRate 0.0055   Epoch: 15   Global Step: 190070   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:12:54,474-Speed 3293.27 samples/sec   Loss 1.6975   LearningRate 0.0055   Epoch: 15   Global Step: 190080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:12:57,542-Speed 3338.51 samples/sec   Loss 1.7408   LearningRate 0.0055   Epoch: 15   Global Step: 190090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:13:00,631-Speed 3315.48 samples/sec   Loss 1.7083   LearningRate 0.0055   Epoch: 15   Global Step: 190100   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:13:03,734-Speed 3301.11 samples/sec   Loss 1.6591   LearningRate 0.0055   Epoch: 15   Global Step: 190110   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:13:06,817-Speed 3322.45 samples/sec   Loss 1.6639   LearningRate 0.0055   Epoch: 15   Global Step: 190120   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:13:09,899-Speed 3323.77 samples/sec   Loss 1.7216   LearningRate 0.0055   Epoch: 15   Global Step: 190130   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:13:12,986-Speed 3318.25 samples/sec   Loss 1.7468   LearningRate 0.0055   Epoch: 15   Global Step: 190140   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:13:16,085-Speed 3304.41 samples/sec   Loss 1.7301   LearningRate 0.0055   Epoch: 15   Global Step: 190150   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:13:19,195-Speed 3294.55 samples/sec   Loss 1.6660   LearningRate 0.0055   Epoch: 15   Global Step: 190160   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:13:22,281-Speed 3318.54 samples/sec   Loss 1.7544   LearningRate 0.0055   Epoch: 15   Global Step: 190170   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:13:25,417-Speed 3266.19 samples/sec   Loss 1.7268   LearningRate 0.0055   Epoch: 15   Global Step: 190180   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:13:28,607-Speed 3211.61 samples/sec   Loss 1.7158   LearningRate 0.0055   Epoch: 15   Global Step: 190190   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:13:31,734-Speed 3275.90 samples/sec   Loss 1.7522   LearningRate 0.0055   Epoch: 15   Global Step: 190200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:13:34,814-Speed 3325.57 samples/sec   Loss 1.7417   LearningRate 0.0055   Epoch: 15   Global Step: 190210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:13:37,908-Speed 3310.41 samples/sec   Loss 1.6834   LearningRate 0.0055   Epoch: 15   Global Step: 190220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:13:40,994-Speed 3318.37 samples/sec   Loss 1.6993   LearningRate 0.0055   Epoch: 15   Global Step: 190230   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:13:44,100-Speed 3298.56 samples/sec   Loss 1.7450   LearningRate 0.0055   Epoch: 15   Global Step: 190240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:13:47,183-Speed 3322.51 samples/sec   Loss 1.7382   LearningRate 0.0055   Epoch: 15   Global Step: 190250   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:13:50,277-Speed 3309.89 samples/sec   Loss 1.7066   LearningRate 0.0055   Epoch: 15   Global Step: 190260   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:13:53,465-Speed 3214.40 samples/sec   Loss 1.6962   LearningRate 0.0055   Epoch: 15   Global Step: 190270   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:13:56,581-Speed 3286.55 samples/sec   Loss 1.7128   LearningRate 0.0055   Epoch: 15   Global Step: 190280   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:13:59,683-Speed 3302.54 samples/sec   Loss 1.7378   LearningRate 0.0055   Epoch: 15   Global Step: 190290   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:14:02,787-Speed 3299.41 samples/sec   Loss 1.7063   LearningRate 0.0055   Epoch: 15   Global Step: 190300   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:14:05,870-Speed 3322.45 samples/sec   Loss 1.6615   LearningRate 0.0055   Epoch: 15   Global Step: 190310   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:14:08,984-Speed 3290.15 samples/sec   Loss 1.7688   LearningRate 0.0055   Epoch: 15   Global Step: 190320   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:14:12,139-Speed 3246.58 samples/sec   Loss 1.7356   LearningRate 0.0055   Epoch: 15   Global Step: 190330   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:14:15,350-Speed 3189.25 samples/sec   Loss 1.6968   LearningRate 0.0055   Epoch: 15   Global Step: 190340   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:14:18,459-Speed 3294.89 samples/sec   Loss 1.7014   LearningRate 0.0055   Epoch: 15   Global Step: 190350   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:14:21,530-Speed 3335.70 samples/sec   Loss 1.6839   LearningRate 0.0055   Epoch: 15   Global Step: 190360   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:14:24,645-Speed 3288.70 samples/sec   Loss 1.7389   LearningRate 0.0055   Epoch: 15   Global Step: 190370   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:14:27,857-Speed 3188.21 samples/sec   Loss 1.7037   LearningRate 0.0055   Epoch: 15   Global Step: 190380   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:14:30,976-Speed 3285.05 samples/sec   Loss 1.7929   LearningRate 0.0055   Epoch: 15   Global Step: 190390   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:14:34,058-Speed 3322.63 samples/sec   Loss 1.6975   LearningRate 0.0055   Epoch: 15   Global Step: 190400   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:14:37,131-Speed 3333.55 samples/sec   Loss 1.6793   LearningRate 0.0055   Epoch: 15   Global Step: 190410   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:14:40,297-Speed 3235.16 samples/sec   Loss 1.7350   LearningRate 0.0055   Epoch: 15   Global Step: 190420   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:14:43,394-Speed 3307.33 samples/sec   Loss 1.7066   LearningRate 0.0054   Epoch: 15   Global Step: 190430   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:14:46,467-Speed 3333.65 samples/sec   Loss 1.7219   LearningRate 0.0054   Epoch: 15   Global Step: 190440   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:14:49,550-Speed 3322.84 samples/sec   Loss 1.6874   LearningRate 0.0054   Epoch: 15   Global Step: 190450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:14:52,632-Speed 3324.13 samples/sec   Loss 1.7393   LearningRate 0.0054   Epoch: 15   Global Step: 190460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:14:55,738-Speed 3297.20 samples/sec   Loss 1.6884   LearningRate 0.0054   Epoch: 15   Global Step: 190470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:14:58,848-Speed 3293.92 samples/sec   Loss 1.7236   LearningRate 0.0054   Epoch: 15   Global Step: 190480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:15:02,020-Speed 3229.82 samples/sec   Loss 1.7127   LearningRate 0.0054   Epoch: 15   Global Step: 190490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:15:05,116-Speed 3307.87 samples/sec   Loss 1.7233   LearningRate 0.0054   Epoch: 15   Global Step: 190500   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:15:08,286-Speed 3230.95 samples/sec   Loss 1.8085   LearningRate 0.0054   Epoch: 15   Global Step: 190510   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:15:11,403-Speed 3286.69 samples/sec   Loss 1.7494   LearningRate 0.0054   Epoch: 15   Global Step: 190520   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:15:14,505-Speed 3301.90 samples/sec   Loss 1.7127   LearningRate 0.0054   Epoch: 15   Global Step: 190530   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:15:17,610-Speed 3299.06 samples/sec   Loss 1.7244   LearningRate 0.0054   Epoch: 15   Global Step: 190540   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:15:20,719-Speed 3294.21 samples/sec   Loss 1.7105   LearningRate 0.0054   Epoch: 15   Global Step: 190550   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:15:23,846-Speed 3276.48 samples/sec   Loss 1.6731   LearningRate 0.0054   Epoch: 15   Global Step: 190560   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:15:27,022-Speed 3224.74 samples/sec   Loss 1.7228   LearningRate 0.0054   Epoch: 15   Global Step: 190570   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:15:30,170-Speed 3253.68 samples/sec   Loss 1.7161   LearningRate 0.0054   Epoch: 15   Global Step: 190580   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:15:33,250-Speed 3326.26 samples/sec   Loss 1.7267   LearningRate 0.0054   Epoch: 15   Global Step: 190590   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:15:36,394-Speed 3257.92 samples/sec   Loss 1.6955   LearningRate 0.0054   Epoch: 15   Global Step: 190600   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:15:39,538-Speed 3257.46 samples/sec   Loss 1.7108   LearningRate 0.0054   Epoch: 15   Global Step: 190610   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:15:42,610-Speed 3334.40 samples/sec   Loss 1.7350   LearningRate 0.0054   Epoch: 15   Global Step: 190620   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:15:45,679-Speed 3338.37 samples/sec   Loss 1.7021   LearningRate 0.0054   Epoch: 15   Global Step: 190630   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:15:48,784-Speed 3299.03 samples/sec   Loss 1.7544   LearningRate 0.0054   Epoch: 15   Global Step: 190640   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:15:51,919-Speed 3266.23 samples/sec   Loss 1.6749   LearningRate 0.0054   Epoch: 15   Global Step: 190650   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:15:55,030-Speed 3293.33 samples/sec   Loss 1.6783   LearningRate 0.0054   Epoch: 15   Global Step: 190660   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:15:58,128-Speed 3306.13 samples/sec   Loss 1.7374   LearningRate 0.0054   Epoch: 15   Global Step: 190670   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:16:01,206-Speed 3327.38 samples/sec   Loss 1.7163   LearningRate 0.0054   Epoch: 15   Global Step: 190680   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:16:04,349-Speed 3259.76 samples/sec   Loss 1.7480   LearningRate 0.0054   Epoch: 15   Global Step: 190690   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:16:07,465-Speed 3287.09 samples/sec   Loss 1.7286   LearningRate 0.0054   Epoch: 15   Global Step: 190700   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:16:10,557-Speed 3312.01 samples/sec   Loss 1.7071   LearningRate 0.0054   Epoch: 15   Global Step: 190710   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:16:13,666-Speed 3295.03 samples/sec   Loss 1.7296   LearningRate 0.0054   Epoch: 15   Global Step: 190720   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:16:16,783-Speed 3286.83 samples/sec   Loss 1.6667   LearningRate 0.0054   Epoch: 15   Global Step: 190730   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:16:19,962-Speed 3221.76 samples/sec   Loss 1.7332   LearningRate 0.0054   Epoch: 15   Global Step: 190740   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:16:23,049-Speed 3318.43 samples/sec   Loss 1.6834   LearningRate 0.0054   Epoch: 15   Global Step: 190750   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:16:26,201-Speed 3249.48 samples/sec   Loss 1.7386   LearningRate 0.0054   Epoch: 15   Global Step: 190760   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:16:29,336-Speed 3267.69 samples/sec   Loss 1.6610   LearningRate 0.0054   Epoch: 15   Global Step: 190770   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:16:32,413-Speed 3328.84 samples/sec   Loss 1.7814   LearningRate 0.0054   Epoch: 15   Global Step: 190780   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:16:35,500-Speed 3318.33 samples/sec   Loss 1.6914   LearningRate 0.0054   Epoch: 15   Global Step: 190790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:16:38,556-Speed 3351.81 samples/sec   Loss 1.7374   LearningRate 0.0054   Epoch: 15   Global Step: 190800   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:16:41,641-Speed 3320.12 samples/sec   Loss 1.7252   LearningRate 0.0054   Epoch: 15   Global Step: 190810   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:16:44,723-Speed 3323.61 samples/sec   Loss 1.6732   LearningRate 0.0054   Epoch: 15   Global Step: 190820   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:16:47,863-Speed 3262.21 samples/sec   Loss 1.7303   LearningRate 0.0054   Epoch: 15   Global Step: 190830   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:16:50,963-Speed 3303.23 samples/sec   Loss 1.7361   LearningRate 0.0054   Epoch: 15   Global Step: 190840   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:16:54,079-Speed 3287.99 samples/sec   Loss 1.7629   LearningRate 0.0054   Epoch: 15   Global Step: 190850   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:16:57,146-Speed 3339.69 samples/sec   Loss 1.7041   LearningRate 0.0054   Epoch: 15   Global Step: 190860   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:17:00,312-Speed 3236.05 samples/sec   Loss 1.7381   LearningRate 0.0054   Epoch: 15   Global Step: 190870   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:17:03,453-Speed 3260.63 samples/sec   Loss 1.7717   LearningRate 0.0054   Epoch: 15   Global Step: 190880   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:17:06,597-Speed 3258.34 samples/sec   Loss 1.7548   LearningRate 0.0054   Epoch: 15   Global Step: 190890   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:17:09,695-Speed 3305.85 samples/sec   Loss 1.6868   LearningRate 0.0054   Epoch: 15   Global Step: 190900   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:17:12,795-Speed 3304.92 samples/sec   Loss 1.7014   LearningRate 0.0054   Epoch: 15   Global Step: 190910   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:17:15,875-Speed 3325.00 samples/sec   Loss 1.7283   LearningRate 0.0054   Epoch: 15   Global Step: 190920   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:17:18,965-Speed 3315.38 samples/sec   Loss 1.6995   LearningRate 0.0054   Epoch: 15   Global Step: 190930   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:17:22,047-Speed 3323.91 samples/sec   Loss 1.7417   LearningRate 0.0054   Epoch: 15   Global Step: 190940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:17:25,145-Speed 3305.99 samples/sec   Loss 1.7872   LearningRate 0.0054   Epoch: 15   Global Step: 190950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:17:28,235-Speed 3315.22 samples/sec   Loss 1.7445   LearningRate 0.0054   Epoch: 15   Global Step: 190960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:17:31,310-Speed 3331.52 samples/sec   Loss 1.6964   LearningRate 0.0053   Epoch: 15   Global Step: 190970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:17:34,419-Speed 3294.31 samples/sec   Loss 1.7254   LearningRate 0.0053   Epoch: 15   Global Step: 190980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:17:37,573-Speed 3248.33 samples/sec   Loss 1.7573   LearningRate 0.0053   Epoch: 15   Global Step: 190990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:17:40,649-Speed 3330.16 samples/sec   Loss 1.7487   LearningRate 0.0053   Epoch: 15   Global Step: 191000   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:17:43,816-Speed 3234.18 samples/sec   Loss 1.7425   LearningRate 0.0053   Epoch: 15   Global Step: 191010   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:17:46,916-Speed 3304.39 samples/sec   Loss 1.6669   LearningRate 0.0053   Epoch: 15   Global Step: 191020   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:17:50,074-Speed 3243.39 samples/sec   Loss 1.7800   LearningRate 0.0053   Epoch: 15   Global Step: 191030   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:17:53,173-Speed 3305.42 samples/sec   Loss 1.7173   LearningRate 0.0053   Epoch: 15   Global Step: 191040   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:17:56,244-Speed 3335.41 samples/sec   Loss 1.7321   LearningRate 0.0053   Epoch: 15   Global Step: 191050   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:17:59,360-Speed 3286.59 samples/sec   Loss 1.7094   LearningRate 0.0053   Epoch: 15   Global Step: 191060   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:18:02,459-Speed 3306.03 samples/sec   Loss 1.7323   LearningRate 0.0053   Epoch: 15   Global Step: 191070   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:18:05,580-Speed 3282.27 samples/sec   Loss 1.7515   LearningRate 0.0053   Epoch: 15   Global Step: 191080   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:18:08,675-Speed 3308.68 samples/sec   Loss 1.6880   LearningRate 0.0053   Epoch: 15   Global Step: 191090   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:18:11,761-Speed 3319.71 samples/sec   Loss 1.7552   LearningRate 0.0053   Epoch: 15   Global Step: 191100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:18:14,861-Speed 3303.89 samples/sec   Loss 1.6835   LearningRate 0.0053   Epoch: 15   Global Step: 191110   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:18:17,989-Speed 3275.18 samples/sec   Loss 1.7769   LearningRate 0.0053   Epoch: 15   Global Step: 191120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:18:21,087-Speed 3305.77 samples/sec   Loss 1.6891   LearningRate 0.0053   Epoch: 15   Global Step: 191130   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:18:24,161-Speed 3332.68 samples/sec   Loss 1.7435   LearningRate 0.0053   Epoch: 15   Global Step: 191140   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:18:27,316-Speed 3246.39 samples/sec   Loss 1.7364   LearningRate 0.0053   Epoch: 15   Global Step: 191150   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:18:30,475-Speed 3242.53 samples/sec   Loss 1.6831   LearningRate 0.0053   Epoch: 15   Global Step: 191160   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:18:33,631-Speed 3246.14 samples/sec   Loss 1.7361   LearningRate 0.0053   Epoch: 15   Global Step: 191170   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:18:36,824-Speed 3207.70 samples/sec   Loss 1.6928   LearningRate 0.0053   Epoch: 15   Global Step: 191180   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:18:39,908-Speed 3321.69 samples/sec   Loss 1.7197   LearningRate 0.0053   Epoch: 15   Global Step: 191190   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:18:42,976-Speed 3338.51 samples/sec   Loss 1.7593   LearningRate 0.0053   Epoch: 15   Global Step: 191200   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:18:46,079-Speed 3301.53 samples/sec   Loss 1.7339   LearningRate 0.0053   Epoch: 15   Global Step: 191210   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:18:49,224-Speed 3257.03 samples/sec   Loss 1.6920   LearningRate 0.0053   Epoch: 15   Global Step: 191220   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:18:52,365-Speed 3260.61 samples/sec   Loss 1.7599   LearningRate 0.0053   Epoch: 15   Global Step: 191230   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:18:55,534-Speed 3232.45 samples/sec   Loss 1.7058   LearningRate 0.0053   Epoch: 15   Global Step: 191240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:18:58,625-Speed 3313.95 samples/sec   Loss 1.7736   LearningRate 0.0053   Epoch: 15   Global Step: 191250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:19:01,706-Speed 3324.32 samples/sec   Loss 1.7243   LearningRate 0.0053   Epoch: 15   Global Step: 191260   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:19:04,842-Speed 3266.65 samples/sec   Loss 1.6898   LearningRate 0.0053   Epoch: 15   Global Step: 191270   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:19:07,951-Speed 3294.75 samples/sec   Loss 1.7296   LearningRate 0.0053   Epoch: 15   Global Step: 191280   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:19:11,092-Speed 3261.77 samples/sec   Loss 1.7460   LearningRate 0.0053   Epoch: 15   Global Step: 191290   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:19:14,214-Speed 3279.78 samples/sec   Loss 1.7221   LearningRate 0.0053   Epoch: 15   Global Step: 191300   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:19:17,347-Speed 3270.37 samples/sec   Loss 1.7275   LearningRate 0.0053   Epoch: 15   Global Step: 191310   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:19:20,435-Speed 3316.01 samples/sec   Loss 1.7359   LearningRate 0.0053   Epoch: 15   Global Step: 191320   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:19:23,542-Speed 3297.61 samples/sec   Loss 1.6926   LearningRate 0.0053   Epoch: 15   Global Step: 191330   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:19:26,676-Speed 3268.59 samples/sec   Loss 1.6986   LearningRate 0.0053   Epoch: 15   Global Step: 191340   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:19:29,755-Speed 3326.52 samples/sec   Loss 1.7239   LearningRate 0.0053   Epoch: 15   Global Step: 191350   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:19:32,812-Speed 3350.70 samples/sec   Loss 1.7400   LearningRate 0.0053   Epoch: 15   Global Step: 191360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:19:35,956-Speed 3257.55 samples/sec   Loss 1.8100   LearningRate 0.0053   Epoch: 15   Global Step: 191370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:19:39,077-Speed 3282.47 samples/sec   Loss 1.7227   LearningRate 0.0053   Epoch: 15   Global Step: 191380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:19:42,185-Speed 3295.46 samples/sec   Loss 1.6327   LearningRate 0.0053   Epoch: 15   Global Step: 191390   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:19:45,250-Speed 3342.81 samples/sec   Loss 1.7325   LearningRate 0.0053   Epoch: 15   Global Step: 191400   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:19:48,356-Speed 3297.03 samples/sec   Loss 1.7224   LearningRate 0.0053   Epoch: 15   Global Step: 191410   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:19:51,432-Speed 3330.54 samples/sec   Loss 1.7511   LearningRate 0.0053   Epoch: 15   Global Step: 191420   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:19:54,508-Speed 3329.95 samples/sec   Loss 1.7472   LearningRate 0.0053   Epoch: 15   Global Step: 191430   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:19:57,564-Speed 3352.10 samples/sec   Loss 1.7070   LearningRate 0.0053   Epoch: 15   Global Step: 191440   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:20:00,663-Speed 3305.27 samples/sec   Loss 1.7394   LearningRate 0.0053   Epoch: 15   Global Step: 191450   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:20:03,771-Speed 3296.22 samples/sec   Loss 1.7309   LearningRate 0.0053   Epoch: 15   Global Step: 191460   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:20:06,882-Speed 3292.61 samples/sec   Loss 1.7055   LearningRate 0.0053   Epoch: 15   Global Step: 191470   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:20:09,925-Speed 3366.28 samples/sec   Loss 1.7109   LearningRate 0.0053   Epoch: 15   Global Step: 191480   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:20:12,998-Speed 3333.20 samples/sec   Loss 1.7744   LearningRate 0.0053   Epoch: 15   Global Step: 191490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:20:16,065-Speed 3339.19 samples/sec   Loss 1.7287   LearningRate 0.0052   Epoch: 15   Global Step: 191500   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:20:19,252-Speed 3213.99 samples/sec   Loss 1.7069   LearningRate 0.0052   Epoch: 15   Global Step: 191510   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:20:22,331-Speed 3327.59 samples/sec   Loss 1.7411   LearningRate 0.0052   Epoch: 15   Global Step: 191520   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:20:25,415-Speed 3321.09 samples/sec   Loss 1.7057   LearningRate 0.0052   Epoch: 15   Global Step: 191530   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:20:28,491-Speed 3329.21 samples/sec   Loss 1.7703   LearningRate 0.0052   Epoch: 15   Global Step: 191540   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:20:31,593-Speed 3302.11 samples/sec   Loss 1.7812   LearningRate 0.0052   Epoch: 15   Global Step: 191550   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:20:34,689-Speed 3309.03 samples/sec   Loss 1.7147   LearningRate 0.0052   Epoch: 15   Global Step: 191560   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:20:37,810-Speed 3281.56 samples/sec   Loss 1.7245   LearningRate 0.0052   Epoch: 15   Global Step: 191570   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:20:40,942-Speed 3270.63 samples/sec   Loss 1.7353   LearningRate 0.0052   Epoch: 15   Global Step: 191580   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:20:44,043-Speed 3303.11 samples/sec   Loss 1.7036   LearningRate 0.0052   Epoch: 15   Global Step: 191590   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:20:47,115-Speed 3335.10 samples/sec   Loss 1.7406   LearningRate 0.0052   Epoch: 15   Global Step: 191600   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:20:50,290-Speed 3226.00 samples/sec   Loss 1.7474   LearningRate 0.0052   Epoch: 15   Global Step: 191610   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:20:53,516-Speed 3175.39 samples/sec   Loss 1.7109   LearningRate 0.0052   Epoch: 15   Global Step: 191620   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:20:56,630-Speed 3288.64 samples/sec   Loss 1.7700   LearningRate 0.0052   Epoch: 15   Global Step: 191630   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:20:59,722-Speed 3312.42 samples/sec   Loss 1.6502   LearningRate 0.0052   Epoch: 15   Global Step: 191640   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:21:02,929-Speed 3193.83 samples/sec   Loss 1.7449   LearningRate 0.0052   Epoch: 15   Global Step: 191650   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:21:06,073-Speed 3258.74 samples/sec   Loss 1.7074   LearningRate 0.0052   Epoch: 15   Global Step: 191660   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:21:09,140-Speed 3339.54 samples/sec   Loss 1.7175   LearningRate 0.0052   Epoch: 15   Global Step: 191670   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:21:12,248-Speed 3295.87 samples/sec   Loss 1.7834   LearningRate 0.0052   Epoch: 15   Global Step: 191680   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:21:15,399-Speed 3250.27 samples/sec   Loss 1.7455   LearningRate 0.0052   Epoch: 15   Global Step: 191690   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:21:18,490-Speed 3314.06 samples/sec   Loss 1.7478   LearningRate 0.0052   Epoch: 15   Global Step: 191700   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:21:21,556-Speed 3341.01 samples/sec   Loss 1.7767   LearningRate 0.0052   Epoch: 15   Global Step: 191710   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:21:24,676-Speed 3283.33 samples/sec   Loss 1.7264   LearningRate 0.0052   Epoch: 15   Global Step: 191720   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:21:27,759-Speed 3322.25 samples/sec   Loss 1.7448   LearningRate 0.0052   Epoch: 15   Global Step: 191730   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:21:30,917-Speed 3243.46 samples/sec   Loss 1.7509   LearningRate 0.0052   Epoch: 15   Global Step: 191740   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:21:33,984-Speed 3339.90 samples/sec   Loss 1.7507   LearningRate 0.0052   Epoch: 15   Global Step: 191750   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:21:37,106-Speed 3280.51 samples/sec   Loss 1.7146   LearningRate 0.0052   Epoch: 15   Global Step: 191760   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:21:40,196-Speed 3315.62 samples/sec   Loss 1.7190   LearningRate 0.0052   Epoch: 15   Global Step: 191770   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:21:43,285-Speed 3315.94 samples/sec   Loss 1.7521   LearningRate 0.0052   Epoch: 15   Global Step: 191780   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:21:46,391-Speed 3298.31 samples/sec   Loss 1.7550   LearningRate 0.0052   Epoch: 15   Global Step: 191790   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:21:49,486-Speed 3309.48 samples/sec   Loss 1.6939   LearningRate 0.0052   Epoch: 15   Global Step: 191800   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:21:52,600-Speed 3288.99 samples/sec   Loss 1.7532   LearningRate 0.0052   Epoch: 15   Global Step: 191810   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:21:55,687-Speed 3317.68 samples/sec   Loss 1.7016   LearningRate 0.0052   Epoch: 15   Global Step: 191820   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:21:58,790-Speed 3301.28 samples/sec   Loss 1.7196   LearningRate 0.0052   Epoch: 15   Global Step: 191830   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:22:01,892-Speed 3302.81 samples/sec   Loss 1.7322   LearningRate 0.0052   Epoch: 15   Global Step: 191840   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:22:05,067-Speed 3225.84 samples/sec   Loss 1.7314   LearningRate 0.0052   Epoch: 15   Global Step: 191850   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:22:08,201-Speed 3268.12 samples/sec   Loss 1.7471   LearningRate 0.0052   Epoch: 15   Global Step: 191860   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:22:11,308-Speed 3297.60 samples/sec   Loss 1.6903   LearningRate 0.0052   Epoch: 15   Global Step: 191870   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:22:14,426-Speed 3285.23 samples/sec   Loss 1.7653   LearningRate 0.0052   Epoch: 15   Global Step: 191880   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:22:17,490-Speed 3342.99 samples/sec   Loss 1.7258   LearningRate 0.0052   Epoch: 15   Global Step: 191890   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:22:20,567-Speed 3328.88 samples/sec   Loss 1.7788   LearningRate 0.0052   Epoch: 15   Global Step: 191900   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:22:23,723-Speed 3246.34 samples/sec   Loss 1.7689   LearningRate 0.0052   Epoch: 15   Global Step: 191910   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:22:26,818-Speed 3309.28 samples/sec   Loss 1.7208   LearningRate 0.0052   Epoch: 15   Global Step: 191920   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:22:29,885-Speed 3339.45 samples/sec   Loss 1.7122   LearningRate 0.0052   Epoch: 15   Global Step: 191930   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:22:33,013-Speed 3275.12 samples/sec   Loss 1.7016   LearningRate 0.0052   Epoch: 15   Global Step: 191940   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:22:36,112-Speed 3305.40 samples/sec   Loss 1.7384   LearningRate 0.0052   Epoch: 15   Global Step: 191950   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:22:39,247-Speed 3267.30 samples/sec   Loss 1.7197   LearningRate 0.0052   Epoch: 15   Global Step: 191960   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:22:42,429-Speed 3218.50 samples/sec   Loss 1.6851   LearningRate 0.0052   Epoch: 15   Global Step: 191970   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:22:45,534-Speed 3299.26 samples/sec   Loss 1.7017   LearningRate 0.0052   Epoch: 15   Global Step: 191980   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:22:48,620-Speed 3318.87 samples/sec   Loss 1.6754   LearningRate 0.0052   Epoch: 15   Global Step: 191990   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:22:51,889-Speed 3133.97 samples/sec   Loss 1.7359   LearningRate 0.0052   Epoch: 15   Global Step: 192000   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:22:55,041-Speed 3249.25 samples/sec   Loss 1.7066   LearningRate 0.0052   Epoch: 15   Global Step: 192010   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:22:58,157-Speed 3287.80 samples/sec   Loss 1.6789   LearningRate 0.0052   Epoch: 15   Global Step: 192020   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:23:01,247-Speed 3315.07 samples/sec   Loss 1.7878   LearningRate 0.0052   Epoch: 15   Global Step: 192030   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:23:04,416-Speed 3231.80 samples/sec   Loss 1.6584   LearningRate 0.0052   Epoch: 15   Global Step: 192040   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:23:07,494-Speed 3327.95 samples/sec   Loss 1.7122   LearningRate 0.0051   Epoch: 15   Global Step: 192050   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:23:10,562-Speed 3338.34 samples/sec   Loss 1.7493   LearningRate 0.0051   Epoch: 15   Global Step: 192060   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:23:13,714-Speed 3250.86 samples/sec   Loss 1.7341   LearningRate 0.0051   Epoch: 15   Global Step: 192070   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:23:16,881-Speed 3234.43 samples/sec   Loss 1.7342   LearningRate 0.0051   Epoch: 15   Global Step: 192080   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:23:20,006-Speed 3277.44 samples/sec   Loss 1.7382   LearningRate 0.0051   Epoch: 15   Global Step: 192090   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:23:23,115-Speed 3294.61 samples/sec   Loss 1.7739   LearningRate 0.0051   Epoch: 15   Global Step: 192100   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:23:26,223-Speed 3296.01 samples/sec   Loss 1.7259   LearningRate 0.0051   Epoch: 15   Global Step: 192110   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:23:29,381-Speed 3244.08 samples/sec   Loss 1.7655   LearningRate 0.0051   Epoch: 15   Global Step: 192120   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:23:32,458-Speed 3328.23 samples/sec   Loss 1.7392   LearningRate 0.0051   Epoch: 15   Global Step: 192130   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:23:35,586-Speed 3274.75 samples/sec   Loss 1.7521   LearningRate 0.0051   Epoch: 15   Global Step: 192140   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:23:38,688-Speed 3302.02 samples/sec   Loss 1.7586   LearningRate 0.0051   Epoch: 15   Global Step: 192150   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:23:41,860-Speed 3229.63 samples/sec   Loss 1.7113   LearningRate 0.0051   Epoch: 15   Global Step: 192160   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:23:44,963-Speed 3300.42 samples/sec   Loss 1.7303   LearningRate 0.0051   Epoch: 15   Global Step: 192170   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:23:48,070-Speed 3297.19 samples/sec   Loss 1.7851   LearningRate 0.0051   Epoch: 15   Global Step: 192180   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:23:51,221-Speed 3250.53 samples/sec   Loss 1.7268   LearningRate 0.0051   Epoch: 15   Global Step: 192190   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:23:54,359-Speed 3264.15 samples/sec   Loss 1.7532   LearningRate 0.0051   Epoch: 15   Global Step: 192200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:23:57,472-Speed 3290.79 samples/sec   Loss 1.7134   LearningRate 0.0051   Epoch: 15   Global Step: 192210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:24:00,625-Speed 3247.98 samples/sec   Loss 1.7336   LearningRate 0.0051   Epoch: 15   Global Step: 192220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:24:03,727-Speed 3302.27 samples/sec   Loss 1.7214   LearningRate 0.0051   Epoch: 15   Global Step: 192230   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:24:06,820-Speed 3311.99 samples/sec   Loss 1.7839   LearningRate 0.0051   Epoch: 15   Global Step: 192240   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:24:09,888-Speed 3338.61 samples/sec   Loss 1.7108   LearningRate 0.0051   Epoch: 15   Global Step: 192250   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:24:13,023-Speed 3266.32 samples/sec   Loss 1.7501   LearningRate 0.0051   Epoch: 15   Global Step: 192260   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:24:16,176-Speed 3249.77 samples/sec   Loss 1.7826   LearningRate 0.0051   Epoch: 15   Global Step: 192270   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:24:19,262-Speed 3318.45 samples/sec   Loss 1.7423   LearningRate 0.0051   Epoch: 15   Global Step: 192280   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:24:22,344-Speed 3323.71 samples/sec   Loss 1.7876   LearningRate 0.0051   Epoch: 15   Global Step: 192290   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:24:25,541-Speed 3204.55 samples/sec   Loss 1.7584   LearningRate 0.0051   Epoch: 15   Global Step: 192300   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:24:28,645-Speed 3299.79 samples/sec   Loss 1.7888   LearningRate 0.0051   Epoch: 15   Global Step: 192310   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:24:31,764-Speed 3284.22 samples/sec   Loss 1.7162   LearningRate 0.0051   Epoch: 15   Global Step: 192320   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:24:34,873-Speed 3294.55 samples/sec   Loss 1.7366   LearningRate 0.0051   Epoch: 15   Global Step: 192330   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:24:37,946-Speed 3333.82 samples/sec   Loss 1.7328   LearningRate 0.0051   Epoch: 15   Global Step: 192340   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:24:41,121-Speed 3225.68 samples/sec   Loss 1.7416   LearningRate 0.0051   Epoch: 15   Global Step: 192350   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:24:44,231-Speed 3293.44 samples/sec   Loss 1.7303   LearningRate 0.0051   Epoch: 15   Global Step: 192360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:24:47,344-Speed 3290.60 samples/sec   Loss 1.7520   LearningRate 0.0051   Epoch: 15   Global Step: 192370   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:24:50,460-Speed 3286.94 samples/sec   Loss 1.7431   LearningRate 0.0051   Epoch: 15   Global Step: 192380   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:24:53,541-Speed 3324.66 samples/sec   Loss 1.7313   LearningRate 0.0051   Epoch: 15   Global Step: 192390   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:24:56,620-Speed 3327.51 samples/sec   Loss 1.7527   LearningRate 0.0051   Epoch: 15   Global Step: 192400   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:24:59,751-Speed 3271.47 samples/sec   Loss 1.7775   LearningRate 0.0051   Epoch: 15   Global Step: 192410   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:25:02,829-Speed 3327.42 samples/sec   Loss 1.7484   LearningRate 0.0051   Epoch: 15   Global Step: 192420   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:25:05,930-Speed 3303.80 samples/sec   Loss 1.7241   LearningRate 0.0051   Epoch: 15   Global Step: 192430   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:25:09,008-Speed 3328.06 samples/sec   Loss 1.7813   LearningRate 0.0051   Epoch: 15   Global Step: 192440   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:25:12,150-Speed 3259.23 samples/sec   Loss 1.7487   LearningRate 0.0051   Epoch: 15   Global Step: 192450   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:25:15,247-Speed 3307.34 samples/sec   Loss 1.7238   LearningRate 0.0051   Epoch: 15   Global Step: 192460   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:25:18,350-Speed 3300.81 samples/sec   Loss 1.7563   LearningRate 0.0051   Epoch: 15   Global Step: 192470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:25:21,412-Speed 3345.43 samples/sec   Loss 1.7049   LearningRate 0.0051   Epoch: 15   Global Step: 192480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:25:24,499-Speed 3318.69 samples/sec   Loss 1.8137   LearningRate 0.0051   Epoch: 15   Global Step: 192490   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:25:27,580-Speed 3325.10 samples/sec   Loss 1.7410   LearningRate 0.0051   Epoch: 15   Global Step: 192500   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:25:30,658-Speed 3327.63 samples/sec   Loss 1.6946   LearningRate 0.0051   Epoch: 15   Global Step: 192510   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:25:33,719-Speed 3346.26 samples/sec   Loss 1.7528   LearningRate 0.0051   Epoch: 15   Global Step: 192520   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:25:36,792-Speed 3332.64 samples/sec   Loss 1.6959   LearningRate 0.0051   Epoch: 15   Global Step: 192530   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:25:39,915-Speed 3280.75 samples/sec   Loss 1.6994   LearningRate 0.0051   Epoch: 15   Global Step: 192540   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:25:43,032-Speed 3285.83 samples/sec   Loss 1.7174   LearningRate 0.0051   Epoch: 15   Global Step: 192550   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:25:46,170-Speed 3264.51 samples/sec   Loss 1.7479   LearningRate 0.0051   Epoch: 15   Global Step: 192560   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:25:49,282-Speed 3290.98 samples/sec   Loss 1.6915   LearningRate 0.0051   Epoch: 15   Global Step: 192570   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:25:52,410-Speed 3275.00 samples/sec   Loss 1.7222   LearningRate 0.0051   Epoch: 15   Global Step: 192580   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:25:55,491-Speed 3324.27 samples/sec   Loss 1.7488   LearningRate 0.0051   Epoch: 15   Global Step: 192590   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:25:58,540-Speed 3359.90 samples/sec   Loss 1.7841   LearningRate 0.0050   Epoch: 15   Global Step: 192600   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:26:01,597-Speed 3350.50 samples/sec   Loss 1.7391   LearningRate 0.0050   Epoch: 15   Global Step: 192610   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:26:04,692-Speed 3309.98 samples/sec   Loss 1.7808   LearningRate 0.0050   Epoch: 15   Global Step: 192620   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:26:07,796-Speed 3299.82 samples/sec   Loss 1.7277   LearningRate 0.0050   Epoch: 15   Global Step: 192630   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:26:10,907-Speed 3292.03 samples/sec   Loss 1.7350   LearningRate 0.0050   Epoch: 15   Global Step: 192640   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:26:13,991-Speed 3321.25 samples/sec   Loss 1.7257   LearningRate 0.0050   Epoch: 15   Global Step: 192650   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:26:17,135-Speed 3258.60 samples/sec   Loss 1.7219   LearningRate 0.0050   Epoch: 15   Global Step: 192660   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:26:20,197-Speed 3344.88 samples/sec   Loss 1.7177   LearningRate 0.0050   Epoch: 15   Global Step: 192670   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:26:23,307-Speed 3293.70 samples/sec   Loss 1.6963   LearningRate 0.0050   Epoch: 15   Global Step: 192680   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:26:26,470-Speed 3239.10 samples/sec   Loss 1.7617   LearningRate 0.0050   Epoch: 15   Global Step: 192690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:26:29,632-Speed 3238.89 samples/sec   Loss 1.7637   LearningRate 0.0050   Epoch: 15   Global Step: 192700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:26:32,758-Speed 3277.17 samples/sec   Loss 1.7364   LearningRate 0.0050   Epoch: 15   Global Step: 192710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:26:35,873-Speed 3288.26 samples/sec   Loss 1.7324   LearningRate 0.0050   Epoch: 15   Global Step: 192720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:26:38,952-Speed 3326.59 samples/sec   Loss 1.7189   LearningRate 0.0050   Epoch: 15   Global Step: 192730   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:26:42,045-Speed 3311.25 samples/sec   Loss 1.7824   LearningRate 0.0050   Epoch: 15   Global Step: 192740   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:26:45,185-Speed 3263.25 samples/sec   Loss 1.7489   LearningRate 0.0050   Epoch: 15   Global Step: 192750   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:26:48,346-Speed 3239.53 samples/sec   Loss 1.7476   LearningRate 0.0050   Epoch: 15   Global Step: 192760   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:26:51,450-Speed 3300.46 samples/sec   Loss 1.7441   LearningRate 0.0050   Epoch: 15   Global Step: 192770   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:26:54,591-Speed 3261.45 samples/sec   Loss 1.7709   LearningRate 0.0050   Epoch: 15   Global Step: 192780   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:26:57,653-Speed 3345.21 samples/sec   Loss 1.7883   LearningRate 0.0050   Epoch: 15   Global Step: 192790   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:27:00,748-Speed 3309.60 samples/sec   Loss 1.6746   LearningRate 0.0050   Epoch: 15   Global Step: 192800   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:27:03,844-Speed 3308.46 samples/sec   Loss 1.7404   LearningRate 0.0050   Epoch: 15   Global Step: 192810   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:27:06,911-Speed 3338.91 samples/sec   Loss 1.7183   LearningRate 0.0050   Epoch: 15   Global Step: 192820   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:27:10,028-Speed 3286.63 samples/sec   Loss 1.7136   LearningRate 0.0050   Epoch: 15   Global Step: 192830   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:27:13,143-Speed 3289.26 samples/sec   Loss 1.6966   LearningRate 0.0050   Epoch: 15   Global Step: 192840   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:27:16,307-Speed 3236.69 samples/sec   Loss 1.7246   LearningRate 0.0050   Epoch: 15   Global Step: 192850   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:27:19,434-Speed 3275.76 samples/sec   Loss 1.7664   LearningRate 0.0050   Epoch: 15   Global Step: 192860   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:27:22,494-Speed 3347.77 samples/sec   Loss 1.7589   LearningRate 0.0050   Epoch: 15   Global Step: 192870   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:27:25,577-Speed 3322.12 samples/sec   Loss 1.7852   LearningRate 0.0050   Epoch: 15   Global Step: 192880   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:27:28,686-Speed 3295.07 samples/sec   Loss 1.7228   LearningRate 0.0050   Epoch: 15   Global Step: 192890   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:27:31,831-Speed 3257.17 samples/sec   Loss 1.7692   LearningRate 0.0050   Epoch: 15   Global Step: 192900   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:27:34,903-Speed 3334.48 samples/sec   Loss 1.7303   LearningRate 0.0050   Epoch: 15   Global Step: 192910   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:27:38,049-Speed 3256.09 samples/sec   Loss 1.8089   LearningRate 0.0050   Epoch: 15   Global Step: 192920   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:27:41,193-Speed 3258.26 samples/sec   Loss 1.7778   LearningRate 0.0050   Epoch: 15   Global Step: 192930   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:27:44,269-Speed 3329.71 samples/sec   Loss 1.7810   LearningRate 0.0050   Epoch: 15   Global Step: 192940   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:27:47,353-Speed 3320.77 samples/sec   Loss 1.7915   LearningRate 0.0050   Epoch: 15   Global Step: 192950   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:27:50,464-Speed 3292.88 samples/sec   Loss 1.7492   LearningRate 0.0050   Epoch: 15   Global Step: 192960   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:27:53,558-Speed 3311.54 samples/sec   Loss 1.7445   LearningRate 0.0050   Epoch: 15   Global Step: 192970   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:27:56,658-Speed 3303.07 samples/sec   Loss 1.7508   LearningRate 0.0050   Epoch: 15   Global Step: 192980   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:27:59,739-Speed 3325.70 samples/sec   Loss 1.7739   LearningRate 0.0050   Epoch: 15   Global Step: 192990   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:28:02,953-Speed 3186.29 samples/sec   Loss 1.7243   LearningRate 0.0050   Epoch: 15   Global Step: 193000   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:28:06,167-Speed 3188.22 samples/sec   Loss 1.7044   LearningRate 0.0050   Epoch: 15   Global Step: 193010   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:28:09,270-Speed 3300.34 samples/sec   Loss 1.7663   LearningRate 0.0050   Epoch: 15   Global Step: 193020   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:28:12,389-Speed 3283.87 samples/sec   Loss 1.7077   LearningRate 0.0050   Epoch: 15   Global Step: 193030   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:28:15,565-Speed 3225.54 samples/sec   Loss 1.7627   LearningRate 0.0050   Epoch: 15   Global Step: 193040   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:28:18,747-Speed 3218.91 samples/sec   Loss 1.7819   LearningRate 0.0050   Epoch: 15   Global Step: 193050   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:28:21,811-Speed 3342.90 samples/sec   Loss 1.7238   LearningRate 0.0050   Epoch: 15   Global Step: 193060   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:28:24,956-Speed 3256.64 samples/sec   Loss 1.7482   LearningRate 0.0050   Epoch: 15   Global Step: 193070   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:28:28,037-Speed 3325.26 samples/sec   Loss 1.7231   LearningRate 0.0050   Epoch: 15   Global Step: 193080   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:28:31,205-Speed 3232.76 samples/sec   Loss 1.7553   LearningRate 0.0050   Epoch: 15   Global Step: 193090   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:28:34,311-Speed 3297.67 samples/sec   Loss 1.7181   LearningRate 0.0050   Epoch: 15   Global Step: 193100   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:28:37,399-Speed 3317.50 samples/sec   Loss 1.7775   LearningRate 0.0050   Epoch: 15   Global Step: 193110   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:28:40,494-Speed 3310.15 samples/sec   Loss 1.7616   LearningRate 0.0050   Epoch: 15   Global Step: 193120   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:28:43,654-Speed 3240.98 samples/sec   Loss 1.7363   LearningRate 0.0050   Epoch: 15   Global Step: 193130   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:28:46,715-Speed 3346.50 samples/sec   Loss 1.7221   LearningRate 0.0050   Epoch: 15   Global Step: 193140   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:28:49,862-Speed 3255.14 samples/sec   Loss 1.7645   LearningRate 0.0050   Epoch: 15   Global Step: 193150   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:28:53,008-Speed 3255.91 samples/sec   Loss 1.7670   LearningRate 0.0049   Epoch: 15   Global Step: 193160   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:28:56,120-Speed 3291.90 samples/sec   Loss 1.7899   LearningRate 0.0049   Epoch: 15   Global Step: 193170   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:28:59,193-Speed 3332.67 samples/sec   Loss 1.7182   LearningRate 0.0049   Epoch: 15   Global Step: 193180   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:29:02,345-Speed 3249.87 samples/sec   Loss 1.7275   LearningRate 0.0049   Epoch: 15   Global Step: 193190   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:29:05,452-Speed 3296.94 samples/sec   Loss 1.8009   LearningRate 0.0049   Epoch: 15   Global Step: 193200   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:29:08,593-Speed 3260.86 samples/sec   Loss 1.8037   LearningRate 0.0049   Epoch: 15   Global Step: 193210   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:29:11,727-Speed 3268.23 samples/sec   Loss 1.8027   LearningRate 0.0049   Epoch: 15   Global Step: 193220   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:29:14,832-Speed 3299.32 samples/sec   Loss 1.7305   LearningRate 0.0049   Epoch: 15   Global Step: 193230   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:29:17,918-Speed 3318.82 samples/sec   Loss 1.7641   LearningRate 0.0049   Epoch: 15   Global Step: 193240   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:29:21,048-Speed 3272.65 samples/sec   Loss 1.7805   LearningRate 0.0049   Epoch: 15   Global Step: 193250   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:29:24,123-Speed 3331.51 samples/sec   Loss 1.7192   LearningRate 0.0049   Epoch: 15   Global Step: 193260   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:29:27,214-Speed 3313.11 samples/sec   Loss 1.7786   LearningRate 0.0049   Epoch: 15   Global Step: 193270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:29:30,296-Speed 3324.12 samples/sec   Loss 1.7715   LearningRate 0.0049   Epoch: 15   Global Step: 193280   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:29:33,360-Speed 3343.31 samples/sec   Loss 1.7387   LearningRate 0.0049   Epoch: 15   Global Step: 193290   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:29:36,431-Speed 3335.02 samples/sec   Loss 1.8132   LearningRate 0.0049   Epoch: 15   Global Step: 193300   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:29:39,461-Speed 3380.61 samples/sec   Loss 1.7263   LearningRate 0.0049   Epoch: 15   Global Step: 193310   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:29:42,531-Speed 3336.86 samples/sec   Loss 1.7396   LearningRate 0.0049   Epoch: 15   Global Step: 193320   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:29:45,619-Speed 3317.43 samples/sec   Loss 1.8188   LearningRate 0.0049   Epoch: 15   Global Step: 193330   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:29:48,761-Speed 3259.65 samples/sec   Loss 1.7599   LearningRate 0.0049   Epoch: 15   Global Step: 193340   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:29:51,894-Speed 3269.63 samples/sec   Loss 1.7521   LearningRate 0.0049   Epoch: 15   Global Step: 193350   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:29:55,024-Speed 3271.93 samples/sec   Loss 1.7479   LearningRate 0.0049   Epoch: 15   Global Step: 193360   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:29:58,142-Speed 3285.18 samples/sec   Loss 1.7591   LearningRate 0.0049   Epoch: 15   Global Step: 193370   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:30:01,253-Speed 3293.08 samples/sec   Loss 1.7387   LearningRate 0.0049   Epoch: 15   Global Step: 193380   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:30:04,355-Speed 3302.27 samples/sec   Loss 1.7762   LearningRate 0.0049   Epoch: 15   Global Step: 193390   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:30:07,439-Speed 3321.16 samples/sec   Loss 1.7479   LearningRate 0.0049   Epoch: 15   Global Step: 193400   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:30:10,508-Speed 3337.34 samples/sec   Loss 1.7335   LearningRate 0.0049   Epoch: 15   Global Step: 193410   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:30:13,636-Speed 3275.00 samples/sec   Loss 1.7632   LearningRate 0.0049   Epoch: 15   Global Step: 193420   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:30:16,792-Speed 3245.84 samples/sec   Loss 1.7938   LearningRate 0.0049   Epoch: 15   Global Step: 193430   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:30:19,929-Speed 3267.62 samples/sec   Loss 1.7457   LearningRate 0.0049   Epoch: 15   Global Step: 193440   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:30:23,057-Speed 3275.22 samples/sec   Loss 1.6693   LearningRate 0.0049   Epoch: 15   Global Step: 193450   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:30:26,140-Speed 3322.75 samples/sec   Loss 1.7617   LearningRate 0.0049   Epoch: 15   Global Step: 193460   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:30:29,308-Speed 3232.74 samples/sec   Loss 1.7650   LearningRate 0.0049   Epoch: 15   Global Step: 193470   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:30:32,423-Speed 3288.60 samples/sec   Loss 1.7792   LearningRate 0.0049   Epoch: 15   Global Step: 193480   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:30:35,535-Speed 3291.79 samples/sec   Loss 1.7915   LearningRate 0.0049   Epoch: 15   Global Step: 193490   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:30:38,632-Speed 3307.01 samples/sec   Loss 1.7487   LearningRate 0.0049   Epoch: 15   Global Step: 193500   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:30:41,736-Speed 3299.87 samples/sec   Loss 1.7708   LearningRate 0.0049   Epoch: 15   Global Step: 193510   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:30:44,897-Speed 3240.93 samples/sec   Loss 1.7782   LearningRate 0.0049   Epoch: 15   Global Step: 193520   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:30:48,050-Speed 3248.95 samples/sec   Loss 1.7279   LearningRate 0.0049   Epoch: 15   Global Step: 193530   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:30:51,177-Speed 3275.59 samples/sec   Loss 1.7721   LearningRate 0.0049   Epoch: 15   Global Step: 193540   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:30:54,330-Speed 3248.63 samples/sec   Loss 1.7829   LearningRate 0.0049   Epoch: 15   Global Step: 193550   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:30:57,486-Speed 3245.14 samples/sec   Loss 1.8050   LearningRate 0.0049   Epoch: 15   Global Step: 193560   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:31:00,650-Speed 3237.63 samples/sec   Loss 1.7620   LearningRate 0.0049   Epoch: 15   Global Step: 193570   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:31:03,824-Speed 3227.93 samples/sec   Loss 1.7883   LearningRate 0.0049   Epoch: 15   Global Step: 193580   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:31:06,997-Speed 3228.08 samples/sec   Loss 1.7186   LearningRate 0.0049   Epoch: 15   Global Step: 193590   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:31:10,042-Speed 3362.84 samples/sec   Loss 1.7673   LearningRate 0.0049   Epoch: 15   Global Step: 193600   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:31:13,186-Speed 3259.11 samples/sec   Loss 1.7447   LearningRate 0.0049   Epoch: 15   Global Step: 193610   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:31:16,352-Speed 3235.48 samples/sec   Loss 1.7403   LearningRate 0.0049   Epoch: 15   Global Step: 193620   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:31:19,472-Speed 3282.77 samples/sec   Loss 1.7858   LearningRate 0.0049   Epoch: 15   Global Step: 193630   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:31:22,580-Speed 3295.44 samples/sec   Loss 1.8242   LearningRate 0.0049   Epoch: 15   Global Step: 193640   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:31:25,824-Speed 3157.38 samples/sec   Loss 1.7538   LearningRate 0.0049   Epoch: 15   Global Step: 193650   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:31:28,910-Speed 3319.39 samples/sec   Loss 1.7659   LearningRate 0.0049   Epoch: 15   Global Step: 193660   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:31:31,998-Speed 3316.90 samples/sec   Loss 1.7365   LearningRate 0.0049   Epoch: 15   Global Step: 193670   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:31:35,056-Speed 3349.91 samples/sec   Loss 1.6873   LearningRate 0.0049   Epoch: 15   Global Step: 193680   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:31:38,129-Speed 3333.81 samples/sec   Loss 1.7877   LearningRate 0.0049   Epoch: 15   Global Step: 193690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:31:41,232-Speed 3300.38 samples/sec   Loss 1.7336   LearningRate 0.0049   Epoch: 15   Global Step: 193700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:31:44,318-Speed 3319.47 samples/sec   Loss 1.7669   LearningRate 0.0049   Epoch: 15   Global Step: 193710   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:31:47,390-Speed 3334.24 samples/sec   Loss 1.7508   LearningRate 0.0048   Epoch: 15   Global Step: 193720   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:31:50,585-Speed 3205.95 samples/sec   Loss 1.7524   LearningRate 0.0048   Epoch: 15   Global Step: 193730   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:31:53,848-Speed 3139.90 samples/sec   Loss 1.7582   LearningRate 0.0048   Epoch: 15   Global Step: 193740   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:31:56,965-Speed 3285.68 samples/sec   Loss 1.6667   LearningRate 0.0048   Epoch: 15   Global Step: 193750   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:32:00,031-Speed 3341.05 samples/sec   Loss 1.7512   LearningRate 0.0048   Epoch: 15   Global Step: 193760   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:32:03,101-Speed 3336.92 samples/sec   Loss 1.7544   LearningRate 0.0048   Epoch: 15   Global Step: 193770   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:32:06,164-Speed 3343.90 samples/sec   Loss 1.7063   LearningRate 0.0048   Epoch: 15   Global Step: 193780   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:32:09,216-Speed 3356.13 samples/sec   Loss 1.7242   LearningRate 0.0048   Epoch: 15   Global Step: 193790   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:32:12,266-Speed 3358.34 samples/sec   Loss 1.6917   LearningRate 0.0048   Epoch: 15   Global Step: 193800   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:32:15,394-Speed 3275.28 samples/sec   Loss 1.7885   LearningRate 0.0048   Epoch: 15   Global Step: 193810   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:32:18,550-Speed 3245.51 samples/sec   Loss 1.7903   LearningRate 0.0048   Epoch: 15   Global Step: 193820   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:32:21,640-Speed 3314.94 samples/sec   Loss 1.7003   LearningRate 0.0048   Epoch: 15   Global Step: 193830   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:32:24,742-Speed 3302.25 samples/sec   Loss 1.8082   LearningRate 0.0048   Epoch: 15   Global Step: 193840   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:32:27,891-Speed 3252.40 samples/sec   Loss 1.7638   LearningRate 0.0048   Epoch: 15   Global Step: 193850   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:32:30,964-Speed 3334.02 samples/sec   Loss 1.7551   LearningRate 0.0048   Epoch: 15   Global Step: 193860   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:32:34,056-Speed 3312.76 samples/sec   Loss 1.7017   LearningRate 0.0048   Epoch: 15   Global Step: 193870   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:32:37,155-Speed 3304.88 samples/sec   Loss 1.7782   LearningRate 0.0048   Epoch: 15   Global Step: 193880   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:32:40,293-Speed 3264.87 samples/sec   Loss 1.7384   LearningRate 0.0048   Epoch: 15   Global Step: 193890   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:32:43,439-Speed 3255.59 samples/sec   Loss 1.7360   LearningRate 0.0048   Epoch: 15   Global Step: 193900   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:32:46,565-Speed 3277.49 samples/sec   Loss 1.7183   LearningRate 0.0048   Epoch: 15   Global Step: 193910   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:32:49,733-Speed 3232.45 samples/sec   Loss 1.7731   LearningRate 0.0048   Epoch: 15   Global Step: 193920   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:32:52,849-Speed 3287.28 samples/sec   Loss 1.7577   LearningRate 0.0048   Epoch: 15   Global Step: 193930   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:32:55,948-Speed 3306.26 samples/sec   Loss 1.7891   LearningRate 0.0048   Epoch: 15   Global Step: 193940   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:32:59,068-Speed 3282.83 samples/sec   Loss 1.7679   LearningRate 0.0048   Epoch: 15   Global Step: 193950   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:33:02,192-Speed 3279.05 samples/sec   Loss 1.7781   LearningRate 0.0048   Epoch: 15   Global Step: 193960   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:33:05,438-Speed 3155.69 samples/sec   Loss 1.7749   LearningRate 0.0048   Epoch: 15   Global Step: 193970   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:33:08,516-Speed 3327.90 samples/sec   Loss 1.7358   LearningRate 0.0048   Epoch: 15   Global Step: 193980   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:33:11,644-Speed 3274.10 samples/sec   Loss 1.7636   LearningRate 0.0048   Epoch: 15   Global Step: 193990   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:33:14,861-Speed 3184.22 samples/sec   Loss 1.8029   LearningRate 0.0048   Epoch: 15   Global Step: 194000   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:33:18,073-Speed 3188.72 samples/sec   Loss 1.6925   LearningRate 0.0048   Epoch: 15   Global Step: 194010   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:33:21,143-Speed 3336.68 samples/sec   Loss 1.7447   LearningRate 0.0048   Epoch: 15   Global Step: 194020   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:33:24,241-Speed 3306.41 samples/sec   Loss 1.7618   LearningRate 0.0048   Epoch: 15   Global Step: 194030   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:33:27,386-Speed 3257.48 samples/sec   Loss 1.7799   LearningRate 0.0048   Epoch: 15   Global Step: 194040   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:33:30,576-Speed 3210.88 samples/sec   Loss 1.7250   LearningRate 0.0048   Epoch: 15   Global Step: 194050   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:33:33,671-Speed 3309.64 samples/sec   Loss 1.7633   LearningRate 0.0048   Epoch: 15   Global Step: 194060   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:33:36,816-Speed 3256.53 samples/sec   Loss 1.7442   LearningRate 0.0048   Epoch: 15   Global Step: 194070   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:33:39,906-Speed 3315.13 samples/sec   Loss 1.7643   LearningRate 0.0048   Epoch: 15   Global Step: 194080   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:33:43,130-Speed 3176.81 samples/sec   Loss 1.7660   LearningRate 0.0048   Epoch: 15   Global Step: 194090   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:33:46,241-Speed 3293.27 samples/sec   Loss 1.7592   LearningRate 0.0048   Epoch: 15   Global Step: 194100   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:33:49,399-Speed 3243.28 samples/sec   Loss 1.7516   LearningRate 0.0048   Epoch: 15   Global Step: 194110   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:33:52,542-Speed 3258.80 samples/sec   Loss 1.7535   LearningRate 0.0048   Epoch: 15   Global Step: 194120   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:33:55,642-Speed 3304.87 samples/sec   Loss 1.7594   LearningRate 0.0048   Epoch: 15   Global Step: 194130   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:33:58,706-Speed 3342.83 samples/sec   Loss 1.7906   LearningRate 0.0048   Epoch: 15   Global Step: 194140   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:34:01,831-Speed 3277.08 samples/sec   Loss 1.7671   LearningRate 0.0048   Epoch: 15   Global Step: 194150   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:34:04,969-Speed 3264.12 samples/sec   Loss 1.7444   LearningRate 0.0048   Epoch: 15   Global Step: 194160   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:34:08,031-Speed 3345.79 samples/sec   Loss 1.7352   LearningRate 0.0048   Epoch: 15   Global Step: 194170   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:34:11,089-Speed 3349.40 samples/sec   Loss 1.7478   LearningRate 0.0048   Epoch: 15   Global Step: 194180   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:34:14,181-Speed 3313.09 samples/sec   Loss 1.7372   LearningRate 0.0048   Epoch: 15   Global Step: 194190   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:34:17,293-Speed 3291.44 samples/sec   Loss 1.7111   LearningRate 0.0048   Epoch: 15   Global Step: 194200   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:34:20,371-Speed 3328.41 samples/sec   Loss 1.7315   LearningRate 0.0048   Epoch: 15   Global Step: 194210   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:34:23,474-Speed 3300.27 samples/sec   Loss 1.7975   LearningRate 0.0048   Epoch: 15   Global Step: 194220   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:34:26,565-Speed 3314.05 samples/sec   Loss 1.7468   LearningRate 0.0048   Epoch: 15   Global Step: 194230   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:34:29,688-Speed 3280.09 samples/sec   Loss 1.7594   LearningRate 0.0048   Epoch: 15   Global Step: 194240   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:34:32,792-Speed 3299.84 samples/sec   Loss 1.7581   LearningRate 0.0048   Epoch: 15   Global Step: 194250   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:34:35,919-Speed 3276.07 samples/sec   Loss 1.7856   LearningRate 0.0048   Epoch: 15   Global Step: 194260   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:34:39,122-Speed 3197.98 samples/sec   Loss 1.7477   LearningRate 0.0048   Epoch: 15   Global Step: 194270   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:34:42,244-Speed 3281.48 samples/sec   Loss 1.7618   LearningRate 0.0047   Epoch: 15   Global Step: 194280   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:34:45,334-Speed 3315.13 samples/sec   Loss 1.7891   LearningRate 0.0047   Epoch: 15   Global Step: 194290   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:34:48,386-Speed 3355.75 samples/sec   Loss 1.7586   LearningRate 0.0047   Epoch: 15   Global Step: 194300   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:34:51,547-Speed 3240.86 samples/sec   Loss 1.7879   LearningRate 0.0047   Epoch: 15   Global Step: 194310   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:34:54,713-Speed 3235.17 samples/sec   Loss 1.7591   LearningRate 0.0047   Epoch: 15   Global Step: 194320   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:34:57,837-Speed 3279.35 samples/sec   Loss 1.7372   LearningRate 0.0047   Epoch: 15   Global Step: 194330   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:35:00,993-Speed 3245.76 samples/sec   Loss 1.7564   LearningRate 0.0047   Epoch: 15   Global Step: 194340   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:35:04,142-Speed 3253.31 samples/sec   Loss 1.7829   LearningRate 0.0047   Epoch: 15   Global Step: 194350   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:35:07,278-Speed 3265.51 samples/sec   Loss 1.8028   LearningRate 0.0047   Epoch: 15   Global Step: 194360   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:35:10,384-Speed 3298.45 samples/sec   Loss 1.7315   LearningRate 0.0047   Epoch: 15   Global Step: 194370   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:35:13,499-Speed 3288.74 samples/sec   Loss 1.7503   LearningRate 0.0047   Epoch: 15   Global Step: 194380   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:35:16,588-Speed 3316.33 samples/sec   Loss 1.6951   LearningRate 0.0047   Epoch: 15   Global Step: 194390   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:35:19,693-Speed 3297.78 samples/sec   Loss 1.6933   LearningRate 0.0047   Epoch: 15   Global Step: 194400   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:35:22,797-Speed 3301.00 samples/sec   Loss 1.8167   LearningRate 0.0047   Epoch: 15   Global Step: 194410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:35:25,858-Speed 3346.16 samples/sec   Loss 1.7432   LearningRate 0.0047   Epoch: 15   Global Step: 194420   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:35:28,983-Speed 3277.46 samples/sec   Loss 1.7742   LearningRate 0.0047   Epoch: 15   Global Step: 194430   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:35:32,072-Speed 3315.96 samples/sec   Loss 1.7744   LearningRate 0.0047   Epoch: 15   Global Step: 194440   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:35:35,191-Speed 3284.03 samples/sec   Loss 1.7515   LearningRate 0.0047   Epoch: 15   Global Step: 194450   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:35:38,269-Speed 3328.29 samples/sec   Loss 1.7677   LearningRate 0.0047   Epoch: 15   Global Step: 194460   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:35:41,345-Speed 3330.20 samples/sec   Loss 1.7949   LearningRate 0.0047   Epoch: 15   Global Step: 194470   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:35:44,406-Speed 3346.33 samples/sec   Loss 1.7958   LearningRate 0.0047   Epoch: 15   Global Step: 194480   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:35:47,527-Speed 3282.63 samples/sec   Loss 1.7749   LearningRate 0.0047   Epoch: 15   Global Step: 194490   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:35:50,602-Speed 3331.17 samples/sec   Loss 1.7679   LearningRate 0.0047   Epoch: 15   Global Step: 194500   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:35:53,798-Speed 3204.99 samples/sec   Loss 1.8079   LearningRate 0.0047   Epoch: 15   Global Step: 194510   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:35:56,868-Speed 3336.92 samples/sec   Loss 1.7364   LearningRate 0.0047   Epoch: 15   Global Step: 194520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:35:59,963-Speed 3309.30 samples/sec   Loss 1.7434   LearningRate 0.0047   Epoch: 15   Global Step: 194530   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:36:03,087-Speed 3279.40 samples/sec   Loss 1.8077   LearningRate 0.0047   Epoch: 15   Global Step: 194540   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:36:06,217-Speed 3272.79 samples/sec   Loss 1.7616   LearningRate 0.0047   Epoch: 15   Global Step: 194550   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:36:09,275-Speed 3349.07 samples/sec   Loss 1.8285   LearningRate 0.0047   Epoch: 15   Global Step: 194560   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:36:12,432-Speed 3244.91 samples/sec   Loss 1.7536   LearningRate 0.0047   Epoch: 15   Global Step: 194570   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:36:15,530-Speed 3306.73 samples/sec   Loss 1.8027   LearningRate 0.0047   Epoch: 15   Global Step: 194580   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:36:18,592-Speed 3344.88 samples/sec   Loss 1.7542   LearningRate 0.0047   Epoch: 15   Global Step: 194590   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:36:21,655-Speed 3344.11 samples/sec   Loss 1.7590   LearningRate 0.0047   Epoch: 15   Global Step: 194600   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:36:24,743-Speed 3317.62 samples/sec   Loss 1.7733   LearningRate 0.0047   Epoch: 15   Global Step: 194610   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:36:27,848-Speed 3298.83 samples/sec   Loss 1.7990   LearningRate 0.0047   Epoch: 15   Global Step: 194620   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:36:30,944-Speed 3308.74 samples/sec   Loss 1.8237   LearningRate 0.0047   Epoch: 15   Global Step: 194630   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:36:34,036-Speed 3313.26 samples/sec   Loss 1.6719   LearningRate 0.0047   Epoch: 15   Global Step: 194640   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:36:37,152-Speed 3286.89 samples/sec   Loss 1.7551   LearningRate 0.0047   Epoch: 15   Global Step: 194650   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:36:40,241-Speed 3315.95 samples/sec   Loss 1.7764   LearningRate 0.0047   Epoch: 15   Global Step: 194660   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:36:43,390-Speed 3252.80 samples/sec   Loss 1.7181   LearningRate 0.0047   Epoch: 15   Global Step: 194670   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:36:46,495-Speed 3299.47 samples/sec   Loss 1.7502   LearningRate 0.0047   Epoch: 15   Global Step: 194680   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:36:49,545-Speed 3358.01 samples/sec   Loss 1.8380   LearningRate 0.0047   Epoch: 15   Global Step: 194690   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:36:52,622-Speed 3329.81 samples/sec   Loss 1.7805   LearningRate 0.0047   Epoch: 15   Global Step: 194700   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:36:55,725-Speed 3300.25 samples/sec   Loss 1.7760   LearningRate 0.0047   Epoch: 15   Global Step: 194710   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:36:58,761-Speed 3374.54 samples/sec   Loss 1.6580   LearningRate 0.0047   Epoch: 15   Global Step: 194720   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:37:01,885-Speed 3277.97 samples/sec   Loss 1.7245   LearningRate 0.0047   Epoch: 15   Global Step: 194730   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:37:04,978-Speed 3312.07 samples/sec   Loss 1.7297   LearningRate 0.0047   Epoch: 15   Global Step: 194740   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:37:08,068-Speed 3315.43 samples/sec   Loss 1.7222   LearningRate 0.0047   Epoch: 15   Global Step: 194750   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:37:11,183-Speed 3288.05 samples/sec   Loss 1.7261   LearningRate 0.0047   Epoch: 15   Global Step: 194760   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:37:14,333-Speed 3251.68 samples/sec   Loss 1.7597   LearningRate 0.0047   Epoch: 15   Global Step: 194770   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:37:17,453-Speed 3283.94 samples/sec   Loss 1.8328   LearningRate 0.0047   Epoch: 15   Global Step: 194780   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:37:20,558-Speed 3298.21 samples/sec   Loss 1.7571   LearningRate 0.0047   Epoch: 15   Global Step: 194790   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:37:23,638-Speed 3325.88 samples/sec   Loss 1.7753   LearningRate 0.0047   Epoch: 15   Global Step: 194800   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:37:26,728-Speed 3315.09 samples/sec   Loss 1.7118   LearningRate 0.0047   Epoch: 15   Global Step: 194810   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:37:29,850-Speed 3281.23 samples/sec   Loss 1.8022   LearningRate 0.0047   Epoch: 15   Global Step: 194820   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:37:32,972-Speed 3280.66 samples/sec   Loss 1.7753   LearningRate 0.0047   Epoch: 15   Global Step: 194830   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:37:36,083-Speed 3292.56 samples/sec   Loss 1.7714   LearningRate 0.0047   Epoch: 15   Global Step: 194840   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:37:39,190-Speed 3297.04 samples/sec   Loss 1.7585   LearningRate 0.0047   Epoch: 15   Global Step: 194850   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:37:42,319-Speed 3274.31 samples/sec   Loss 1.7502   LearningRate 0.0046   Epoch: 15   Global Step: 194860   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:37:45,416-Speed 3307.28 samples/sec   Loss 1.8265   LearningRate 0.0046   Epoch: 15   Global Step: 194870   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:37:48,620-Speed 3196.55 samples/sec   Loss 1.7281   LearningRate 0.0046   Epoch: 15   Global Step: 194880   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:37:51,691-Speed 3336.18 samples/sec   Loss 1.7586   LearningRate 0.0046   Epoch: 15   Global Step: 194890   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:37:54,812-Speed 3281.54 samples/sec   Loss 1.7218   LearningRate 0.0046   Epoch: 15   Global Step: 194900   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:37:57,876-Speed 3343.12 samples/sec   Loss 1.7799   LearningRate 0.0046   Epoch: 15   Global Step: 194910   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:38:00,936-Speed 3348.16 samples/sec   Loss 1.7499   LearningRate 0.0046   Epoch: 15   Global Step: 194920   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:38:04,077-Speed 3261.21 samples/sec   Loss 1.7741   LearningRate 0.0046   Epoch: 15   Global Step: 194930   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:38:07,161-Speed 3321.28 samples/sec   Loss 1.7634   LearningRate 0.0046   Epoch: 15   Global Step: 194940   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:38:10,217-Speed 3351.99 samples/sec   Loss 1.7563   LearningRate 0.0046   Epoch: 15   Global Step: 194950   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:38:13,296-Speed 3326.35 samples/sec   Loss 1.7801   LearningRate 0.0046   Epoch: 15   Global Step: 194960   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:38:16,426-Speed 3272.57 samples/sec   Loss 1.7664   LearningRate 0.0046   Epoch: 15   Global Step: 194970   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:38:19,486-Speed 3347.45 samples/sec   Loss 1.7276   LearningRate 0.0046   Epoch: 15   Global Step: 194980   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:38:22,560-Speed 3333.15 samples/sec   Loss 1.7672   LearningRate 0.0046   Epoch: 15   Global Step: 194990   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:38:25,616-Speed 3351.56 samples/sec   Loss 1.8074   LearningRate 0.0046   Epoch: 15   Global Step: 195000   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:38:28,681-Speed 3342.42 samples/sec   Loss 1.7453   LearningRate 0.0046   Epoch: 15   Global Step: 195010   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:38:31,776-Speed 3308.86 samples/sec   Loss 1.7736   LearningRate 0.0046   Epoch: 15   Global Step: 195020   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:38:34,848-Speed 3334.73 samples/sec   Loss 1.7587   LearningRate 0.0046   Epoch: 15   Global Step: 195030   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:38:37,931-Speed 3321.95 samples/sec   Loss 1.7693   LearningRate 0.0046   Epoch: 15   Global Step: 195040   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:38:41,001-Speed 3337.04 samples/sec   Loss 1.7711   LearningRate 0.0046   Epoch: 15   Global Step: 195050   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:38:44,117-Speed 3287.10 samples/sec   Loss 1.7836   LearningRate 0.0046   Epoch: 15   Global Step: 195060   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:38:47,241-Speed 3279.10 samples/sec   Loss 1.7124   LearningRate 0.0046   Epoch: 15   Global Step: 195070   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:38:50,312-Speed 3335.02 samples/sec   Loss 1.7958   LearningRate 0.0046   Epoch: 15   Global Step: 195080   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:38:53,395-Speed 3323.12 samples/sec   Loss 1.7409   LearningRate 0.0046   Epoch: 15   Global Step: 195090   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:38:56,478-Speed 3322.11 samples/sec   Loss 1.8122   LearningRate 0.0046   Epoch: 15   Global Step: 195100   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:38:59,606-Speed 3274.83 samples/sec   Loss 1.8169   LearningRate 0.0046   Epoch: 15   Global Step: 195110   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:39:02,678-Speed 3334.07 samples/sec   Loss 1.7602   LearningRate 0.0046   Epoch: 15   Global Step: 195120   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:39:05,770-Speed 3313.08 samples/sec   Loss 1.7557   LearningRate 0.0046   Epoch: 15   Global Step: 195130   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:39:08,829-Speed 3348.91 samples/sec   Loss 1.6831   LearningRate 0.0046   Epoch: 15   Global Step: 195140   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:39:11,909-Speed 3325.53 samples/sec   Loss 1.8094   LearningRate 0.0046   Epoch: 15   Global Step: 195150   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:39:15,030-Speed 3282.50 samples/sec   Loss 1.7541   LearningRate 0.0046   Epoch: 15   Global Step: 195160   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:39:18,177-Speed 3254.91 samples/sec   Loss 1.7514   LearningRate 0.0046   Epoch: 15   Global Step: 195170   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:39:21,243-Speed 3341.34 samples/sec   Loss 1.7768   LearningRate 0.0046   Epoch: 15   Global Step: 195180   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:39:24,415-Speed 3228.46 samples/sec   Loss 1.7635   LearningRate 0.0046   Epoch: 15   Global Step: 195190   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:39:27,561-Speed 3256.94 samples/sec   Loss 1.7538   LearningRate 0.0046   Epoch: 15   Global Step: 195200   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:39:30,691-Speed 3272.54 samples/sec   Loss 1.7550   LearningRate 0.0046   Epoch: 15   Global Step: 195210   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:39:33,751-Speed 3346.51 samples/sec   Loss 1.7461   LearningRate 0.0046   Epoch: 15   Global Step: 195220   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:39:36,969-Speed 3183.24 samples/sec   Loss 1.7724   LearningRate 0.0046   Epoch: 15   Global Step: 195230   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:39:40,102-Speed 3269.76 samples/sec   Loss 1.7979   LearningRate 0.0046   Epoch: 15   Global Step: 195240   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:39:43,268-Speed 3235.17 samples/sec   Loss 1.7906   LearningRate 0.0046   Epoch: 15   Global Step: 195250   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:39:46,362-Speed 3311.32 samples/sec   Loss 1.7535   LearningRate 0.0046   Epoch: 15   Global Step: 195260   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:39:49,513-Speed 3250.44 samples/sec   Loss 1.7896   LearningRate 0.0046   Epoch: 15   Global Step: 195270   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:39:52,658-Speed 3256.73 samples/sec   Loss 1.6938   LearningRate 0.0046   Epoch: 15   Global Step: 195280   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:39:55,731-Speed 3333.09 samples/sec   Loss 1.7715   LearningRate 0.0046   Epoch: 15   Global Step: 195290   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:39:58,828-Speed 3307.90 samples/sec   Loss 1.7526   LearningRate 0.0046   Epoch: 15   Global Step: 195300   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:40:01,933-Speed 3298.56 samples/sec   Loss 1.7763   LearningRate 0.0046   Epoch: 15   Global Step: 195310   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:40:05,115-Speed 3219.10 samples/sec   Loss 1.7932   LearningRate 0.0046   Epoch: 15   Global Step: 195320   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:40:08,250-Speed 3267.09 samples/sec   Loss 1.7157   LearningRate 0.0046   Epoch: 15   Global Step: 195330   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:40:11,336-Speed 3319.13 samples/sec   Loss 1.8118   LearningRate 0.0046   Epoch: 15   Global Step: 195340   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:40:14,456-Speed 3283.38 samples/sec   Loss 1.7565   LearningRate 0.0046   Epoch: 15   Global Step: 195350   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:40:17,590-Speed 3269.26 samples/sec   Loss 1.8209   LearningRate 0.0046   Epoch: 15   Global Step: 195360   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:40:20,725-Speed 3266.92 samples/sec   Loss 1.8237   LearningRate 0.0046   Epoch: 15   Global Step: 195370   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:40:23,858-Speed 3269.94 samples/sec   Loss 1.7150   LearningRate 0.0046   Epoch: 15   Global Step: 195380   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:40:26,925-Speed 3339.73 samples/sec   Loss 1.8177   LearningRate 0.0046   Epoch: 15   Global Step: 195390   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:40:30,005-Speed 3325.02 samples/sec   Loss 1.7674   LearningRate 0.0046   Epoch: 15   Global Step: 195400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:40:33,110-Speed 3299.54 samples/sec   Loss 1.7920   LearningRate 0.0046   Epoch: 15   Global Step: 195410   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:40:36,171-Speed 3346.11 samples/sec   Loss 1.8425   LearningRate 0.0046   Epoch: 15   Global Step: 195420   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:40:39,364-Speed 3209.03 samples/sec   Loss 1.8088   LearningRate 0.0046   Epoch: 15   Global Step: 195430   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:40:42,457-Speed 3311.71 samples/sec   Loss 1.8150   LearningRate 0.0045   Epoch: 15   Global Step: 195440   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:40:45,515-Speed 3349.47 samples/sec   Loss 1.7365   LearningRate 0.0045   Epoch: 15   Global Step: 195450   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:40:48,656-Speed 3260.78 samples/sec   Loss 1.7680   LearningRate 0.0045   Epoch: 15   Global Step: 195460   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:40:51,827-Speed 3230.81 samples/sec   Loss 1.6787   LearningRate 0.0045   Epoch: 15   Global Step: 195470   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:40:54,980-Speed 3248.69 samples/sec   Loss 1.7449   LearningRate 0.0045   Epoch: 15   Global Step: 195480   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:40:58,058-Speed 3328.23 samples/sec   Loss 1.7734   LearningRate 0.0045   Epoch: 15   Global Step: 195490   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:41:01,126-Speed 3338.64 samples/sec   Loss 1.7923   LearningRate 0.0045   Epoch: 15   Global Step: 195500   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:41:04,231-Speed 3298.28 samples/sec   Loss 1.7541   LearningRate 0.0045   Epoch: 15   Global Step: 195510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:41:07,324-Speed 3312.03 samples/sec   Loss 1.7355   LearningRate 0.0045   Epoch: 15   Global Step: 195520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:41:10,393-Speed 3337.32 samples/sec   Loss 1.7880   LearningRate 0.0045   Epoch: 15   Global Step: 195530   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:41:13,508-Speed 3288.64 samples/sec   Loss 1.7762   LearningRate 0.0045   Epoch: 15   Global Step: 195540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:41:16,668-Speed 3241.88 samples/sec   Loss 1.6952   LearningRate 0.0045   Epoch: 15   Global Step: 195550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:41:19,725-Speed 3350.82 samples/sec   Loss 1.7885   LearningRate 0.0045   Epoch: 15   Global Step: 195560   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:41:22,802-Speed 3329.30 samples/sec   Loss 1.7393   LearningRate 0.0045   Epoch: 15   Global Step: 195570   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:41:25,931-Speed 3273.44 samples/sec   Loss 1.7455   LearningRate 0.0045   Epoch: 15   Global Step: 195580   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:41:29,090-Speed 3242.63 samples/sec   Loss 1.7479   LearningRate 0.0045   Epoch: 15   Global Step: 195590   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:41:32,161-Speed 3335.32 samples/sec   Loss 1.7553   LearningRate 0.0045   Epoch: 15   Global Step: 195600   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:41:35,317-Speed 3245.70 samples/sec   Loss 1.7328   LearningRate 0.0045   Epoch: 15   Global Step: 195610   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:41:38,438-Speed 3282.49 samples/sec   Loss 1.7009   LearningRate 0.0045   Epoch: 15   Global Step: 195620   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:41:41,546-Speed 3295.63 samples/sec   Loss 1.7489   LearningRate 0.0045   Epoch: 15   Global Step: 195630   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:41:44,692-Speed 3255.50 samples/sec   Loss 1.7597   LearningRate 0.0045   Epoch: 15   Global Step: 195640   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:41:47,830-Speed 3264.74 samples/sec   Loss 1.7672   LearningRate 0.0045   Epoch: 15   Global Step: 195650   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:41:50,983-Speed 3248.69 samples/sec   Loss 1.7547   LearningRate 0.0045   Epoch: 15   Global Step: 195660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:41:54,034-Speed 3357.93 samples/sec   Loss 1.7457   LearningRate 0.0045   Epoch: 15   Global Step: 195670   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:41:57,107-Speed 3333.76 samples/sec   Loss 1.7502   LearningRate 0.0045   Epoch: 15   Global Step: 195680   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:42:00,156-Speed 3359.64 samples/sec   Loss 1.7183   LearningRate 0.0045   Epoch: 15   Global Step: 195690   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:42:03,292-Speed 3265.81 samples/sec   Loss 1.6868   LearningRate 0.0045   Epoch: 15   Global Step: 195700   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:42:06,430-Speed 3265.00 samples/sec   Loss 1.7225   LearningRate 0.0045   Epoch: 15   Global Step: 195710   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:42:09,549-Speed 3284.07 samples/sec   Loss 1.7562   LearningRate 0.0045   Epoch: 15   Global Step: 195720   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:42:12,614-Speed 3342.04 samples/sec   Loss 1.7647   LearningRate 0.0045   Epoch: 15   Global Step: 195730   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:42:15,775-Speed 3239.99 samples/sec   Loss 1.7932   LearningRate 0.0045   Epoch: 15   Global Step: 195740   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:42:18,872-Speed 3307.85 samples/sec   Loss 1.7261   LearningRate 0.0045   Epoch: 15   Global Step: 195750   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:42:21,938-Speed 3340.71 samples/sec   Loss 1.8138   LearningRate 0.0045   Epoch: 15   Global Step: 195760   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:42:25,035-Speed 3307.66 samples/sec   Loss 1.7721   LearningRate 0.0045   Epoch: 15   Global Step: 195770   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:42:28,105-Speed 3336.93 samples/sec   Loss 1.7153   LearningRate 0.0045   Epoch: 15   Global Step: 195780   Fp16 Grad Scale: 4096   Required: 5 hours
Training: 2022-04-27 18:42:31,189-Speed 3321.35 samples/sec   Loss 1.8090   LearningRate 0.0045   Epoch: 15   Global Step: 195790   Fp16 Grad Scale: 4096   Required: 5 hours
Training: 2022-04-27 18:42:34,245-Speed 3352.22 samples/sec   Loss 1.7596   LearningRate 0.0045   Epoch: 15   Global Step: 195800   Fp16 Grad Scale: 4096   Required: 5 hours
Training: 2022-04-27 18:42:37,343-Speed 3306.20 samples/sec   Loss 1.7790   LearningRate 0.0045   Epoch: 15   Global Step: 195810   Fp16 Grad Scale: 4096   Required: 5 hours
Training: 2022-04-27 18:42:40,400-Speed 3350.71 samples/sec   Loss 1.7511   LearningRate 0.0045   Epoch: 15   Global Step: 195820   Fp16 Grad Scale: 4096   Required: 5 hours
Training: 2022-04-27 18:42:43,477-Speed 3328.29 samples/sec   Loss 1.7470   LearningRate 0.0045   Epoch: 15   Global Step: 195830   Fp16 Grad Scale: 4096   Required: 5 hours
Training: 2022-04-27 18:42:46,578-Speed 3303.29 samples/sec   Loss 1.7527   LearningRate 0.0045   Epoch: 15   Global Step: 195840   Fp16 Grad Scale: 4096   Required: 5 hours
Training: 2022-04-27 18:42:49,653-Speed 3332.00 samples/sec   Loss 1.7745   LearningRate 0.0045   Epoch: 15   Global Step: 195850   Fp16 Grad Scale: 4096   Required: 5 hours
Training: 2022-04-27 18:42:52,801-Speed 3253.74 samples/sec   Loss 1.7654   LearningRate 0.0045   Epoch: 15   Global Step: 195860   Fp16 Grad Scale: 4096   Required: 5 hours
Training: 2022-04-27 18:42:55,878-Speed 3328.36 samples/sec   Loss 1.8294   LearningRate 0.0045   Epoch: 15   Global Step: 195870   Fp16 Grad Scale: 4096   Required: 5 hours
Training: 2022-04-27 18:42:59,057-Speed 3222.64 samples/sec   Loss 1.7555   LearningRate 0.0045   Epoch: 15   Global Step: 195880   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:43:02,182-Speed 3277.88 samples/sec   Loss 1.7804   LearningRate 0.0045   Epoch: 15   Global Step: 195890   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:43:05,330-Speed 3253.51 samples/sec   Loss 1.7379   LearningRate 0.0045   Epoch: 15   Global Step: 195900   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:43:08,479-Speed 3252.45 samples/sec   Loss 1.7520   LearningRate 0.0045   Epoch: 15   Global Step: 195910   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:43:11,553-Speed 3332.95 samples/sec   Loss 1.7703   LearningRate 0.0045   Epoch: 15   Global Step: 195920   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:43:14,693-Speed 3262.05 samples/sec   Loss 1.7441   LearningRate 0.0045   Epoch: 15   Global Step: 195930   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:43:17,888-Speed 3206.30 samples/sec   Loss 1.7505   LearningRate 0.0045   Epoch: 15   Global Step: 195940   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:43:21,003-Speed 3288.82 samples/sec   Loss 1.7685   LearningRate 0.0045   Epoch: 15   Global Step: 195950   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:43:24,220-Speed 3183.94 samples/sec   Loss 1.7795   LearningRate 0.0045   Epoch: 15   Global Step: 195960   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:43:27,391-Speed 3230.00 samples/sec   Loss 1.8052   LearningRate 0.0045   Epoch: 15   Global Step: 195970   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:43:30,575-Speed 3217.01 samples/sec   Loss 1.7557   LearningRate 0.0045   Epoch: 15   Global Step: 195980   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:43:33,662-Speed 3318.66 samples/sec   Loss 1.7786   LearningRate 0.0045   Epoch: 15   Global Step: 195990   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:43:36,840-Speed 3222.89 samples/sec   Loss 1.7729   LearningRate 0.0045   Epoch: 15   Global Step: 196000   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:43:39,976-Speed 3265.85 samples/sec   Loss 1.7354   LearningRate 0.0045   Epoch: 15   Global Step: 196010   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:43:43,062-Speed 3319.21 samples/sec   Loss 1.7386   LearningRate 0.0044   Epoch: 15   Global Step: 196020   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:43:46,235-Speed 3228.95 samples/sec   Loss 1.7466   LearningRate 0.0044   Epoch: 15   Global Step: 196030   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:43:49,441-Speed 3194.87 samples/sec   Loss 1.7853   LearningRate 0.0044   Epoch: 15   Global Step: 196040   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:43:52,599-Speed 3243.17 samples/sec   Loss 1.7675   LearningRate 0.0044   Epoch: 15   Global Step: 196050   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:43:55,760-Speed 3240.60 samples/sec   Loss 1.8274   LearningRate 0.0044   Epoch: 15   Global Step: 196060   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:43:58,870-Speed 3293.95 samples/sec   Loss 1.8056   LearningRate 0.0044   Epoch: 15   Global Step: 196070   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:44:01,948-Speed 3328.07 samples/sec   Loss 1.7536   LearningRate 0.0044   Epoch: 15   Global Step: 196080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:44:05,071-Speed 3280.05 samples/sec   Loss 1.8111   LearningRate 0.0044   Epoch: 15   Global Step: 196090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:44:08,128-Speed 3350.47 samples/sec   Loss 1.8244   LearningRate 0.0044   Epoch: 15   Global Step: 196100   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:44:11,261-Speed 3269.31 samples/sec   Loss 1.7778   LearningRate 0.0044   Epoch: 15   Global Step: 196110   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:44:14,376-Speed 3288.86 samples/sec   Loss 1.8181   LearningRate 0.0044   Epoch: 15   Global Step: 196120   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:44:17,509-Speed 3268.70 samples/sec   Loss 1.7100   LearningRate 0.0044   Epoch: 15   Global Step: 196130   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:44:20,598-Speed 3316.08 samples/sec   Loss 1.7625   LearningRate 0.0044   Epoch: 15   Global Step: 196140   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:44:23,694-Speed 3308.83 samples/sec   Loss 1.7722   LearningRate 0.0044   Epoch: 15   Global Step: 196150   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:44:26,834-Speed 3262.65 samples/sec   Loss 1.7089   LearningRate 0.0044   Epoch: 15   Global Step: 196160   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:44:29,962-Speed 3274.31 samples/sec   Loss 1.7561   LearningRate 0.0044   Epoch: 15   Global Step: 196170   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:44:33,070-Speed 3296.11 samples/sec   Loss 1.7690   LearningRate 0.0044   Epoch: 15   Global Step: 196180   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:44:36,234-Speed 3237.12 samples/sec   Loss 1.7714   LearningRate 0.0044   Epoch: 15   Global Step: 196190   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:44:39,359-Speed 3278.03 samples/sec   Loss 1.7743   LearningRate 0.0044   Epoch: 15   Global Step: 196200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:44:42,527-Speed 3233.31 samples/sec   Loss 1.7983   LearningRate 0.0044   Epoch: 15   Global Step: 196210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-04-27 18:44:45,613-Speed 3319.10 samples/sec   Loss 1.7840   LearningRate 0.0044   Epoch: 15   Global Step: 196220   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:44:48,740-Speed 3275.35 samples/sec   Loss 1.7617   LearningRate 0.0044   Epoch: 15   Global Step: 196230   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:44:51,809-Speed 3338.16 samples/sec   Loss 1.7397   LearningRate 0.0044   Epoch: 15   Global Step: 196240   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:44:54,937-Speed 3274.93 samples/sec   Loss 1.7636   LearningRate 0.0044   Epoch: 15   Global Step: 196250   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:44:57,990-Speed 3355.00 samples/sec   Loss 1.7704   LearningRate 0.0044   Epoch: 15   Global Step: 196260   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-04-27 18:45:01,082-Speed 3312.83 samples/sec   Loss 1.7883   LearningRate 0.0044   Epoch: 15   Global Step: 196270   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:45:04,182-Speed 3304.45 samples/sec   Loss 1.7379   LearningRate 0.0044   Epoch: 15   Global Step: 196280   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:45:07,277-Speed 3309.05 samples/sec   Loss 1.7186   LearningRate 0.0044   Epoch: 15   Global Step: 196290   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:45:10,421-Speed 3257.78 samples/sec   Loss 1.7753   LearningRate 0.0044   Epoch: 15   Global Step: 196300   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:45:13,594-Speed 3228.95 samples/sec   Loss 1.7096   LearningRate 0.0044   Epoch: 15   Global Step: 196310   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:45:16,679-Speed 3320.29 samples/sec   Loss 1.7464   LearningRate 0.0044   Epoch: 15   Global Step: 196320   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:45:19,781-Speed 3301.44 samples/sec   Loss 1.7832   LearningRate 0.0044   Epoch: 15   Global Step: 196330   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:45:22,848-Speed 3340.01 samples/sec   Loss 1.7934   LearningRate 0.0044   Epoch: 15   Global Step: 196340   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:45:25,924-Speed 3330.27 samples/sec   Loss 1.7912   LearningRate 0.0044   Epoch: 15   Global Step: 196350   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:45:29,015-Speed 3313.56 samples/sec   Loss 1.7780   LearningRate 0.0044   Epoch: 15   Global Step: 196360   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-04-27 18:45:32,136-Speed 3282.62 samples/sec   Loss 1.7404   LearningRate 0.0044   Epoch: 15   Global Step: 196370   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:45:35,335-Speed 3202.31 samples/sec   Loss 1.7822   LearningRate 0.0044   Epoch: 15   Global Step: 196380   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:45:38,394-Speed 3349.22 samples/sec   Loss 1.7592   LearningRate 0.0044   Epoch: 15   Global Step: 196390   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:45:41,581-Speed 3213.12 samples/sec   Loss 1.7448   LearningRate 0.0044   Epoch: 15   Global Step: 196400   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:45:44,646-Speed 3343.07 samples/sec   Loss 1.7969   LearningRate 0.0044   Epoch: 15   Global Step: 196410   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:45:47,788-Speed 3260.11 samples/sec   Loss 1.7677   LearningRate 0.0044   Epoch: 15   Global Step: 196420   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:45:50,878-Speed 3314.19 samples/sec   Loss 1.7527   LearningRate 0.0044   Epoch: 15   Global Step: 196430   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:45:54,022-Speed 3258.71 samples/sec   Loss 1.7593   LearningRate 0.0044   Epoch: 15   Global Step: 196440   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:45:57,700-Speed 2784.54 samples/sec   Loss 1.7549   LearningRate 0.0044   Epoch: 15   Global Step: 196450   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:46:00,771-Speed 3335.73 samples/sec   Loss 1.7726   LearningRate 0.0044   Epoch: 15   Global Step: 196460   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:46:03,976-Speed 3195.70 samples/sec   Loss 1.6865   LearningRate 0.0044   Epoch: 15   Global Step: 196470   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:46:07,192-Speed 3185.93 samples/sec   Loss 1.7309   LearningRate 0.0044   Epoch: 15   Global Step: 196480   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:46:10,311-Speed 3283.73 samples/sec   Loss 1.7236   LearningRate 0.0044   Epoch: 15   Global Step: 196490   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:46:13,422-Speed 3292.28 samples/sec   Loss 1.7452   LearningRate 0.0044   Epoch: 15   Global Step: 196500   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:46:16,488-Speed 3341.59 samples/sec   Loss 1.7688   LearningRate 0.0044   Epoch: 15   Global Step: 196510   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:46:19,581-Speed 3311.22 samples/sec   Loss 1.7966   LearningRate 0.0044   Epoch: 15   Global Step: 196520   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:46:22,678-Speed 3307.52 samples/sec   Loss 1.7350   LearningRate 0.0044   Epoch: 15   Global Step: 196530   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:46:25,771-Speed 3311.87 samples/sec   Loss 1.8149   LearningRate 0.0044   Epoch: 15   Global Step: 196540   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:46:28,849-Speed 3327.86 samples/sec   Loss 1.7718   LearningRate 0.0044   Epoch: 15   Global Step: 196550   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:46:31,903-Speed 3354.47 samples/sec   Loss 1.7888   LearningRate 0.0044   Epoch: 15   Global Step: 196560   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:46:35,017-Speed 3289.16 samples/sec   Loss 1.7775   LearningRate 0.0044   Epoch: 15   Global Step: 196570   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:46:38,129-Speed 3291.90 samples/sec   Loss 1.7298   LearningRate 0.0044   Epoch: 15   Global Step: 196580   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:46:41,251-Speed 3280.90 samples/sec   Loss 1.7841   LearningRate 0.0044   Epoch: 15   Global Step: 196590   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:46:44,306-Speed 3352.51 samples/sec   Loss 1.7808   LearningRate 0.0044   Epoch: 15   Global Step: 196600   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:46:47,392-Speed 3319.88 samples/sec   Loss 1.8134   LearningRate 0.0043   Epoch: 15   Global Step: 196610   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:46:50,534-Speed 3259.29 samples/sec   Loss 1.7744   LearningRate 0.0043   Epoch: 15   Global Step: 196620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 18:46:53,649-Speed 3288.91 samples/sec   Loss 1.8112   LearningRate 0.0043   Epoch: 15   Global Step: 196630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 18:46:56,751-Speed 3302.31 samples/sec   Loss 1.7960   LearningRate 0.0043   Epoch: 15   Global Step: 196640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 18:46:59,785-Speed 3375.84 samples/sec   Loss 1.7387   LearningRate 0.0043   Epoch: 15   Global Step: 196650   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:47:02,865-Speed 3326.09 samples/sec   Loss 1.7524   LearningRate 0.0043   Epoch: 15   Global Step: 196660   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:47:05,966-Speed 3302.42 samples/sec   Loss 1.8024   LearningRate 0.0043   Epoch: 15   Global Step: 196670   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:47:09,044-Speed 3328.34 samples/sec   Loss 1.8030   LearningRate 0.0043   Epoch: 15   Global Step: 196680   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:47:12,105-Speed 3346.34 samples/sec   Loss 1.7532   LearningRate 0.0043   Epoch: 15   Global Step: 196690   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:47:15,201-Speed 3309.08 samples/sec   Loss 1.7281   LearningRate 0.0043   Epoch: 15   Global Step: 196700   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:47:18,252-Speed 3356.77 samples/sec   Loss 1.7530   LearningRate 0.0043   Epoch: 15   Global Step: 196710   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:47:21,334-Speed 3323.72 samples/sec   Loss 1.7363   LearningRate 0.0043   Epoch: 15   Global Step: 196720   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:47:24,422-Speed 3316.90 samples/sec   Loss 1.7622   LearningRate 0.0043   Epoch: 15   Global Step: 196730   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:47:27,537-Speed 3288.29 samples/sec   Loss 1.7896   LearningRate 0.0043   Epoch: 15   Global Step: 196740   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:47:31,182-Speed 2810.29 samples/sec   Loss 1.7784   LearningRate 0.0043   Epoch: 15   Global Step: 196750   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:47:34,245-Speed 3344.72 samples/sec   Loss 1.7058   LearningRate 0.0043   Epoch: 15   Global Step: 196760   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:47:37,976-Speed 2744.91 samples/sec   Loss 1.7453   LearningRate 0.0043   Epoch: 15   Global Step: 196770   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:47:43,537-Speed 1841.69 samples/sec   Loss 1.7647   LearningRate 0.0043   Epoch: 15   Global Step: 196780   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:47:46,608-Speed 3335.66 samples/sec   Loss 1.7741   LearningRate 0.0043   Epoch: 15   Global Step: 196790   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:47:49,691-Speed 3323.12 samples/sec   Loss 1.7986   LearningRate 0.0043   Epoch: 15   Global Step: 196800   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:47:52,776-Speed 3319.47 samples/sec   Loss 1.7833   LearningRate 0.0043   Epoch: 15   Global Step: 196810   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:47:55,966-Speed 3211.11 samples/sec   Loss 1.7254   LearningRate 0.0043   Epoch: 15   Global Step: 196820   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:47:59,058-Speed 3313.20 samples/sec   Loss 1.7397   LearningRate 0.0043   Epoch: 15   Global Step: 196830   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:48:02,138-Speed 3326.33 samples/sec   Loss 1.8250   LearningRate 0.0043   Epoch: 15   Global Step: 196840   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:48:05,277-Speed 3263.16 samples/sec   Loss 1.7883   LearningRate 0.0043   Epoch: 15   Global Step: 196850   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:48:08,376-Speed 3305.05 samples/sec   Loss 1.7881   LearningRate 0.0043   Epoch: 15   Global Step: 196860   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:48:11,468-Speed 3313.01 samples/sec   Loss 1.7708   LearningRate 0.0043   Epoch: 15   Global Step: 196870   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:48:14,575-Speed 3296.89 samples/sec   Loss 1.7534   LearningRate 0.0043   Epoch: 15   Global Step: 196880   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:48:17,724-Speed 3252.98 samples/sec   Loss 1.6955   LearningRate 0.0043   Epoch: 15   Global Step: 196890   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:48:20,801-Speed 3329.15 samples/sec   Loss 1.7290   LearningRate 0.0043   Epoch: 15   Global Step: 196900   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:48:23,842-Speed 3367.92 samples/sec   Loss 1.7397   LearningRate 0.0043   Epoch: 15   Global Step: 196910   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:48:26,939-Speed 3307.84 samples/sec   Loss 1.7759   LearningRate 0.0043   Epoch: 15   Global Step: 196920   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:48:30,034-Speed 3309.10 samples/sec   Loss 1.8072   LearningRate 0.0043   Epoch: 15   Global Step: 196930   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:48:33,160-Speed 3277.42 samples/sec   Loss 1.7468   LearningRate 0.0043   Epoch: 15   Global Step: 196940   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:48:36,311-Speed 3250.55 samples/sec   Loss 1.7451   LearningRate 0.0043   Epoch: 15   Global Step: 196950   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:48:39,431-Speed 3283.92 samples/sec   Loss 1.7593   LearningRate 0.0043   Epoch: 15   Global Step: 196960   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:48:42,540-Speed 3294.93 samples/sec   Loss 1.7524   LearningRate 0.0043   Epoch: 15   Global Step: 196970   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:48:45,611-Speed 3335.18 samples/sec   Loss 1.6986   LearningRate 0.0043   Epoch: 15   Global Step: 196980   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:48:48,738-Speed 3276.41 samples/sec   Loss 1.7800   LearningRate 0.0043   Epoch: 15   Global Step: 196990   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:48:51,883-Speed 3256.99 samples/sec   Loss 1.7942   LearningRate 0.0043   Epoch: 15   Global Step: 197000   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:48:55,030-Speed 3254.94 samples/sec   Loss 1.7620   LearningRate 0.0043   Epoch: 15   Global Step: 197010   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:48:58,087-Speed 3350.71 samples/sec   Loss 1.7895   LearningRate 0.0043   Epoch: 15   Global Step: 197020   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:49:01,274-Speed 3213.71 samples/sec   Loss 1.7596   LearningRate 0.0043   Epoch: 15   Global Step: 197030   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:49:04,483-Speed 3192.09 samples/sec   Loss 1.8161   LearningRate 0.0043   Epoch: 15   Global Step: 197040   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:49:07,632-Speed 3253.37 samples/sec   Loss 1.7804   LearningRate 0.0043   Epoch: 15   Global Step: 197050   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:49:10,720-Speed 3316.99 samples/sec   Loss 1.7117   LearningRate 0.0043   Epoch: 15   Global Step: 197060   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:49:13,880-Speed 3241.35 samples/sec   Loss 1.7728   LearningRate 0.0043   Epoch: 15   Global Step: 197070   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:49:16,989-Speed 3294.86 samples/sec   Loss 1.7631   LearningRate 0.0043   Epoch: 15   Global Step: 197080   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:49:20,120-Speed 3271.42 samples/sec   Loss 1.7783   LearningRate 0.0043   Epoch: 15   Global Step: 197090   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:49:23,251-Speed 3271.86 samples/sec   Loss 1.7740   LearningRate 0.0043   Epoch: 15   Global Step: 197100   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:49:26,373-Speed 3280.95 samples/sec   Loss 1.7688   LearningRate 0.0043   Epoch: 15   Global Step: 197110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 18:49:29,473-Speed 3303.91 samples/sec   Loss 1.8078   LearningRate 0.0043   Epoch: 15   Global Step: 197120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 18:49:32,584-Speed 3292.51 samples/sec   Loss 1.7336   LearningRate 0.0043   Epoch: 15   Global Step: 197130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 18:49:35,683-Speed 3305.65 samples/sec   Loss 1.7533   LearningRate 0.0043   Epoch: 15   Global Step: 197140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 18:49:38,746-Speed 3344.75 samples/sec   Loss 1.7719   LearningRate 0.0043   Epoch: 15   Global Step: 197150   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:49:41,906-Speed 3241.35 samples/sec   Loss 1.7921   LearningRate 0.0043   Epoch: 15   Global Step: 197160   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:49:45,033-Speed 3275.27 samples/sec   Loss 1.7952   LearningRate 0.0043   Epoch: 15   Global Step: 197170   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:49:48,188-Speed 3246.80 samples/sec   Loss 1.7666   LearningRate 0.0043   Epoch: 15   Global Step: 197180   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:49:51,367-Speed 3222.10 samples/sec   Loss 1.7131   LearningRate 0.0043   Epoch: 15   Global Step: 197190   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:49:54,512-Speed 3257.39 samples/sec   Loss 1.7621   LearningRate 0.0043   Epoch: 15   Global Step: 197200   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:49:57,598-Speed 3319.37 samples/sec   Loss 1.7480   LearningRate 0.0042   Epoch: 15   Global Step: 197210   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:50:00,790-Speed 3208.55 samples/sec   Loss 1.7270   LearningRate 0.0042   Epoch: 15   Global Step: 197220   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:50:03,957-Speed 3234.37 samples/sec   Loss 1.7273   LearningRate 0.0042   Epoch: 15   Global Step: 197230   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:50:07,128-Speed 3230.04 samples/sec   Loss 1.7906   LearningRate 0.0042   Epoch: 15   Global Step: 197240   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:50:10,219-Speed 3313.90 samples/sec   Loss 1.7707   LearningRate 0.0042   Epoch: 15   Global Step: 197250   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:50:13,351-Speed 3271.79 samples/sec   Loss 1.7548   LearningRate 0.0042   Epoch: 15   Global Step: 197260   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:50:16,498-Speed 3254.16 samples/sec   Loss 1.8143   LearningRate 0.0042   Epoch: 15   Global Step: 197270   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:50:19,679-Speed 3220.53 samples/sec   Loss 1.7546   LearningRate 0.0042   Epoch: 15   Global Step: 197280   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:50:22,769-Speed 3314.78 samples/sec   Loss 1.8002   LearningRate 0.0042   Epoch: 15   Global Step: 197290   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:50:25,919-Speed 3251.67 samples/sec   Loss 1.7156   LearningRate 0.0042   Epoch: 15   Global Step: 197300   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:50:29,026-Speed 3297.73 samples/sec   Loss 1.7709   LearningRate 0.0042   Epoch: 15   Global Step: 197310   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:50:32,128-Speed 3302.02 samples/sec   Loss 1.8211   LearningRate 0.0042   Epoch: 15   Global Step: 197320   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:50:35,218-Speed 3314.35 samples/sec   Loss 1.7592   LearningRate 0.0042   Epoch: 15   Global Step: 197330   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:50:38,321-Speed 3301.39 samples/sec   Loss 1.7105   LearningRate 0.0042   Epoch: 15   Global Step: 197340   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:50:41,458-Speed 3265.43 samples/sec   Loss 1.6810   LearningRate 0.0042   Epoch: 15   Global Step: 197350   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:50:44,576-Speed 3286.00 samples/sec   Loss 1.7463   LearningRate 0.0042   Epoch: 15   Global Step: 197360   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:50:47,741-Speed 3236.37 samples/sec   Loss 1.7688   LearningRate 0.0042   Epoch: 15   Global Step: 197370   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:50:50,883-Speed 3258.93 samples/sec   Loss 1.7856   LearningRate 0.0042   Epoch: 15   Global Step: 197380   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:50:54,077-Speed 3207.45 samples/sec   Loss 1.7879   LearningRate 0.0042   Epoch: 15   Global Step: 197390   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:50:57,169-Speed 3313.07 samples/sec   Loss 1.8155   LearningRate 0.0042   Epoch: 15   Global Step: 197400   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:51:00,313-Speed 3257.44 samples/sec   Loss 1.7730   LearningRate 0.0042   Epoch: 15   Global Step: 197410   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:51:03,451-Speed 3264.49 samples/sec   Loss 1.7448   LearningRate 0.0042   Epoch: 15   Global Step: 197420   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:51:06,656-Speed 3196.31 samples/sec   Loss 1.7732   LearningRate 0.0042   Epoch: 15   Global Step: 197430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 18:51:09,750-Speed 3310.47 samples/sec   Loss 1.7819   LearningRate 0.0042   Epoch: 15   Global Step: 197440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 18:51:12,916-Speed 3235.97 samples/sec   Loss 1.7502   LearningRate 0.0042   Epoch: 15   Global Step: 197450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 18:51:16,134-Speed 3182.36 samples/sec   Loss 1.7497   LearningRate 0.0042   Epoch: 15   Global Step: 197460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 18:51:19,209-Speed 3330.78 samples/sec   Loss 1.7998   LearningRate 0.0042   Epoch: 15   Global Step: 197470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 18:51:22,284-Speed 3331.90 samples/sec   Loss 1.8123   LearningRate 0.0042   Epoch: 15   Global Step: 197480   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:51:25,381-Speed 3307.90 samples/sec   Loss 1.7923   LearningRate 0.0042   Epoch: 15   Global Step: 197490   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:51:28,447-Speed 3339.97 samples/sec   Loss 1.7450   LearningRate 0.0042   Epoch: 15   Global Step: 197500   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:51:31,554-Speed 3297.19 samples/sec   Loss 1.8042   LearningRate 0.0042   Epoch: 15   Global Step: 197510   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:51:34,682-Speed 3274.82 samples/sec   Loss 1.7339   LearningRate 0.0042   Epoch: 15   Global Step: 197520   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:51:37,859-Speed 3224.15 samples/sec   Loss 1.7266   LearningRate 0.0042   Epoch: 15   Global Step: 197530   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:51:40,964-Speed 3298.70 samples/sec   Loss 1.7267   LearningRate 0.0042   Epoch: 15   Global Step: 197540   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:51:44,113-Speed 3252.86 samples/sec   Loss 1.7628   LearningRate 0.0042   Epoch: 15   Global Step: 197550   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:51:47,228-Speed 3288.98 samples/sec   Loss 1.6731   LearningRate 0.0042   Epoch: 15   Global Step: 197560   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:51:50,372-Speed 3258.21 samples/sec   Loss 1.7711   LearningRate 0.0042   Epoch: 15   Global Step: 197570   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:51:53,452-Speed 3325.22 samples/sec   Loss 1.7823   LearningRate 0.0042   Epoch: 15   Global Step: 197580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 18:51:56,587-Speed 3267.61 samples/sec   Loss 1.7667   LearningRate 0.0042   Epoch: 15   Global Step: 197590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 18:51:59,659-Speed 3334.75 samples/sec   Loss 1.7972   LearningRate 0.0042   Epoch: 15   Global Step: 197600   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:52:02,764-Speed 3298.39 samples/sec   Loss 1.7136   LearningRate 0.0042   Epoch: 15   Global Step: 197610   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:52:05,853-Speed 3315.79 samples/sec   Loss 1.7425   LearningRate 0.0042   Epoch: 15   Global Step: 197620   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:52:08,938-Speed 3320.81 samples/sec   Loss 1.7430   LearningRate 0.0042   Epoch: 15   Global Step: 197630   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:52:12,106-Speed 3233.44 samples/sec   Loss 1.7327   LearningRate 0.0042   Epoch: 15   Global Step: 197640   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:52:15,269-Speed 3238.53 samples/sec   Loss 1.8425   LearningRate 0.0042   Epoch: 15   Global Step: 197650   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:52:18,364-Speed 3309.01 samples/sec   Loss 1.8012   LearningRate 0.0042   Epoch: 15   Global Step: 197660   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:52:21,487-Speed 3279.82 samples/sec   Loss 1.8002   LearningRate 0.0042   Epoch: 15   Global Step: 197670   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:52:24,611-Speed 3279.07 samples/sec   Loss 1.7700   LearningRate 0.0042   Epoch: 15   Global Step: 197680   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:52:27,739-Speed 3275.71 samples/sec   Loss 1.7278   LearningRate 0.0042   Epoch: 15   Global Step: 197690   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:52:30,917-Speed 3222.65 samples/sec   Loss 1.7540   LearningRate 0.0042   Epoch: 15   Global Step: 197700   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:52:34,008-Speed 3313.78 samples/sec   Loss 1.7635   LearningRate 0.0042   Epoch: 15   Global Step: 197710   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:52:37,161-Speed 3249.37 samples/sec   Loss 1.7881   LearningRate 0.0042   Epoch: 15   Global Step: 197720   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:52:40,297-Speed 3266.03 samples/sec   Loss 1.8310   LearningRate 0.0042   Epoch: 15   Global Step: 197730   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:52:43,410-Speed 3290.73 samples/sec   Loss 1.7344   LearningRate 0.0042   Epoch: 15   Global Step: 197740   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:52:46,495-Speed 3319.85 samples/sec   Loss 1.7747   LearningRate 0.0042   Epoch: 15   Global Step: 197750   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:52:49,591-Speed 3308.95 samples/sec   Loss 1.7671   LearningRate 0.0042   Epoch: 15   Global Step: 197760   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:52:52,701-Speed 3293.12 samples/sec   Loss 1.7586   LearningRate 0.0042   Epoch: 15   Global Step: 197770   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:52:55,822-Speed 3282.80 samples/sec   Loss 1.7169   LearningRate 0.0042   Epoch: 15   Global Step: 197780   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:52:58,888-Speed 3341.03 samples/sec   Loss 1.7673   LearningRate 0.0042   Epoch: 15   Global Step: 197790   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:53:02,000-Speed 3290.38 samples/sec   Loss 1.7566   LearningRate 0.0042   Epoch: 15   Global Step: 197800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 18:53:05,142-Speed 3260.72 samples/sec   Loss 1.7350   LearningRate 0.0042   Epoch: 15   Global Step: 197810   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:53:08,225-Speed 3322.31 samples/sec   Loss 1.7537   LearningRate 0.0041   Epoch: 15   Global Step: 197820   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:53:11,389-Speed 3237.64 samples/sec   Loss 1.7892   LearningRate 0.0041   Epoch: 15   Global Step: 197830   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:53:14,500-Speed 3292.30 samples/sec   Loss 1.7535   LearningRate 0.0041   Epoch: 15   Global Step: 197840   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:53:17,633-Speed 3270.32 samples/sec   Loss 1.8275   LearningRate 0.0041   Epoch: 15   Global Step: 197850   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:53:20,716-Speed 3321.97 samples/sec   Loss 1.7629   LearningRate 0.0041   Epoch: 15   Global Step: 197860   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:53:23,819-Speed 3300.99 samples/sec   Loss 1.7504   LearningRate 0.0041   Epoch: 15   Global Step: 197870   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:53:27,007-Speed 3213.13 samples/sec   Loss 1.7718   LearningRate 0.0041   Epoch: 15   Global Step: 197880   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:53:30,197-Speed 3210.78 samples/sec   Loss 1.7839   LearningRate 0.0041   Epoch: 15   Global Step: 197890   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:53:33,265-Speed 3338.77 samples/sec   Loss 1.7163   LearningRate 0.0041   Epoch: 15   Global Step: 197900   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:53:36,405-Speed 3262.03 samples/sec   Loss 1.7654   LearningRate 0.0041   Epoch: 15   Global Step: 197910   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 18:53:39,580-Speed 3226.16 samples/sec   Loss 1.8131   LearningRate 0.0041   Epoch: 15   Global Step: 197920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:53:42,838-Speed 3144.43 samples/sec   Loss 1.7580   LearningRate 0.0041   Epoch: 15   Global Step: 197930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:53:45,909-Speed 3335.88 samples/sec   Loss 1.7521   LearningRate 0.0041   Epoch: 15   Global Step: 197940   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:53:49,012-Speed 3300.37 samples/sec   Loss 1.7367   LearningRate 0.0041   Epoch: 15   Global Step: 197950   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:53:52,132-Speed 3282.50 samples/sec   Loss 1.7626   LearningRate 0.0041   Epoch: 15   Global Step: 197960   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:53:55,200-Speed 3339.62 samples/sec   Loss 1.7050   LearningRate 0.0041   Epoch: 15   Global Step: 197970   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:53:58,305-Speed 3298.55 samples/sec   Loss 1.8025   LearningRate 0.0041   Epoch: 15   Global Step: 197980   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:54:01,502-Speed 3204.28 samples/sec   Loss 1.7861   LearningRate 0.0041   Epoch: 15   Global Step: 197990   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:54:04,603-Speed 3302.57 samples/sec   Loss 1.6997   LearningRate 0.0041   Epoch: 15   Global Step: 198000   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:54:07,731-Speed 3275.66 samples/sec   Loss 1.7445   LearningRate 0.0041   Epoch: 15   Global Step: 198010   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:54:10,838-Speed 3295.69 samples/sec   Loss 1.7936   LearningRate 0.0041   Epoch: 15   Global Step: 198020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 18:54:13,945-Speed 3297.77 samples/sec   Loss 1.8272   LearningRate 0.0041   Epoch: 15   Global Step: 198030   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:54:17,056-Speed 3291.82 samples/sec   Loss 1.7440   LearningRate 0.0041   Epoch: 15   Global Step: 198040   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:54:20,144-Speed 3317.15 samples/sec   Loss 1.8386   LearningRate 0.0041   Epoch: 15   Global Step: 198050   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:54:23,200-Speed 3352.33 samples/sec   Loss 1.7492   LearningRate 0.0041   Epoch: 15   Global Step: 198060   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:54:26,279-Speed 3326.15 samples/sec   Loss 1.7827   LearningRate 0.0041   Epoch: 15   Global Step: 198070   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:54:29,391-Speed 3291.30 samples/sec   Loss 1.6923   LearningRate 0.0041   Epoch: 15   Global Step: 198080   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:54:32,451-Speed 3347.69 samples/sec   Loss 1.7578   LearningRate 0.0041   Epoch: 15   Global Step: 198090   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:54:35,513-Speed 3347.25 samples/sec   Loss 1.8020   LearningRate 0.0041   Epoch: 15   Global Step: 198100   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:54:38,585-Speed 3334.60 samples/sec   Loss 1.8150   LearningRate 0.0041   Epoch: 15   Global Step: 198110   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:54:41,663-Speed 3328.39 samples/sec   Loss 1.7762   LearningRate 0.0041   Epoch: 15   Global Step: 198120   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:54:44,745-Speed 3324.06 samples/sec   Loss 1.7797   LearningRate 0.0041   Epoch: 15   Global Step: 198130   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:54:47,878-Speed 3269.51 samples/sec   Loss 1.7542   LearningRate 0.0041   Epoch: 15   Global Step: 198140   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:54:51,001-Speed 3279.26 samples/sec   Loss 1.7551   LearningRate 0.0041   Epoch: 15   Global Step: 198150   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:54:54,114-Speed 3290.68 samples/sec   Loss 1.7575   LearningRate 0.0041   Epoch: 15   Global Step: 198160   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:54:57,171-Speed 3350.37 samples/sec   Loss 1.7327   LearningRate 0.0041   Epoch: 15   Global Step: 198170   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:55:00,262-Speed 3313.97 samples/sec   Loss 1.6990   LearningRate 0.0041   Epoch: 15   Global Step: 198180   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:55:03,404-Speed 3260.49 samples/sec   Loss 1.7515   LearningRate 0.0041   Epoch: 15   Global Step: 198190   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:55:06,550-Speed 3255.61 samples/sec   Loss 1.7770   LearningRate 0.0041   Epoch: 15   Global Step: 198200   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:55:09,601-Speed 3357.24 samples/sec   Loss 1.6991   LearningRate 0.0041   Epoch: 15   Global Step: 198210   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:55:12,771-Speed 3231.53 samples/sec   Loss 1.7656   LearningRate 0.0041   Epoch: 15   Global Step: 198220   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:55:15,822-Speed 3357.07 samples/sec   Loss 1.7039   LearningRate 0.0041   Epoch: 15   Global Step: 198230   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:55:18,903-Speed 3325.11 samples/sec   Loss 1.7571   LearningRate 0.0041   Epoch: 15   Global Step: 198240   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:55:21,958-Speed 3352.47 samples/sec   Loss 1.7542   LearningRate 0.0041   Epoch: 15   Global Step: 198250   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:55:25,042-Speed 3321.18 samples/sec   Loss 1.8141   LearningRate 0.0041   Epoch: 15   Global Step: 198260   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:55:28,184-Speed 3260.41 samples/sec   Loss 1.7841   LearningRate 0.0041   Epoch: 15   Global Step: 198270   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:55:31,299-Speed 3288.21 samples/sec   Loss 1.7499   LearningRate 0.0041   Epoch: 15   Global Step: 198280   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:55:34,375-Speed 3330.32 samples/sec   Loss 1.7978   LearningRate 0.0041   Epoch: 15   Global Step: 198290   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:55:37,453-Speed 3329.58 samples/sec   Loss 1.6566   LearningRate 0.0041   Epoch: 15   Global Step: 198300   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:55:40,577-Speed 3278.29 samples/sec   Loss 1.7694   LearningRate 0.0041   Epoch: 15   Global Step: 198310   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:55:43,708-Speed 3272.30 samples/sec   Loss 1.7694   LearningRate 0.0041   Epoch: 15   Global Step: 198320   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:55:46,796-Speed 3316.96 samples/sec   Loss 1.7678   LearningRate 0.0041   Epoch: 15   Global Step: 198330   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:55:49,955-Speed 3241.97 samples/sec   Loss 1.7805   LearningRate 0.0041   Epoch: 15   Global Step: 198340   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:55:53,078-Speed 3280.47 samples/sec   Loss 1.7469   LearningRate 0.0041   Epoch: 15   Global Step: 198350   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:55:56,187-Speed 3294.44 samples/sec   Loss 1.7157   LearningRate 0.0041   Epoch: 15   Global Step: 198360   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:55:59,256-Speed 3337.85 samples/sec   Loss 1.7979   LearningRate 0.0041   Epoch: 15   Global Step: 198370   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:56:02,326-Speed 3336.58 samples/sec   Loss 1.7718   LearningRate 0.0041   Epoch: 15   Global Step: 198380   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:56:05,411-Speed 3320.13 samples/sec   Loss 1.7897   LearningRate 0.0041   Epoch: 15   Global Step: 198390   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:56:08,482-Speed 3335.51 samples/sec   Loss 1.7409   LearningRate 0.0041   Epoch: 15   Global Step: 198400   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:56:11,599-Speed 3286.07 samples/sec   Loss 1.8025   LearningRate 0.0041   Epoch: 15   Global Step: 198410   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:56:14,690-Speed 3314.34 samples/sec   Loss 1.7846   LearningRate 0.0041   Epoch: 15   Global Step: 198420   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:56:17,779-Speed 3315.81 samples/sec   Loss 1.6927   LearningRate 0.0040   Epoch: 15   Global Step: 198430   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:56:20,888-Speed 3294.55 samples/sec   Loss 1.6961   LearningRate 0.0040   Epoch: 15   Global Step: 198440   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 18:56:24,026-Speed 3264.65 samples/sec   Loss 1.7289   LearningRate 0.0040   Epoch: 15   Global Step: 198450   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:56:27,148-Speed 3281.15 samples/sec   Loss 1.7272   LearningRate 0.0040   Epoch: 15   Global Step: 198460   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:56:30,313-Speed 3236.19 samples/sec   Loss 1.7225   LearningRate 0.0040   Epoch: 15   Global Step: 198470   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:56:33,378-Speed 3341.95 samples/sec   Loss 1.6860   LearningRate 0.0040   Epoch: 15   Global Step: 198480   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:56:36,545-Speed 3234.87 samples/sec   Loss 1.7173   LearningRate 0.0040   Epoch: 15   Global Step: 198490   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:56:39,604-Speed 3347.60 samples/sec   Loss 1.7481   LearningRate 0.0040   Epoch: 15   Global Step: 198500   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:56:42,683-Speed 3328.04 samples/sec   Loss 1.7367   LearningRate 0.0040   Epoch: 15   Global Step: 198510   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:56:45,750-Speed 3339.26 samples/sec   Loss 1.8026   LearningRate 0.0040   Epoch: 15   Global Step: 198520   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:56:48,859-Speed 3295.75 samples/sec   Loss 1.7199   LearningRate 0.0040   Epoch: 15   Global Step: 198530   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:56:51,961-Speed 3301.28 samples/sec   Loss 1.7550   LearningRate 0.0040   Epoch: 15   Global Step: 198540   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:56:55,035-Speed 3332.34 samples/sec   Loss 1.6932   LearningRate 0.0040   Epoch: 15   Global Step: 198550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 18:56:58,142-Speed 3296.96 samples/sec   Loss 1.7170   LearningRate 0.0040   Epoch: 15   Global Step: 198560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 18:57:01,204-Speed 3345.71 samples/sec   Loss 1.8109   LearningRate 0.0040   Epoch: 15   Global Step: 198570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 18:57:04,359-Speed 3246.17 samples/sec   Loss 1.7901   LearningRate 0.0040   Epoch: 15   Global Step: 198580   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:57:07,422-Speed 3344.53 samples/sec   Loss 1.7927   LearningRate 0.0040   Epoch: 15   Global Step: 198590   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:57:10,468-Speed 3362.27 samples/sec   Loss 1.7774   LearningRate 0.0040   Epoch: 15   Global Step: 198600   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:57:13,547-Speed 3327.60 samples/sec   Loss 1.7462   LearningRate 0.0040   Epoch: 15   Global Step: 198610   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:57:16,669-Speed 3280.41 samples/sec   Loss 1.7541   LearningRate 0.0040   Epoch: 15   Global Step: 198620   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:57:19,755-Speed 3319.79 samples/sec   Loss 1.7374   LearningRate 0.0040   Epoch: 15   Global Step: 198630   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:57:22,887-Speed 3270.39 samples/sec   Loss 1.7717   LearningRate 0.0040   Epoch: 15   Global Step: 198640   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:57:26,047-Speed 3241.78 samples/sec   Loss 1.7826   LearningRate 0.0040   Epoch: 15   Global Step: 198650   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:57:29,137-Speed 3314.49 samples/sec   Loss 1.7750   LearningRate 0.0040   Epoch: 15   Global Step: 198660   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:57:32,218-Speed 3324.71 samples/sec   Loss 1.7172   LearningRate 0.0040   Epoch: 15   Global Step: 198670   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:57:35,342-Speed 3279.07 samples/sec   Loss 1.7674   LearningRate 0.0040   Epoch: 15   Global Step: 198680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 18:57:38,403-Speed 3346.09 samples/sec   Loss 1.7835   LearningRate 0.0040   Epoch: 15   Global Step: 198690   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:57:41,475-Speed 3334.12 samples/sec   Loss 1.7607   LearningRate 0.0040   Epoch: 15   Global Step: 198700   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:57:44,562-Speed 3318.99 samples/sec   Loss 1.7474   LearningRate 0.0040   Epoch: 15   Global Step: 198710   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:57:47,669-Speed 3296.91 samples/sec   Loss 1.7844   LearningRate 0.0040   Epoch: 15   Global Step: 198720   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:57:50,941-Speed 3130.63 samples/sec   Loss 1.7282   LearningRate 0.0040   Epoch: 15   Global Step: 198730   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:58:22,736-Speed 322.09 samples/sec   Loss 1.5482   LearningRate 0.0040   Epoch: 16   Global Step: 198740   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:58:26,313-Speed 2863.61 samples/sec   Loss 1.3278   LearningRate 0.0040   Epoch: 16   Global Step: 198750   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:58:29,508-Speed 3205.39 samples/sec   Loss 1.3362   LearningRate 0.0040   Epoch: 16   Global Step: 198760   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:58:32,600-Speed 3314.23 samples/sec   Loss 1.3548   LearningRate 0.0040   Epoch: 16   Global Step: 198770   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:58:35,682-Speed 3323.11 samples/sec   Loss 1.2549   LearningRate 0.0040   Epoch: 16   Global Step: 198780   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:58:38,848-Speed 3235.40 samples/sec   Loss 1.2861   LearningRate 0.0040   Epoch: 16   Global Step: 198790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 18:58:41,919-Speed 3334.80 samples/sec   Loss 1.3060   LearningRate 0.0040   Epoch: 16   Global Step: 198800   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:58:44,994-Speed 3332.03 samples/sec   Loss 1.2763   LearningRate 0.0040   Epoch: 16   Global Step: 198810   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:58:48,073-Speed 3326.95 samples/sec   Loss 1.3100   LearningRate 0.0040   Epoch: 16   Global Step: 198820   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:58:51,136-Speed 3344.15 samples/sec   Loss 1.2675   LearningRate 0.0040   Epoch: 16   Global Step: 198830   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:58:54,208-Speed 3334.21 samples/sec   Loss 1.2710   LearningRate 0.0040   Epoch: 16   Global Step: 198840   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:58:57,276-Speed 3339.27 samples/sec   Loss 1.2705   LearningRate 0.0040   Epoch: 16   Global Step: 198850   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:59:00,457-Speed 3219.82 samples/sec   Loss 1.3113   LearningRate 0.0040   Epoch: 16   Global Step: 198860   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:59:03,627-Speed 3231.61 samples/sec   Loss 1.2666   LearningRate 0.0040   Epoch: 16   Global Step: 198870   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:59:06,728-Speed 3303.31 samples/sec   Loss 1.2935   LearningRate 0.0040   Epoch: 16   Global Step: 198880   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:59:09,804-Speed 3329.47 samples/sec   Loss 1.2864   LearningRate 0.0040   Epoch: 16   Global Step: 198890   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:59:12,900-Speed 3309.19 samples/sec   Loss 1.2689   LearningRate 0.0040   Epoch: 16   Global Step: 198900   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:59:16,194-Speed 3109.41 samples/sec   Loss 1.2443   LearningRate 0.0040   Epoch: 16   Global Step: 198910   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:59:19,395-Speed 3200.09 samples/sec   Loss 1.2406   LearningRate 0.0040   Epoch: 16   Global Step: 198920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:59:22,667-Speed 3130.14 samples/sec   Loss 1.2856   LearningRate 0.0040   Epoch: 16   Global Step: 198930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:59:25,884-Speed 3184.62 samples/sec   Loss 1.2875   LearningRate 0.0040   Epoch: 16   Global Step: 198940   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:59:29,129-Speed 3156.47 samples/sec   Loss 1.2929   LearningRate 0.0040   Epoch: 16   Global Step: 198950   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:59:32,213-Speed 3321.35 samples/sec   Loss 1.3014   LearningRate 0.0040   Epoch: 16   Global Step: 198960   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:59:35,275-Speed 3345.99 samples/sec   Loss 1.2890   LearningRate 0.0040   Epoch: 16   Global Step: 198970   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:59:38,381-Speed 3297.89 samples/sec   Loss 1.2953   LearningRate 0.0040   Epoch: 16   Global Step: 198980   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:59:41,553-Speed 3228.79 samples/sec   Loss 1.2703   LearningRate 0.0040   Epoch: 16   Global Step: 198990   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:59:44,617-Speed 3342.94 samples/sec   Loss 1.2692   LearningRate 0.0040   Epoch: 16   Global Step: 199000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 18:59:47,675-Speed 3350.59 samples/sec   Loss 1.2754   LearningRate 0.0040   Epoch: 16   Global Step: 199010   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:59:50,735-Speed 3346.65 samples/sec   Loss 1.2818   LearningRate 0.0040   Epoch: 16   Global Step: 199020   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:59:53,797-Speed 3345.46 samples/sec   Loss 1.3163   LearningRate 0.0040   Epoch: 16   Global Step: 199030   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 18:59:56,906-Speed 3294.68 samples/sec   Loss 1.3059   LearningRate 0.0040   Epoch: 16   Global Step: 199040   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:00:00,021-Speed 3288.13 samples/sec   Loss 1.3297   LearningRate 0.0039   Epoch: 16   Global Step: 199050   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:00:03,131-Speed 3293.76 samples/sec   Loss 1.2853   LearningRate 0.0039   Epoch: 16   Global Step: 199060   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:00:06,281-Speed 3251.86 samples/sec   Loss 1.2673   LearningRate 0.0039   Epoch: 16   Global Step: 199070   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:00:09,374-Speed 3311.85 samples/sec   Loss 1.3228   LearningRate 0.0039   Epoch: 16   Global Step: 199080   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:00:12,482-Speed 3296.02 samples/sec   Loss 1.2651   LearningRate 0.0039   Epoch: 16   Global Step: 199090   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:00:15,658-Speed 3225.09 samples/sec   Loss 1.2545   LearningRate 0.0039   Epoch: 16   Global Step: 199100   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:00:18,736-Speed 3328.04 samples/sec   Loss 1.2544   LearningRate 0.0039   Epoch: 16   Global Step: 199110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:00:21,773-Speed 3372.73 samples/sec   Loss 1.3102   LearningRate 0.0039   Epoch: 16   Global Step: 199120   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:00:24,951-Speed 3223.35 samples/sec   Loss 1.2492   LearningRate 0.0039   Epoch: 16   Global Step: 199130   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:00:28,092-Speed 3261.55 samples/sec   Loss 1.3039   LearningRate 0.0039   Epoch: 16   Global Step: 199140   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:00:31,175-Speed 3321.88 samples/sec   Loss 1.2348   LearningRate 0.0039   Epoch: 16   Global Step: 199150   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:00:34,261-Speed 3320.52 samples/sec   Loss 1.2488   LearningRate 0.0039   Epoch: 16   Global Step: 199160   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:00:37,411-Speed 3251.08 samples/sec   Loss 1.3230   LearningRate 0.0039   Epoch: 16   Global Step: 199170   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:00:40,516-Speed 3299.81 samples/sec   Loss 1.3155   LearningRate 0.0039   Epoch: 16   Global Step: 199180   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:00:43,651-Speed 3267.47 samples/sec   Loss 1.2984   LearningRate 0.0039   Epoch: 16   Global Step: 199190   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:00:46,740-Speed 3315.88 samples/sec   Loss 1.2900   LearningRate 0.0039   Epoch: 16   Global Step: 199200   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:00:49,848-Speed 3295.67 samples/sec   Loss 1.3079   LearningRate 0.0039   Epoch: 16   Global Step: 199210   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:00:52,946-Speed 3306.34 samples/sec   Loss 1.2767   LearningRate 0.0039   Epoch: 16   Global Step: 199220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:00:55,994-Speed 3361.02 samples/sec   Loss 1.2600   LearningRate 0.0039   Epoch: 16   Global Step: 199230   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:00:59,059-Speed 3341.58 samples/sec   Loss 1.3493   LearningRate 0.0039   Epoch: 16   Global Step: 199240   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:01:02,179-Speed 3283.69 samples/sec   Loss 1.2892   LearningRate 0.0039   Epoch: 16   Global Step: 199250   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:01:05,294-Speed 3288.39 samples/sec   Loss 1.2456   LearningRate 0.0039   Epoch: 16   Global Step: 199260   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:01:08,372-Speed 3328.17 samples/sec   Loss 1.2430   LearningRate 0.0039   Epoch: 16   Global Step: 199270   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:01:11,468-Speed 3308.73 samples/sec   Loss 1.2627   LearningRate 0.0039   Epoch: 16   Global Step: 199280   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:01:14,530-Speed 3344.88 samples/sec   Loss 1.2795   LearningRate 0.0039   Epoch: 16   Global Step: 199290   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:01:17,628-Speed 3306.42 samples/sec   Loss 1.2897   LearningRate 0.0039   Epoch: 16   Global Step: 199300   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:01:20,692-Speed 3343.31 samples/sec   Loss 1.2446   LearningRate 0.0039   Epoch: 16   Global Step: 199310   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:01:23,758-Speed 3340.77 samples/sec   Loss 1.3204   LearningRate 0.0039   Epoch: 16   Global Step: 199320   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:01:26,863-Speed 3299.21 samples/sec   Loss 1.3642   LearningRate 0.0039   Epoch: 16   Global Step: 199330   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:01:29,993-Speed 3272.70 samples/sec   Loss 1.3071   LearningRate 0.0039   Epoch: 16   Global Step: 199340   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:01:33,103-Speed 3293.36 samples/sec   Loss 1.3087   LearningRate 0.0039   Epoch: 16   Global Step: 199350   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:01:36,297-Speed 3207.39 samples/sec   Loss 1.3111   LearningRate 0.0039   Epoch: 16   Global Step: 199360   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:01:39,406-Speed 3295.01 samples/sec   Loss 1.2961   LearningRate 0.0039   Epoch: 16   Global Step: 199370   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:01:42,568-Speed 3239.05 samples/sec   Loss 1.2625   LearningRate 0.0039   Epoch: 16   Global Step: 199380   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:01:45,691-Speed 3280.32 samples/sec   Loss 1.2909   LearningRate 0.0039   Epoch: 16   Global Step: 199390   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:01:48,806-Speed 3288.06 samples/sec   Loss 1.2859   LearningRate 0.0039   Epoch: 16   Global Step: 199400   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:01:51,933-Speed 3275.24 samples/sec   Loss 1.3441   LearningRate 0.0039   Epoch: 16   Global Step: 199410   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:01:55,038-Speed 3299.42 samples/sec   Loss 1.2912   LearningRate 0.0039   Epoch: 16   Global Step: 199420   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:01:58,090-Speed 3355.86 samples/sec   Loss 1.3234   LearningRate 0.0039   Epoch: 16   Global Step: 199430   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:02:01,196-Speed 3298.82 samples/sec   Loss 1.2973   LearningRate 0.0039   Epoch: 16   Global Step: 199440   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:02:04,261-Speed 3341.39 samples/sec   Loss 1.2901   LearningRate 0.0039   Epoch: 16   Global Step: 199450   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:02:07,340-Speed 3327.49 samples/sec   Loss 1.3178   LearningRate 0.0039   Epoch: 16   Global Step: 199460   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:02:10,429-Speed 3316.26 samples/sec   Loss 1.3234   LearningRate 0.0039   Epoch: 16   Global Step: 199470   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:02:13,538-Speed 3293.62 samples/sec   Loss 1.3260   LearningRate 0.0039   Epoch: 16   Global Step: 199480   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:02:16,612-Speed 3332.52 samples/sec   Loss 1.2979   LearningRate 0.0039   Epoch: 16   Global Step: 199490   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:02:19,695-Speed 3322.77 samples/sec   Loss 1.3374   LearningRate 0.0039   Epoch: 16   Global Step: 199500   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:02:22,781-Speed 3319.44 samples/sec   Loss 1.2724   LearningRate 0.0039   Epoch: 16   Global Step: 199510   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:02:25,848-Speed 3340.07 samples/sec   Loss 1.3072   LearningRate 0.0039   Epoch: 16   Global Step: 199520   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:02:28,901-Speed 3355.03 samples/sec   Loss 1.3506   LearningRate 0.0039   Epoch: 16   Global Step: 199530   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:02:31,973-Speed 3334.56 samples/sec   Loss 1.2941   LearningRate 0.0039   Epoch: 16   Global Step: 199540   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:02:35,087-Speed 3289.77 samples/sec   Loss 1.3588   LearningRate 0.0039   Epoch: 16   Global Step: 199550   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:02:38,240-Speed 3248.81 samples/sec   Loss 1.3396   LearningRate 0.0039   Epoch: 16   Global Step: 199560   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:02:41,328-Speed 3316.48 samples/sec   Loss 1.2938   LearningRate 0.0039   Epoch: 16   Global Step: 199570   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:02:44,429-Speed 3302.95 samples/sec   Loss 1.3577   LearningRate 0.0039   Epoch: 16   Global Step: 199580   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:02:47,543-Speed 3290.01 samples/sec   Loss 1.3012   LearningRate 0.0039   Epoch: 16   Global Step: 199590   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:02:50,669-Speed 3276.71 samples/sec   Loss 1.3177   LearningRate 0.0039   Epoch: 16   Global Step: 199600   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:02:53,845-Speed 3225.66 samples/sec   Loss 1.2818   LearningRate 0.0039   Epoch: 16   Global Step: 199610   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:02:56,930-Speed 3320.45 samples/sec   Loss 1.2721   LearningRate 0.0039   Epoch: 16   Global Step: 199620   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:03:00,004-Speed 3331.94 samples/sec   Loss 1.2445   LearningRate 0.0039   Epoch: 16   Global Step: 199630   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:03:03,141-Speed 3265.11 samples/sec   Loss 1.2676   LearningRate 0.0039   Epoch: 16   Global Step: 199640   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:03:06,214-Speed 3333.27 samples/sec   Loss 1.2918   LearningRate 0.0039   Epoch: 16   Global Step: 199650   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:03:09,279-Speed 3342.34 samples/sec   Loss 1.2580   LearningRate 0.0039   Epoch: 16   Global Step: 199660   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:03:12,371-Speed 3312.69 samples/sec   Loss 1.2726   LearningRate 0.0039   Epoch: 16   Global Step: 199670   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:03:15,474-Speed 3301.74 samples/sec   Loss 1.3054   LearningRate 0.0038   Epoch: 16   Global Step: 199680   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:03:18,590-Speed 3287.61 samples/sec   Loss 1.2811   LearningRate 0.0038   Epoch: 16   Global Step: 199690   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:03:21,677-Speed 3317.41 samples/sec   Loss 1.2568   LearningRate 0.0038   Epoch: 16   Global Step: 199700   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:03:24,776-Speed 3305.42 samples/sec   Loss 1.3155   LearningRate 0.0038   Epoch: 16   Global Step: 199710   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:03:27,837-Speed 3347.02 samples/sec   Loss 1.3068   LearningRate 0.0038   Epoch: 16   Global Step: 199720   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:03:30,928-Speed 3313.16 samples/sec   Loss 1.3157   LearningRate 0.0038   Epoch: 16   Global Step: 199730   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:03:34,042-Speed 3290.42 samples/sec   Loss 1.3134   LearningRate 0.0038   Epoch: 16   Global Step: 199740   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:03:37,126-Speed 3321.41 samples/sec   Loss 1.2960   LearningRate 0.0038   Epoch: 16   Global Step: 199750   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:03:40,250-Speed 3278.06 samples/sec   Loss 1.2871   LearningRate 0.0038   Epoch: 16   Global Step: 199760   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:03:43,328-Speed 3328.29 samples/sec   Loss 1.3058   LearningRate 0.0038   Epoch: 16   Global Step: 199770   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:03:46,394-Speed 3341.13 samples/sec   Loss 1.2724   LearningRate 0.0038   Epoch: 16   Global Step: 199780   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:03:49,538-Speed 3258.43 samples/sec   Loss 1.3028   LearningRate 0.0038   Epoch: 16   Global Step: 199790   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:03:52,624-Speed 3319.59 samples/sec   Loss 1.2975   LearningRate 0.0038   Epoch: 16   Global Step: 199800   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:03:55,730-Speed 3296.95 samples/sec   Loss 1.3056   LearningRate 0.0038   Epoch: 16   Global Step: 199810   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:03:58,798-Speed 3339.67 samples/sec   Loss 1.3038   LearningRate 0.0038   Epoch: 16   Global Step: 199820   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:04:01,859-Speed 3346.02 samples/sec   Loss 1.2922   LearningRate 0.0038   Epoch: 16   Global Step: 199830   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:04:04,921-Speed 3345.32 samples/sec   Loss 1.3402   LearningRate 0.0038   Epoch: 16   Global Step: 199840   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:04:08,000-Speed 3326.62 samples/sec   Loss 1.3603   LearningRate 0.0038   Epoch: 16   Global Step: 199850   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:04:11,137-Speed 3265.27 samples/sec   Loss 1.2927   LearningRate 0.0038   Epoch: 16   Global Step: 199860   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:04:14,203-Speed 3341.59 samples/sec   Loss 1.3013   LearningRate 0.0038   Epoch: 16   Global Step: 199870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:04:17,278-Speed 3330.92 samples/sec   Loss 1.3296   LearningRate 0.0038   Epoch: 16   Global Step: 199880   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:04:20,350-Speed 3333.51 samples/sec   Loss 1.3075   LearningRate 0.0038   Epoch: 16   Global Step: 199890   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:04:23,417-Speed 3340.10 samples/sec   Loss 1.2986   LearningRate 0.0038   Epoch: 16   Global Step: 199900   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:04:26,480-Speed 3344.49 samples/sec   Loss 1.2550   LearningRate 0.0038   Epoch: 16   Global Step: 199910   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:04:29,579-Speed 3305.30 samples/sec   Loss 1.2869   LearningRate 0.0038   Epoch: 16   Global Step: 199920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:04:32,659-Speed 3325.55 samples/sec   Loss 1.2749   LearningRate 0.0038   Epoch: 16   Global Step: 199930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:04:35,717-Speed 3349.22 samples/sec   Loss 1.2403   LearningRate 0.0038   Epoch: 16   Global Step: 199940   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:04:38,776-Speed 3348.56 samples/sec   Loss 1.3323   LearningRate 0.0038   Epoch: 16   Global Step: 199950   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:04:41,852-Speed 3330.84 samples/sec   Loss 1.2957   LearningRate 0.0038   Epoch: 16   Global Step: 199960   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:04:44,935-Speed 3322.59 samples/sec   Loss 1.2717   LearningRate 0.0038   Epoch: 16   Global Step: 199970   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:04:47,994-Speed 3347.67 samples/sec   Loss 1.3225   LearningRate 0.0038   Epoch: 16   Global Step: 199980   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:04:51,070-Speed 3329.87 samples/sec   Loss 1.2781   LearningRate 0.0038   Epoch: 16   Global Step: 199990   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:04:54,172-Speed 3302.13 samples/sec   Loss 1.2844   LearningRate 0.0038   Epoch: 16   Global Step: 200000   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:04:57,234-Speed 3345.24 samples/sec   Loss 1.2609   LearningRate 0.0038   Epoch: 16   Global Step: 200010   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:05:00,331-Speed 3307.84 samples/sec   Loss 1.3107   LearningRate 0.0038   Epoch: 16   Global Step: 200020   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:05:03,452-Speed 3282.02 samples/sec   Loss 1.3117   LearningRate 0.0038   Epoch: 16   Global Step: 200030   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:05:06,653-Speed 3199.83 samples/sec   Loss 1.3152   LearningRate 0.0038   Epoch: 16   Global Step: 200040   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:05:09,726-Speed 3333.25 samples/sec   Loss 1.2642   LearningRate 0.0038   Epoch: 16   Global Step: 200050   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:05:12,793-Speed 3340.75 samples/sec   Loss 1.3397   LearningRate 0.0038   Epoch: 16   Global Step: 200060   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:05:15,886-Speed 3311.58 samples/sec   Loss 1.3736   LearningRate 0.0038   Epoch: 16   Global Step: 200070   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:05:18,985-Speed 3305.10 samples/sec   Loss 1.3249   LearningRate 0.0038   Epoch: 16   Global Step: 200080   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:05:22,057-Speed 3333.81 samples/sec   Loss 1.3165   LearningRate 0.0038   Epoch: 16   Global Step: 200090   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:05:25,119-Speed 3345.26 samples/sec   Loss 1.3297   LearningRate 0.0038   Epoch: 16   Global Step: 200100   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:05:28,209-Speed 3315.14 samples/sec   Loss 1.3388   LearningRate 0.0038   Epoch: 16   Global Step: 200110   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:05:31,266-Speed 3351.60 samples/sec   Loss 1.3228   LearningRate 0.0038   Epoch: 16   Global Step: 200120   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:05:34,340-Speed 3331.62 samples/sec   Loss 1.2984   LearningRate 0.0038   Epoch: 16   Global Step: 200130   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:05:37,444-Speed 3299.92 samples/sec   Loss 1.3299   LearningRate 0.0038   Epoch: 16   Global Step: 200140   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:05:40,613-Speed 3232.91 samples/sec   Loss 1.3408   LearningRate 0.0038   Epoch: 16   Global Step: 200150   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:05:43,672-Speed 3348.46 samples/sec   Loss 1.3458   LearningRate 0.0038   Epoch: 16   Global Step: 200160   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:05:46,744-Speed 3336.80 samples/sec   Loss 1.3251   LearningRate 0.0038   Epoch: 16   Global Step: 200170   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:05:49,832-Speed 3317.38 samples/sec   Loss 1.3286   LearningRate 0.0038   Epoch: 16   Global Step: 200180   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:05:53,037-Speed 3195.63 samples/sec   Loss 1.3733   LearningRate 0.0038   Epoch: 16   Global Step: 200190   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:05:56,115-Speed 3327.41 samples/sec   Loss 1.3017   LearningRate 0.0038   Epoch: 16   Global Step: 200200   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:05:59,213-Speed 3306.64 samples/sec   Loss 1.3353   LearningRate 0.0038   Epoch: 16   Global Step: 200210   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:06:02,410-Speed 3204.07 samples/sec   Loss 1.3867   LearningRate 0.0038   Epoch: 16   Global Step: 200220   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:06:05,601-Speed 3209.64 samples/sec   Loss 1.3125   LearningRate 0.0038   Epoch: 16   Global Step: 200230   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:06:08,684-Speed 3323.29 samples/sec   Loss 1.3691   LearningRate 0.0038   Epoch: 16   Global Step: 200240   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:06:11,734-Speed 3358.01 samples/sec   Loss 1.3135   LearningRate 0.0038   Epoch: 16   Global Step: 200250   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:06:15,732-Speed 2561.72 samples/sec   Loss 1.3126   LearningRate 0.0038   Epoch: 16   Global Step: 200260   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:06:18,879-Speed 3255.05 samples/sec   Loss 1.3152   LearningRate 0.0038   Epoch: 16   Global Step: 200270   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:06:21,954-Speed 3331.12 samples/sec   Loss 1.2735   LearningRate 0.0038   Epoch: 16   Global Step: 200280   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:06:25,074-Speed 3282.88 samples/sec   Loss 1.3038   LearningRate 0.0038   Epoch: 16   Global Step: 200290   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:06:28,258-Speed 3217.65 samples/sec   Loss 1.3242   LearningRate 0.0038   Epoch: 16   Global Step: 200300   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:06:31,330-Speed 3334.17 samples/sec   Loss 1.3350   LearningRate 0.0038   Epoch: 16   Global Step: 200310   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:06:34,445-Speed 3288.54 samples/sec   Loss 1.3158   LearningRate 0.0037   Epoch: 16   Global Step: 200320   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:06:37,576-Speed 3271.17 samples/sec   Loss 1.3267   LearningRate 0.0037   Epoch: 16   Global Step: 200330   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:06:40,697-Speed 3282.44 samples/sec   Loss 1.2940   LearningRate 0.0037   Epoch: 16   Global Step: 200340   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:06:43,790-Speed 3311.43 samples/sec   Loss 1.2895   LearningRate 0.0037   Epoch: 16   Global Step: 200350   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:06:46,917-Speed 3276.02 samples/sec   Loss 1.2921   LearningRate 0.0037   Epoch: 16   Global Step: 200360   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:06:50,086-Speed 3232.13 samples/sec   Loss 1.2857   LearningRate 0.0037   Epoch: 16   Global Step: 200370   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:06:53,251-Speed 3236.68 samples/sec   Loss 1.2983   LearningRate 0.0037   Epoch: 16   Global Step: 200380   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:06:56,366-Speed 3288.25 samples/sec   Loss 1.3333   LearningRate 0.0037   Epoch: 16   Global Step: 200390   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:06:59,517-Speed 3251.75 samples/sec   Loss 1.3640   LearningRate 0.0037   Epoch: 16   Global Step: 200400   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:07:02,620-Speed 3300.37 samples/sec   Loss 1.3188   LearningRate 0.0037   Epoch: 16   Global Step: 200410   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:07:05,677-Speed 3350.41 samples/sec   Loss 1.3264   LearningRate 0.0037   Epoch: 16   Global Step: 200420   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:07:08,735-Speed 3350.51 samples/sec   Loss 1.3558   LearningRate 0.0037   Epoch: 16   Global Step: 200430   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:07:11,836-Speed 3302.41 samples/sec   Loss 1.3367   LearningRate 0.0037   Epoch: 16   Global Step: 200440   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:07:14,922-Speed 3319.26 samples/sec   Loss 1.3134   LearningRate 0.0037   Epoch: 16   Global Step: 200450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:07:18,045-Speed 3280.35 samples/sec   Loss 1.3238   LearningRate 0.0037   Epoch: 16   Global Step: 200460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:07:21,099-Speed 3353.72 samples/sec   Loss 1.3411   LearningRate 0.0037   Epoch: 16   Global Step: 200470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:07:24,172-Speed 3333.96 samples/sec   Loss 1.2824   LearningRate 0.0037   Epoch: 16   Global Step: 200480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:07:27,293-Speed 3281.88 samples/sec   Loss 1.3097   LearningRate 0.0037   Epoch: 16   Global Step: 200490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:07:30,485-Speed 3209.48 samples/sec   Loss 1.3136   LearningRate 0.0037   Epoch: 16   Global Step: 200500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:07:33,585-Speed 3304.32 samples/sec   Loss 1.2954   LearningRate 0.0037   Epoch: 16   Global Step: 200510   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:07:36,812-Speed 3173.68 samples/sec   Loss 1.3464   LearningRate 0.0037   Epoch: 16   Global Step: 200520   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:07:40,007-Speed 3206.23 samples/sec   Loss 1.3105   LearningRate 0.0037   Epoch: 16   Global Step: 200530   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:07:43,159-Speed 3250.13 samples/sec   Loss 1.3232   LearningRate 0.0037   Epoch: 16   Global Step: 200540   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:07:46,212-Speed 3355.11 samples/sec   Loss 1.3262   LearningRate 0.0037   Epoch: 16   Global Step: 200550   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:07:49,366-Speed 3247.64 samples/sec   Loss 1.3049   LearningRate 0.0037   Epoch: 16   Global Step: 200560   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:07:52,566-Speed 3201.22 samples/sec   Loss 1.2958   LearningRate 0.0037   Epoch: 16   Global Step: 200570   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:07:55,706-Speed 3262.30 samples/sec   Loss 1.3757   LearningRate 0.0037   Epoch: 16   Global Step: 200580   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:07:58,788-Speed 3323.98 samples/sec   Loss 1.3489   LearningRate 0.0037   Epoch: 16   Global Step: 200590   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:08:01,898-Speed 3293.06 samples/sec   Loss 1.3400   LearningRate 0.0037   Epoch: 16   Global Step: 200600   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:08:04,993-Speed 3309.34 samples/sec   Loss 1.3010   LearningRate 0.0037   Epoch: 16   Global Step: 200610   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:08:08,071-Speed 3328.69 samples/sec   Loss 1.3133   LearningRate 0.0037   Epoch: 16   Global Step: 200620   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:08:11,232-Speed 3240.71 samples/sec   Loss 1.2665   LearningRate 0.0037   Epoch: 16   Global Step: 200630   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:08:14,341-Speed 3294.64 samples/sec   Loss 1.3203   LearningRate 0.0037   Epoch: 16   Global Step: 200640   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:08:17,456-Speed 3287.49 samples/sec   Loss 1.3263   LearningRate 0.0037   Epoch: 16   Global Step: 200650   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:08:20,551-Speed 3310.50 samples/sec   Loss 1.3048   LearningRate 0.0037   Epoch: 16   Global Step: 200660   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:08:23,692-Speed 3260.51 samples/sec   Loss 1.3644   LearningRate 0.0037   Epoch: 16   Global Step: 200670   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:08:26,804-Speed 3291.65 samples/sec   Loss 1.3852   LearningRate 0.0037   Epoch: 16   Global Step: 200680   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:08:29,891-Speed 3318.50 samples/sec   Loss 1.3153   LearningRate 0.0037   Epoch: 16   Global Step: 200690   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:08:33,043-Speed 3249.35 samples/sec   Loss 1.3739   LearningRate 0.0037   Epoch: 16   Global Step: 200700   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:08:36,139-Speed 3309.21 samples/sec   Loss 1.3339   LearningRate 0.0037   Epoch: 16   Global Step: 200710   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:08:39,322-Speed 3217.74 samples/sec   Loss 1.3052   LearningRate 0.0037   Epoch: 16   Global Step: 200720   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:08:42,421-Speed 3305.82 samples/sec   Loss 1.3219   LearningRate 0.0037   Epoch: 16   Global Step: 200730   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:08:45,485-Speed 3343.34 samples/sec   Loss 1.3019   LearningRate 0.0037   Epoch: 16   Global Step: 200740   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:08:48,631-Speed 3256.07 samples/sec   Loss 1.3351   LearningRate 0.0037   Epoch: 16   Global Step: 200750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:08:51,695-Speed 3343.12 samples/sec   Loss 1.3297   LearningRate 0.0037   Epoch: 16   Global Step: 200760   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:08:54,802-Speed 3296.89 samples/sec   Loss 1.3549   LearningRate 0.0037   Epoch: 16   Global Step: 200770   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:08:57,896-Speed 3310.43 samples/sec   Loss 1.3906   LearningRate 0.0037   Epoch: 16   Global Step: 200780   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:09:01,014-Speed 3285.19 samples/sec   Loss 1.3667   LearningRate 0.0037   Epoch: 16   Global Step: 200790   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:09:04,132-Speed 3285.51 samples/sec   Loss 1.3582   LearningRate 0.0037   Epoch: 16   Global Step: 200800   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:09:07,206-Speed 3333.74 samples/sec   Loss 1.3442   LearningRate 0.0037   Epoch: 16   Global Step: 200810   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:09:10,258-Speed 3356.11 samples/sec   Loss 1.3343   LearningRate 0.0037   Epoch: 16   Global Step: 200820   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:09:13,327-Speed 3338.09 samples/sec   Loss 1.3602   LearningRate 0.0037   Epoch: 16   Global Step: 200830   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:09:16,396-Speed 3337.07 samples/sec   Loss 1.2914   LearningRate 0.0037   Epoch: 16   Global Step: 200840   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:09:19,518-Speed 3281.60 samples/sec   Loss 1.3454   LearningRate 0.0037   Epoch: 16   Global Step: 200850   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:09:22,594-Speed 3329.23 samples/sec   Loss 1.3194   LearningRate 0.0037   Epoch: 16   Global Step: 200860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:09:25,691-Speed 3308.21 samples/sec   Loss 1.3234   LearningRate 0.0037   Epoch: 16   Global Step: 200870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:09:28,859-Speed 3233.15 samples/sec   Loss 1.3274   LearningRate 0.0037   Epoch: 16   Global Step: 200880   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:09:31,978-Speed 3284.70 samples/sec   Loss 1.3110   LearningRate 0.0037   Epoch: 16   Global Step: 200890   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:09:35,090-Speed 3291.06 samples/sec   Loss 1.3513   LearningRate 0.0037   Epoch: 16   Global Step: 200900   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:09:38,187-Speed 3308.03 samples/sec   Loss 1.2972   LearningRate 0.0037   Epoch: 16   Global Step: 200910   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:09:41,333-Speed 3254.95 samples/sec   Loss 1.2959   LearningRate 0.0037   Epoch: 16   Global Step: 200920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:09:44,491-Speed 3244.23 samples/sec   Loss 1.3277   LearningRate 0.0037   Epoch: 16   Global Step: 200930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:09:47,572-Speed 3324.67 samples/sec   Loss 1.3326   LearningRate 0.0037   Epoch: 16   Global Step: 200940   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:09:50,684-Speed 3291.08 samples/sec   Loss 1.3456   LearningRate 0.0037   Epoch: 16   Global Step: 200950   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:09:53,788-Speed 3300.64 samples/sec   Loss 1.3066   LearningRate 0.0036   Epoch: 16   Global Step: 200960   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:09:56,866-Speed 3328.09 samples/sec   Loss 1.3712   LearningRate 0.0036   Epoch: 16   Global Step: 200970   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:09:59,956-Speed 3314.81 samples/sec   Loss 1.3155   LearningRate 0.0036   Epoch: 16   Global Step: 200980   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:10:03,041-Speed 3320.51 samples/sec   Loss 1.3047   LearningRate 0.0036   Epoch: 16   Global Step: 200990   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:10:06,156-Speed 3287.98 samples/sec   Loss 1.3369   LearningRate 0.0036   Epoch: 16   Global Step: 201000   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:10:09,210-Speed 3354.79 samples/sec   Loss 1.3072   LearningRate 0.0036   Epoch: 16   Global Step: 201010   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:10:12,342-Speed 3270.65 samples/sec   Loss 1.3243   LearningRate 0.0036   Epoch: 16   Global Step: 201020   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:10:15,440-Speed 3306.89 samples/sec   Loss 1.3710   LearningRate 0.0036   Epoch: 16   Global Step: 201030   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:10:18,509-Speed 3337.66 samples/sec   Loss 1.3147   LearningRate 0.0036   Epoch: 16   Global Step: 201040   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:10:21,563-Speed 3353.43 samples/sec   Loss 1.3435   LearningRate 0.0036   Epoch: 16   Global Step: 201050   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:10:24,652-Speed 3318.49 samples/sec   Loss 1.3431   LearningRate 0.0036   Epoch: 16   Global Step: 201060   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:10:27,754-Speed 3301.64 samples/sec   Loss 1.3735   LearningRate 0.0036   Epoch: 16   Global Step: 201070   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:10:30,807-Speed 3354.84 samples/sec   Loss 1.4119   LearningRate 0.0036   Epoch: 16   Global Step: 201080   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:10:33,868-Speed 3346.97 samples/sec   Loss 1.3518   LearningRate 0.0036   Epoch: 16   Global Step: 201090   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:10:36,981-Speed 3290.04 samples/sec   Loss 1.3352   LearningRate 0.0036   Epoch: 16   Global Step: 201100   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:10:40,090-Speed 3294.88 samples/sec   Loss 1.3764   LearningRate 0.0036   Epoch: 16   Global Step: 201110   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:10:43,160-Speed 3336.91 samples/sec   Loss 1.3282   LearningRate 0.0036   Epoch: 16   Global Step: 201120   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:10:46,215-Speed 3352.18 samples/sec   Loss 1.3246   LearningRate 0.0036   Epoch: 16   Global Step: 201130   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:10:49,330-Speed 3288.48 samples/sec   Loss 1.3722   LearningRate 0.0036   Epoch: 16   Global Step: 201140   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:10:52,515-Speed 3216.91 samples/sec   Loss 1.3836   LearningRate 0.0036   Epoch: 16   Global Step: 201150   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:10:55,616-Speed 3303.00 samples/sec   Loss 1.3356   LearningRate 0.0036   Epoch: 16   Global Step: 201160   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:10:58,689-Speed 3333.11 samples/sec   Loss 1.3369   LearningRate 0.0036   Epoch: 16   Global Step: 201170   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:11:01,753-Speed 3343.07 samples/sec   Loss 1.3357   LearningRate 0.0036   Epoch: 16   Global Step: 201180   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:11:04,898-Speed 3257.47 samples/sec   Loss 1.3814   LearningRate 0.0036   Epoch: 16   Global Step: 201190   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:11:07,957-Speed 3348.19 samples/sec   Loss 1.3262   LearningRate 0.0036   Epoch: 16   Global Step: 201200   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:11:11,032-Speed 3332.20 samples/sec   Loss 1.3408   LearningRate 0.0036   Epoch: 16   Global Step: 201210   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:11:14,183-Speed 3251.03 samples/sec   Loss 1.3839   LearningRate 0.0036   Epoch: 16   Global Step: 201220   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:11:17,330-Speed 3254.71 samples/sec   Loss 1.3181   LearningRate 0.0036   Epoch: 16   Global Step: 201230   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:11:20,475-Speed 3256.84 samples/sec   Loss 1.3676   LearningRate 0.0036   Epoch: 16   Global Step: 201240   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:11:23,587-Speed 3292.05 samples/sec   Loss 1.3347   LearningRate 0.0036   Epoch: 16   Global Step: 201250   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:11:26,682-Speed 3310.10 samples/sec   Loss 1.3154   LearningRate 0.0036   Epoch: 16   Global Step: 201260   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:11:29,792-Speed 3293.00 samples/sec   Loss 1.3557   LearningRate 0.0036   Epoch: 16   Global Step: 201270   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:11:32,850-Speed 3350.53 samples/sec   Loss 1.3793   LearningRate 0.0036   Epoch: 16   Global Step: 201280   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:11:35,993-Speed 3258.48 samples/sec   Loss 1.3637   LearningRate 0.0036   Epoch: 16   Global Step: 201290   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:11:39,184-Speed 3210.55 samples/sec   Loss 1.3019   LearningRate 0.0036   Epoch: 16   Global Step: 201300   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:11:42,238-Speed 3353.78 samples/sec   Loss 1.3570   LearningRate 0.0036   Epoch: 16   Global Step: 201310   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:11:45,310-Speed 3334.48 samples/sec   Loss 1.3386   LearningRate 0.0036   Epoch: 16   Global Step: 201320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:11:48,373-Speed 3343.73 samples/sec   Loss 1.3143   LearningRate 0.0036   Epoch: 16   Global Step: 201330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:11:51,479-Speed 3298.60 samples/sec   Loss 1.3656   LearningRate 0.0036   Epoch: 16   Global Step: 201340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:11:54,611-Speed 3270.56 samples/sec   Loss 1.3876   LearningRate 0.0036   Epoch: 16   Global Step: 201350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:11:57,695-Speed 3321.85 samples/sec   Loss 1.3602   LearningRate 0.0036   Epoch: 16   Global Step: 201360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:12:00,804-Speed 3294.03 samples/sec   Loss 1.3418   LearningRate 0.0036   Epoch: 16   Global Step: 201370   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:12:03,944-Speed 3262.72 samples/sec   Loss 1.3515   LearningRate 0.0036   Epoch: 16   Global Step: 201380   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:12:07,105-Speed 3239.57 samples/sec   Loss 1.3540   LearningRate 0.0036   Epoch: 16   Global Step: 201390   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:12:10,168-Speed 3345.20 samples/sec   Loss 1.3499   LearningRate 0.0036   Epoch: 16   Global Step: 201400   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:12:13,282-Speed 3289.44 samples/sec   Loss 1.3646   LearningRate 0.0036   Epoch: 16   Global Step: 201410   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:12:16,371-Speed 3315.70 samples/sec   Loss 1.3567   LearningRate 0.0036   Epoch: 16   Global Step: 201420   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:12:19,471-Speed 3303.83 samples/sec   Loss 1.2903   LearningRate 0.0036   Epoch: 16   Global Step: 201430   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:12:22,522-Speed 3357.98 samples/sec   Loss 1.3415   LearningRate 0.0036   Epoch: 16   Global Step: 201440   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:12:25,647-Speed 3277.76 samples/sec   Loss 1.3202   LearningRate 0.0036   Epoch: 16   Global Step: 201450   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:12:28,765-Speed 3285.72 samples/sec   Loss 1.3682   LearningRate 0.0036   Epoch: 16   Global Step: 201460   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:12:31,843-Speed 3327.54 samples/sec   Loss 1.3323   LearningRate 0.0036   Epoch: 16   Global Step: 201470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:12:34,900-Speed 3350.16 samples/sec   Loss 1.3093   LearningRate 0.0036   Epoch: 16   Global Step: 201480   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:12:38,030-Speed 3273.49 samples/sec   Loss 1.3512   LearningRate 0.0036   Epoch: 16   Global Step: 201490   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:12:41,167-Speed 3265.32 samples/sec   Loss 1.3193   LearningRate 0.0036   Epoch: 16   Global Step: 201500   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:12:44,296-Speed 3273.81 samples/sec   Loss 1.3678   LearningRate 0.0036   Epoch: 16   Global Step: 201510   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:12:47,386-Speed 3314.57 samples/sec   Loss 1.3475   LearningRate 0.0036   Epoch: 16   Global Step: 201520   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:12:50,496-Speed 3294.17 samples/sec   Loss 1.3718   LearningRate 0.0036   Epoch: 16   Global Step: 201530   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:12:53,573-Speed 3328.92 samples/sec   Loss 1.3396   LearningRate 0.0036   Epoch: 16   Global Step: 201540   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:12:56,647-Speed 3332.11 samples/sec   Loss 1.2886   LearningRate 0.0036   Epoch: 16   Global Step: 201550   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:12:59,789-Speed 3262.35 samples/sec   Loss 1.3711   LearningRate 0.0036   Epoch: 16   Global Step: 201560   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:13:02,928-Speed 3263.33 samples/sec   Loss 1.3430   LearningRate 0.0036   Epoch: 16   Global Step: 201570   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:13:06,014-Speed 3318.65 samples/sec   Loss 1.2763   LearningRate 0.0036   Epoch: 16   Global Step: 201580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:13:09,085-Speed 3335.62 samples/sec   Loss 1.3748   LearningRate 0.0036   Epoch: 16   Global Step: 201590   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:13:12,223-Speed 3264.51 samples/sec   Loss 1.3565   LearningRate 0.0036   Epoch: 16   Global Step: 201600   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:13:15,383-Speed 3240.82 samples/sec   Loss 1.3546   LearningRate 0.0036   Epoch: 16   Global Step: 201610   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:13:18,466-Speed 3322.56 samples/sec   Loss 1.3551   LearningRate 0.0035   Epoch: 16   Global Step: 201620   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:13:21,532-Speed 3341.03 samples/sec   Loss 1.3137   LearningRate 0.0035   Epoch: 16   Global Step: 201630   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:13:24,625-Speed 3311.89 samples/sec   Loss 1.3492   LearningRate 0.0035   Epoch: 16   Global Step: 201640   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:13:27,724-Speed 3305.72 samples/sec   Loss 1.3452   LearningRate 0.0035   Epoch: 16   Global Step: 201650   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:13:30,813-Speed 3316.10 samples/sec   Loss 1.3375   LearningRate 0.0035   Epoch: 16   Global Step: 201660   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:13:33,882-Speed 3337.62 samples/sec   Loss 1.3740   LearningRate 0.0035   Epoch: 16   Global Step: 201670   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:13:37,021-Speed 3263.24 samples/sec   Loss 1.3238   LearningRate 0.0035   Epoch: 16   Global Step: 201680   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:13:40,142-Speed 3282.56 samples/sec   Loss 1.3241   LearningRate 0.0035   Epoch: 16   Global Step: 201690   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:13:43,243-Speed 3302.35 samples/sec   Loss 1.3527   LearningRate 0.0035   Epoch: 16   Global Step: 201700   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:13:46,325-Speed 3323.58 samples/sec   Loss 1.3416   LearningRate 0.0035   Epoch: 16   Global Step: 201710   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:13:49,456-Speed 3272.52 samples/sec   Loss 1.3722   LearningRate 0.0035   Epoch: 16   Global Step: 201720   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:13:52,609-Speed 3247.96 samples/sec   Loss 1.3338   LearningRate 0.0035   Epoch: 16   Global Step: 201730   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:13:55,720-Speed 3293.29 samples/sec   Loss 1.2912   LearningRate 0.0035   Epoch: 16   Global Step: 201740   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:13:58,820-Speed 3303.43 samples/sec   Loss 1.3459   LearningRate 0.0035   Epoch: 16   Global Step: 201750   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:14:01,876-Speed 3351.81 samples/sec   Loss 1.3222   LearningRate 0.0035   Epoch: 16   Global Step: 201760   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:14:05,004-Speed 3275.44 samples/sec   Loss 1.3485   LearningRate 0.0035   Epoch: 16   Global Step: 201770   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:14:08,135-Speed 3271.58 samples/sec   Loss 1.3527   LearningRate 0.0035   Epoch: 16   Global Step: 201780   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:14:11,232-Speed 3307.16 samples/sec   Loss 1.3116   LearningRate 0.0035   Epoch: 16   Global Step: 201790   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:14:14,329-Speed 3307.07 samples/sec   Loss 1.3930   LearningRate 0.0035   Epoch: 16   Global Step: 201800   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:14:17,440-Speed 3293.20 samples/sec   Loss 1.2870   LearningRate 0.0035   Epoch: 16   Global Step: 201810   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:14:20,485-Speed 3363.82 samples/sec   Loss 1.3159   LearningRate 0.0035   Epoch: 16   Global Step: 201820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:14:23,622-Speed 3265.19 samples/sec   Loss 1.3966   LearningRate 0.0035   Epoch: 16   Global Step: 201830   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:14:26,763-Speed 3261.29 samples/sec   Loss 1.3851   LearningRate 0.0035   Epoch: 16   Global Step: 201840   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:14:29,908-Speed 3256.95 samples/sec   Loss 1.3158   LearningRate 0.0035   Epoch: 16   Global Step: 201850   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:14:32,982-Speed 3332.12 samples/sec   Loss 1.3583   LearningRate 0.0035   Epoch: 16   Global Step: 201860   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:14:36,061-Speed 3326.39 samples/sec   Loss 1.3252   LearningRate 0.0035   Epoch: 16   Global Step: 201870   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:14:39,115-Speed 3354.70 samples/sec   Loss 1.3485   LearningRate 0.0035   Epoch: 16   Global Step: 201880   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:14:42,184-Speed 3337.61 samples/sec   Loss 1.3165   LearningRate 0.0035   Epoch: 16   Global Step: 201890   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:14:45,238-Speed 3354.55 samples/sec   Loss 1.3291   LearningRate 0.0035   Epoch: 16   Global Step: 201900   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:14:48,324-Speed 3318.16 samples/sec   Loss 1.3394   LearningRate 0.0035   Epoch: 16   Global Step: 201910   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:14:51,393-Speed 3340.61 samples/sec   Loss 1.3784   LearningRate 0.0035   Epoch: 16   Global Step: 201920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:14:54,448-Speed 3353.36 samples/sec   Loss 1.3469   LearningRate 0.0035   Epoch: 16   Global Step: 201930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:14:57,505-Speed 3351.21 samples/sec   Loss 1.3503   LearningRate 0.0035   Epoch: 16   Global Step: 201940   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:15:00,561-Speed 3351.65 samples/sec   Loss 1.3423   LearningRate 0.0035   Epoch: 16   Global Step: 201950   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:15:03,697-Speed 3265.46 samples/sec   Loss 1.3785   LearningRate 0.0035   Epoch: 16   Global Step: 201960   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:15:06,810-Speed 3291.30 samples/sec   Loss 1.3456   LearningRate 0.0035   Epoch: 16   Global Step: 201970   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:15:09,937-Speed 3275.80 samples/sec   Loss 1.3554   LearningRate 0.0035   Epoch: 16   Global Step: 201980   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:15:13,180-Speed 3158.16 samples/sec   Loss 1.3260   LearningRate 0.0035   Epoch: 16   Global Step: 201990   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:15:16,339-Speed 3242.30 samples/sec   Loss 1.3295   LearningRate 0.0035   Epoch: 16   Global Step: 202000   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:15:19,469-Speed 3272.52 samples/sec   Loss 1.3908   LearningRate 0.0035   Epoch: 16   Global Step: 202010   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:15:22,548-Speed 3327.04 samples/sec   Loss 1.3829   LearningRate 0.0035   Epoch: 16   Global Step: 202020   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:15:25,715-Speed 3234.58 samples/sec   Loss 1.3184   LearningRate 0.0035   Epoch: 16   Global Step: 202030   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:15:28,882-Speed 3234.03 samples/sec   Loss 1.3559   LearningRate 0.0035   Epoch: 16   Global Step: 202040   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:15:31,956-Speed 3332.32 samples/sec   Loss 1.3324   LearningRate 0.0035   Epoch: 16   Global Step: 202050   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:15:35,085-Speed 3273.59 samples/sec   Loss 1.3718   LearningRate 0.0035   Epoch: 16   Global Step: 202060   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:15:38,280-Speed 3206.40 samples/sec   Loss 1.4057   LearningRate 0.0035   Epoch: 16   Global Step: 202070   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:15:41,367-Speed 3318.21 samples/sec   Loss 1.3898   LearningRate 0.0035   Epoch: 16   Global Step: 202080   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:15:44,466-Speed 3305.40 samples/sec   Loss 1.3625   LearningRate 0.0035   Epoch: 16   Global Step: 202090   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:15:47,595-Speed 3273.55 samples/sec   Loss 1.3632   LearningRate 0.0035   Epoch: 16   Global Step: 202100   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:15:50,649-Speed 3353.59 samples/sec   Loss 1.3963   LearningRate 0.0035   Epoch: 16   Global Step: 202110   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:15:53,709-Speed 3348.34 samples/sec   Loss 1.3678   LearningRate 0.0035   Epoch: 16   Global Step: 202120   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:15:56,755-Speed 3362.58 samples/sec   Loss 1.3956   LearningRate 0.0035   Epoch: 16   Global Step: 202130   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:15:59,875-Speed 3282.55 samples/sec   Loss 1.3438   LearningRate 0.0035   Epoch: 16   Global Step: 202140   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:16:02,938-Speed 3344.52 samples/sec   Loss 1.3248   LearningRate 0.0035   Epoch: 16   Global Step: 202150   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:16:05,996-Speed 3350.10 samples/sec   Loss 1.3308   LearningRate 0.0035   Epoch: 16   Global Step: 202160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:16:09,046-Speed 3358.41 samples/sec   Loss 1.3630   LearningRate 0.0035   Epoch: 16   Global Step: 202170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:16:12,136-Speed 3314.39 samples/sec   Loss 1.3556   LearningRate 0.0035   Epoch: 16   Global Step: 202180   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:16:15,222-Speed 3319.69 samples/sec   Loss 1.3543   LearningRate 0.0035   Epoch: 16   Global Step: 202190   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:16:18,285-Speed 3344.24 samples/sec   Loss 1.4146   LearningRate 0.0035   Epoch: 16   Global Step: 202200   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:16:21,342-Speed 3350.30 samples/sec   Loss 1.3685   LearningRate 0.0035   Epoch: 16   Global Step: 202210   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:16:24,407-Speed 3341.91 samples/sec   Loss 1.3513   LearningRate 0.0035   Epoch: 16   Global Step: 202220   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:16:27,560-Speed 3249.46 samples/sec   Loss 1.3119   LearningRate 0.0035   Epoch: 16   Global Step: 202230   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:16:30,690-Speed 3272.58 samples/sec   Loss 1.3608   LearningRate 0.0035   Epoch: 16   Global Step: 202240   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:16:33,780-Speed 3314.14 samples/sec   Loss 1.3520   LearningRate 0.0035   Epoch: 16   Global Step: 202250   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:16:36,873-Speed 3312.65 samples/sec   Loss 1.3684   LearningRate 0.0035   Epoch: 16   Global Step: 202260   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:16:39,957-Speed 3320.29 samples/sec   Loss 1.3322   LearningRate 0.0035   Epoch: 16   Global Step: 202270   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:16:43,043-Speed 3319.61 samples/sec   Loss 1.3748   LearningRate 0.0034   Epoch: 16   Global Step: 202280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:16:46,115-Speed 3334.84 samples/sec   Loss 1.3589   LearningRate 0.0034   Epoch: 16   Global Step: 202290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:16:49,189-Speed 3332.65 samples/sec   Loss 1.3503   LearningRate 0.0034   Epoch: 16   Global Step: 202300   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:16:52,253-Speed 3342.54 samples/sec   Loss 1.3397   LearningRate 0.0034   Epoch: 16   Global Step: 202310   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:16:55,363-Speed 3293.35 samples/sec   Loss 1.3759   LearningRate 0.0034   Epoch: 16   Global Step: 202320   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:16:58,418-Speed 3353.44 samples/sec   Loss 1.3162   LearningRate 0.0034   Epoch: 16   Global Step: 202330   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:17:01,533-Speed 3288.03 samples/sec   Loss 1.3640   LearningRate 0.0034   Epoch: 16   Global Step: 202340   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:17:04,651-Speed 3285.01 samples/sec   Loss 1.3626   LearningRate 0.0034   Epoch: 16   Global Step: 202350   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:17:07,801-Speed 3251.81 samples/sec   Loss 1.3347   LearningRate 0.0034   Epoch: 16   Global Step: 202360   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:17:10,892-Speed 3314.36 samples/sec   Loss 1.3434   LearningRate 0.0034   Epoch: 16   Global Step: 202370   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:17:14,065-Speed 3227.86 samples/sec   Loss 1.3233   LearningRate 0.0034   Epoch: 16   Global Step: 202380   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:17:17,190-Speed 3277.66 samples/sec   Loss 1.3734   LearningRate 0.0034   Epoch: 16   Global Step: 202390   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:17:20,261-Speed 3335.61 samples/sec   Loss 1.3083   LearningRate 0.0034   Epoch: 16   Global Step: 202400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:17:23,313-Speed 3356.04 samples/sec   Loss 1.4089   LearningRate 0.0034   Epoch: 16   Global Step: 202410   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:17:26,421-Speed 3296.04 samples/sec   Loss 1.3911   LearningRate 0.0034   Epoch: 16   Global Step: 202420   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:17:29,563-Speed 3260.02 samples/sec   Loss 1.3581   LearningRate 0.0034   Epoch: 16   Global Step: 202430   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:17:32,629-Speed 3341.86 samples/sec   Loss 1.3222   LearningRate 0.0034   Epoch: 16   Global Step: 202440   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:17:35,684-Speed 3352.71 samples/sec   Loss 1.3613   LearningRate 0.0034   Epoch: 16   Global Step: 202450   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:17:38,793-Speed 3294.37 samples/sec   Loss 1.3337   LearningRate 0.0034   Epoch: 16   Global Step: 202460   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:17:41,843-Speed 3358.68 samples/sec   Loss 1.3770   LearningRate 0.0034   Epoch: 16   Global Step: 202470   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:17:44,970-Speed 3275.88 samples/sec   Loss 1.3297   LearningRate 0.0034   Epoch: 16   Global Step: 202480   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:17:48,032-Speed 3345.71 samples/sec   Loss 1.3851   LearningRate 0.0034   Epoch: 16   Global Step: 202490   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:17:51,155-Speed 3279.47 samples/sec   Loss 1.3711   LearningRate 0.0034   Epoch: 16   Global Step: 202500   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:17:54,231-Speed 3329.94 samples/sec   Loss 1.3721   LearningRate 0.0034   Epoch: 16   Global Step: 202510   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:17:57,290-Speed 3348.55 samples/sec   Loss 1.3636   LearningRate 0.0034   Epoch: 16   Global Step: 202520   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:18:00,343-Speed 3355.55 samples/sec   Loss 1.3157   LearningRate 0.0034   Epoch: 16   Global Step: 202530   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:18:03,432-Speed 3315.63 samples/sec   Loss 1.3643   LearningRate 0.0034   Epoch: 16   Global Step: 202540   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:18:06,507-Speed 3331.48 samples/sec   Loss 1.3558   LearningRate 0.0034   Epoch: 16   Global Step: 202550   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:18:09,556-Speed 3359.61 samples/sec   Loss 1.3557   LearningRate 0.0034   Epoch: 16   Global Step: 202560   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:18:12,679-Speed 3280.14 samples/sec   Loss 1.3384   LearningRate 0.0034   Epoch: 16   Global Step: 202570   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:18:15,726-Speed 3361.20 samples/sec   Loss 1.3438   LearningRate 0.0034   Epoch: 16   Global Step: 202580   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:18:18,778-Speed 3355.89 samples/sec   Loss 1.4079   LearningRate 0.0034   Epoch: 16   Global Step: 202590   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:18:21,845-Speed 3339.97 samples/sec   Loss 1.3308   LearningRate 0.0034   Epoch: 16   Global Step: 202600   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:18:24,923-Speed 3328.00 samples/sec   Loss 1.3809   LearningRate 0.0034   Epoch: 16   Global Step: 202610   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:18:28,006-Speed 3323.23 samples/sec   Loss 1.3558   LearningRate 0.0034   Epoch: 16   Global Step: 202620   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:18:31,125-Speed 3283.71 samples/sec   Loss 1.3308   LearningRate 0.0034   Epoch: 16   Global Step: 202630   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:18:34,217-Speed 3313.00 samples/sec   Loss 1.3894   LearningRate 0.0034   Epoch: 16   Global Step: 202640   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:18:37,307-Speed 3314.65 samples/sec   Loss 1.3404   LearningRate 0.0034   Epoch: 16   Global Step: 202650   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:18:40,407-Speed 3303.87 samples/sec   Loss 1.3530   LearningRate 0.0034   Epoch: 16   Global Step: 202660   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:18:43,524-Speed 3286.54 samples/sec   Loss 1.3907   LearningRate 0.0034   Epoch: 16   Global Step: 202670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:18:46,598-Speed 3332.70 samples/sec   Loss 1.3047   LearningRate 0.0034   Epoch: 16   Global Step: 202680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:18:49,704-Speed 3298.07 samples/sec   Loss 1.3437   LearningRate 0.0034   Epoch: 16   Global Step: 202690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:18:52,816-Speed 3290.90 samples/sec   Loss 1.3913   LearningRate 0.0034   Epoch: 16   Global Step: 202700   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:18:55,911-Speed 3310.26 samples/sec   Loss 1.3214   LearningRate 0.0034   Epoch: 16   Global Step: 202710   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:18:59,034-Speed 3279.95 samples/sec   Loss 1.3233   LearningRate 0.0034   Epoch: 16   Global Step: 202720   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:19:02,153-Speed 3283.61 samples/sec   Loss 1.3752   LearningRate 0.0034   Epoch: 16   Global Step: 202730   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:19:05,265-Speed 3291.31 samples/sec   Loss 1.3661   LearningRate 0.0034   Epoch: 16   Global Step: 202740   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:19:08,322-Speed 3351.32 samples/sec   Loss 1.3878   LearningRate 0.0034   Epoch: 16   Global Step: 202750   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:19:11,416-Speed 3310.47 samples/sec   Loss 1.3857   LearningRate 0.0034   Epoch: 16   Global Step: 202760   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:19:14,521-Speed 3299.82 samples/sec   Loss 1.3205   LearningRate 0.0034   Epoch: 16   Global Step: 202770   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:19:17,625-Speed 3300.14 samples/sec   Loss 1.3652   LearningRate 0.0034   Epoch: 16   Global Step: 202780   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:19:20,689-Speed 3342.77 samples/sec   Loss 1.3870   LearningRate 0.0034   Epoch: 16   Global Step: 202790   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:19:23,809-Speed 3283.05 samples/sec   Loss 1.3421   LearningRate 0.0034   Epoch: 16   Global Step: 202800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:19:26,905-Speed 3308.54 samples/sec   Loss 1.3422   LearningRate 0.0034   Epoch: 16   Global Step: 202810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:19:30,045-Speed 3261.58 samples/sec   Loss 1.3697   LearningRate 0.0034   Epoch: 16   Global Step: 202820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:19:33,098-Speed 3355.80 samples/sec   Loss 1.3244   LearningRate 0.0034   Epoch: 16   Global Step: 202830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:19:36,157-Speed 3348.44 samples/sec   Loss 1.3394   LearningRate 0.0034   Epoch: 16   Global Step: 202840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:19:39,220-Speed 3343.92 samples/sec   Loss 1.3534   LearningRate 0.0034   Epoch: 16   Global Step: 202850   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:19:42,390-Speed 3231.63 samples/sec   Loss 1.3501   LearningRate 0.0034   Epoch: 16   Global Step: 202860   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:19:45,441-Speed 3357.34 samples/sec   Loss 1.3524   LearningRate 0.0034   Epoch: 16   Global Step: 202870   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:19:48,543-Speed 3302.25 samples/sec   Loss 1.3525   LearningRate 0.0034   Epoch: 16   Global Step: 202880   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:19:51,630-Speed 3318.09 samples/sec   Loss 1.3741   LearningRate 0.0034   Epoch: 16   Global Step: 202890   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:19:54,767-Speed 3266.05 samples/sec   Loss 1.3459   LearningRate 0.0034   Epoch: 16   Global Step: 202900   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:19:57,847-Speed 3325.60 samples/sec   Loss 1.3201   LearningRate 0.0034   Epoch: 16   Global Step: 202910   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:20:00,959-Speed 3291.87 samples/sec   Loss 1.3798   LearningRate 0.0034   Epoch: 16   Global Step: 202920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:20:04,069-Speed 3292.89 samples/sec   Loss 1.3584   LearningRate 0.0034   Epoch: 16   Global Step: 202930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:20:07,191-Speed 3281.69 samples/sec   Loss 1.3647   LearningRate 0.0034   Epoch: 16   Global Step: 202940   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:20:10,292-Speed 3302.59 samples/sec   Loss 1.3716   LearningRate 0.0034   Epoch: 16   Global Step: 202950   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:20:13,425-Speed 3270.06 samples/sec   Loss 1.3036   LearningRate 0.0033   Epoch: 16   Global Step: 202960   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:20:16,629-Speed 3197.19 samples/sec   Loss 1.3227   LearningRate 0.0033   Epoch: 16   Global Step: 202970   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:20:19,714-Speed 3319.51 samples/sec   Loss 1.3397   LearningRate 0.0033   Epoch: 16   Global Step: 202980   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:20:22,810-Speed 3308.46 samples/sec   Loss 1.3049   LearningRate 0.0033   Epoch: 16   Global Step: 202990   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:20:25,907-Speed 3308.49 samples/sec   Loss 1.3865   LearningRate 0.0033   Epoch: 16   Global Step: 203000   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:20:29,099-Speed 3208.20 samples/sec   Loss 1.4458   LearningRate 0.0033   Epoch: 16   Global Step: 203010   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:20:32,241-Speed 3260.73 samples/sec   Loss 1.4146   LearningRate 0.0033   Epoch: 16   Global Step: 203020   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:20:35,337-Speed 3308.54 samples/sec   Loss 1.3939   LearningRate 0.0033   Epoch: 16   Global Step: 203030   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:20:38,397-Speed 3347.49 samples/sec   Loss 1.3581   LearningRate 0.0033   Epoch: 16   Global Step: 203040   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:20:41,517-Speed 3282.32 samples/sec   Loss 1.4055   LearningRate 0.0033   Epoch: 16   Global Step: 203050   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:20:44,608-Speed 3314.76 samples/sec   Loss 1.4011   LearningRate 0.0033   Epoch: 16   Global Step: 203060   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:20:47,708-Speed 3303.51 samples/sec   Loss 1.3445   LearningRate 0.0033   Epoch: 16   Global Step: 203070   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:20:50,840-Speed 3271.14 samples/sec   Loss 1.3657   LearningRate 0.0033   Epoch: 16   Global Step: 203080   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:20:53,913-Speed 3334.92 samples/sec   Loss 1.4430   LearningRate 0.0033   Epoch: 16   Global Step: 203090   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:20:56,960-Speed 3361.49 samples/sec   Loss 1.4319   LearningRate 0.0033   Epoch: 16   Global Step: 203100   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:21:00,120-Speed 3240.80 samples/sec   Loss 1.3846   LearningRate 0.0033   Epoch: 16   Global Step: 203110   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:21:03,220-Speed 3304.76 samples/sec   Loss 1.3372   LearningRate 0.0033   Epoch: 16   Global Step: 203120   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:21:06,348-Speed 3274.39 samples/sec   Loss 1.3429   LearningRate 0.0033   Epoch: 16   Global Step: 203130   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:21:09,408-Speed 3347.34 samples/sec   Loss 1.3248   LearningRate 0.0033   Epoch: 16   Global Step: 203140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:21:12,506-Speed 3307.48 samples/sec   Loss 1.3452   LearningRate 0.0033   Epoch: 16   Global Step: 203150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:21:15,574-Speed 3337.98 samples/sec   Loss 1.3786   LearningRate 0.0033   Epoch: 16   Global Step: 203160   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:21:18,710-Speed 3267.16 samples/sec   Loss 1.3348   LearningRate 0.0033   Epoch: 16   Global Step: 203170   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:21:21,764-Speed 3352.97 samples/sec   Loss 1.3416   LearningRate 0.0033   Epoch: 16   Global Step: 203180   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:21:24,876-Speed 3291.71 samples/sec   Loss 1.3586   LearningRate 0.0033   Epoch: 16   Global Step: 203190   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:21:27,988-Speed 3292.29 samples/sec   Loss 1.4014   LearningRate 0.0033   Epoch: 16   Global Step: 203200   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:21:31,121-Speed 3269.23 samples/sec   Loss 1.3633   LearningRate 0.0033   Epoch: 16   Global Step: 203210   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:21:34,201-Speed 3325.24 samples/sec   Loss 1.3879   LearningRate 0.0033   Epoch: 16   Global Step: 203220   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:21:37,285-Speed 3322.18 samples/sec   Loss 1.3828   LearningRate 0.0033   Epoch: 16   Global Step: 203230   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:21:40,382-Speed 3306.97 samples/sec   Loss 1.3275   LearningRate 0.0033   Epoch: 16   Global Step: 203240   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:21:43,473-Speed 3313.72 samples/sec   Loss 1.3962   LearningRate 0.0033   Epoch: 16   Global Step: 203250   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:21:46,588-Speed 3289.07 samples/sec   Loss 1.3695   LearningRate 0.0033   Epoch: 16   Global Step: 203260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:21:49,691-Speed 3300.70 samples/sec   Loss 1.4090   LearningRate 0.0033   Epoch: 16   Global Step: 203270   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:21:52,864-Speed 3228.52 samples/sec   Loss 1.3511   LearningRate 0.0033   Epoch: 16   Global Step: 203280   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:21:55,985-Speed 3282.27 samples/sec   Loss 1.3785   LearningRate 0.0033   Epoch: 16   Global Step: 203290   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:21:59,077-Speed 3311.64 samples/sec   Loss 1.3190   LearningRate 0.0033   Epoch: 16   Global Step: 203300   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:22:02,216-Speed 3263.42 samples/sec   Loss 1.4191   LearningRate 0.0033   Epoch: 16   Global Step: 203310   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:22:05,404-Speed 3212.91 samples/sec   Loss 1.3854   LearningRate 0.0033   Epoch: 16   Global Step: 203320   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:22:08,492-Speed 3317.41 samples/sec   Loss 1.3145   LearningRate 0.0033   Epoch: 16   Global Step: 203330   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:22:11,647-Speed 3246.27 samples/sec   Loss 1.3808   LearningRate 0.0033   Epoch: 16   Global Step: 203340   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:22:14,859-Speed 3189.77 samples/sec   Loss 1.3803   LearningRate 0.0033   Epoch: 16   Global Step: 203350   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:22:17,967-Speed 3295.68 samples/sec   Loss 1.4206   LearningRate 0.0033   Epoch: 16   Global Step: 203360   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:22:21,059-Speed 3312.34 samples/sec   Loss 1.3648   LearningRate 0.0033   Epoch: 16   Global Step: 203370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:22:24,143-Speed 3322.15 samples/sec   Loss 1.3652   LearningRate 0.0033   Epoch: 16   Global Step: 203380   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:22:27,266-Speed 3279.85 samples/sec   Loss 1.3177   LearningRate 0.0033   Epoch: 16   Global Step: 203390   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:22:30,424-Speed 3243.07 samples/sec   Loss 1.3921   LearningRate 0.0033   Epoch: 16   Global Step: 203400   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:22:33,542-Speed 3285.42 samples/sec   Loss 1.3834   LearningRate 0.0033   Epoch: 16   Global Step: 203410   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:22:36,688-Speed 3256.15 samples/sec   Loss 1.4269   LearningRate 0.0033   Epoch: 16   Global Step: 203420   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:22:39,898-Speed 3190.33 samples/sec   Loss 1.3798   LearningRate 0.0033   Epoch: 16   Global Step: 203430   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:22:43,071-Speed 3228.77 samples/sec   Loss 1.4039   LearningRate 0.0033   Epoch: 16   Global Step: 203440   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:22:46,233-Speed 3238.94 samples/sec   Loss 1.3564   LearningRate 0.0033   Epoch: 16   Global Step: 203450   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:22:49,290-Speed 3351.21 samples/sec   Loss 1.3544   LearningRate 0.0033   Epoch: 16   Global Step: 203460   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:22:52,362-Speed 3333.72 samples/sec   Loss 1.3726   LearningRate 0.0033   Epoch: 16   Global Step: 203470   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:22:55,487-Speed 3278.12 samples/sec   Loss 1.4188   LearningRate 0.0033   Epoch: 16   Global Step: 203480   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:22:58,540-Speed 3355.25 samples/sec   Loss 1.3470   LearningRate 0.0033   Epoch: 16   Global Step: 203490   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:23:01,611-Speed 3335.33 samples/sec   Loss 1.3985   LearningRate 0.0033   Epoch: 16   Global Step: 203500   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:23:04,744-Speed 3270.00 samples/sec   Loss 1.3912   LearningRate 0.0033   Epoch: 16   Global Step: 203510   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:23:07,850-Speed 3297.27 samples/sec   Loss 1.3822   LearningRate 0.0033   Epoch: 16   Global Step: 203520   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:23:10,924-Speed 3332.96 samples/sec   Loss 1.3257   LearningRate 0.0033   Epoch: 16   Global Step: 203530   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:23:14,055-Speed 3271.07 samples/sec   Loss 1.3611   LearningRate 0.0033   Epoch: 16   Global Step: 203540   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:23:17,193-Speed 3264.39 samples/sec   Loss 1.3030   LearningRate 0.0033   Epoch: 16   Global Step: 203550   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:23:20,257-Speed 3343.53 samples/sec   Loss 1.3549   LearningRate 0.0033   Epoch: 16   Global Step: 203560   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:23:23,409-Speed 3249.61 samples/sec   Loss 1.3741   LearningRate 0.0033   Epoch: 16   Global Step: 203570   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:23:26,598-Speed 3212.08 samples/sec   Loss 1.3764   LearningRate 0.0033   Epoch: 16   Global Step: 203580   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:23:29,766-Speed 3233.31 samples/sec   Loss 1.3721   LearningRate 0.0033   Epoch: 16   Global Step: 203590   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:23:32,842-Speed 3329.62 samples/sec   Loss 1.3913   LearningRate 0.0033   Epoch: 16   Global Step: 203600   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:23:35,979-Speed 3266.08 samples/sec   Loss 1.3265   LearningRate 0.0033   Epoch: 16   Global Step: 203610   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:23:39,083-Speed 3298.99 samples/sec   Loss 1.4528   LearningRate 0.0033   Epoch: 16   Global Step: 203620   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:23:42,213-Speed 3273.67 samples/sec   Loss 1.3361   LearningRate 0.0033   Epoch: 16   Global Step: 203630   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:23:45,293-Speed 3326.01 samples/sec   Loss 1.3334   LearningRate 0.0032   Epoch: 16   Global Step: 203640   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:23:48,439-Speed 3255.20 samples/sec   Loss 1.3764   LearningRate 0.0032   Epoch: 16   Global Step: 203650   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:23:51,558-Speed 3283.94 samples/sec   Loss 1.4145   LearningRate 0.0032   Epoch: 16   Global Step: 203660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:23:54,700-Speed 3260.21 samples/sec   Loss 1.4251   LearningRate 0.0032   Epoch: 16   Global Step: 203670   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:23:57,806-Speed 3298.22 samples/sec   Loss 1.3296   LearningRate 0.0032   Epoch: 16   Global Step: 203680   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:24:00,918-Speed 3291.30 samples/sec   Loss 1.3624   LearningRate 0.0032   Epoch: 16   Global Step: 203690   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:24:04,060-Speed 3260.12 samples/sec   Loss 1.3654   LearningRate 0.0032   Epoch: 16   Global Step: 203700   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:24:07,237-Speed 3223.55 samples/sec   Loss 1.3338   LearningRate 0.0032   Epoch: 16   Global Step: 203710   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:24:10,330-Speed 3312.26 samples/sec   Loss 1.3730   LearningRate 0.0032   Epoch: 16   Global Step: 203720   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:24:13,437-Speed 3296.70 samples/sec   Loss 1.3594   LearningRate 0.0032   Epoch: 16   Global Step: 203730   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:24:16,634-Speed 3204.40 samples/sec   Loss 1.3518   LearningRate 0.0032   Epoch: 16   Global Step: 203740   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:24:19,830-Speed 3204.84 samples/sec   Loss 1.3852   LearningRate 0.0032   Epoch: 16   Global Step: 203750   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:24:22,910-Speed 3324.83 samples/sec   Loss 1.3624   LearningRate 0.0032   Epoch: 16   Global Step: 203760   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:24:26,074-Speed 3237.44 samples/sec   Loss 1.3460   LearningRate 0.0032   Epoch: 16   Global Step: 203770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:24:29,251-Speed 3224.65 samples/sec   Loss 1.3446   LearningRate 0.0032   Epoch: 16   Global Step: 203780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:24:32,325-Speed 3332.46 samples/sec   Loss 1.3381   LearningRate 0.0032   Epoch: 16   Global Step: 203790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:24:35,509-Speed 3218.27 samples/sec   Loss 1.3826   LearningRate 0.0032   Epoch: 16   Global Step: 203800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:24:38,635-Speed 3276.91 samples/sec   Loss 1.3590   LearningRate 0.0032   Epoch: 16   Global Step: 203810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:24:41,716-Speed 3324.10 samples/sec   Loss 1.3445   LearningRate 0.0032   Epoch: 16   Global Step: 203820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:24:44,834-Speed 3285.57 samples/sec   Loss 1.3933   LearningRate 0.0032   Epoch: 16   Global Step: 203830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:24:47,952-Speed 3285.52 samples/sec   Loss 1.4171   LearningRate 0.0032   Epoch: 16   Global Step: 203840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:24:51,025-Speed 3333.36 samples/sec   Loss 1.3509   LearningRate 0.0032   Epoch: 16   Global Step: 203850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:24:54,132-Speed 3296.07 samples/sec   Loss 1.3945   LearningRate 0.0032   Epoch: 16   Global Step: 203860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:24:57,205-Speed 3333.13 samples/sec   Loss 1.3831   LearningRate 0.0032   Epoch: 16   Global Step: 203870   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 19:25:00,281-Speed 3330.11 samples/sec   Loss 1.3912   LearningRate 0.0032   Epoch: 16   Global Step: 203880   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:25:03,386-Speed 3299.58 samples/sec   Loss 1.3678   LearningRate 0.0032   Epoch: 16   Global Step: 203890   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:25:06,541-Speed 3246.07 samples/sec   Loss 1.3263   LearningRate 0.0032   Epoch: 16   Global Step: 203900   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:25:09,647-Speed 3298.45 samples/sec   Loss 1.3271   LearningRate 0.0032   Epoch: 16   Global Step: 203910   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:25:12,743-Speed 3308.01 samples/sec   Loss 1.3939   LearningRate 0.0032   Epoch: 16   Global Step: 203920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:25:15,953-Speed 3191.22 samples/sec   Loss 1.4053   LearningRate 0.0032   Epoch: 16   Global Step: 203930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:25:19,036-Speed 3322.77 samples/sec   Loss 1.3244   LearningRate 0.0032   Epoch: 16   Global Step: 203940   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:25:22,096-Speed 3347.93 samples/sec   Loss 1.4159   LearningRate 0.0032   Epoch: 16   Global Step: 203950   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:25:25,229-Speed 3269.75 samples/sec   Loss 1.3294   LearningRate 0.0032   Epoch: 16   Global Step: 203960   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:25:28,339-Speed 3293.50 samples/sec   Loss 1.3835   LearningRate 0.0032   Epoch: 16   Global Step: 203970   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:25:31,407-Speed 3338.59 samples/sec   Loss 1.3512   LearningRate 0.0032   Epoch: 16   Global Step: 203980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:25:34,482-Speed 3330.29 samples/sec   Loss 1.3646   LearningRate 0.0032   Epoch: 16   Global Step: 203990   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:25:37,624-Speed 3260.26 samples/sec   Loss 1.3643   LearningRate 0.0032   Epoch: 16   Global Step: 204000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:25:40,731-Speed 3296.83 samples/sec   Loss 1.3634   LearningRate 0.0032   Epoch: 16   Global Step: 204010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:25:43,867-Speed 3266.71 samples/sec   Loss 1.3755   LearningRate 0.0032   Epoch: 16   Global Step: 204020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:25:46,941-Speed 3332.80 samples/sec   Loss 1.4204   LearningRate 0.0032   Epoch: 16   Global Step: 204030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:25:50,013-Speed 3334.26 samples/sec   Loss 1.3414   LearningRate 0.0032   Epoch: 16   Global Step: 204040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:25:53,198-Speed 3215.62 samples/sec   Loss 1.3773   LearningRate 0.0032   Epoch: 16   Global Step: 204050   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:25:56,310-Speed 3291.42 samples/sec   Loss 1.3165   LearningRate 0.0032   Epoch: 16   Global Step: 204060   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:25:59,383-Speed 3333.34 samples/sec   Loss 1.3740   LearningRate 0.0032   Epoch: 16   Global Step: 204070   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:26:02,462-Speed 3326.42 samples/sec   Loss 1.3544   LearningRate 0.0032   Epoch: 16   Global Step: 204080   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:26:05,632-Speed 3231.91 samples/sec   Loss 1.3680   LearningRate 0.0032   Epoch: 16   Global Step: 204090   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:26:08,719-Speed 3318.58 samples/sec   Loss 1.3242   LearningRate 0.0032   Epoch: 16   Global Step: 204100   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:26:11,837-Speed 3284.43 samples/sec   Loss 1.3615   LearningRate 0.0032   Epoch: 16   Global Step: 204110   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:26:15,027-Speed 3210.54 samples/sec   Loss 1.3676   LearningRate 0.0032   Epoch: 16   Global Step: 204120   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:26:18,270-Speed 3158.52 samples/sec   Loss 1.3554   LearningRate 0.0032   Epoch: 16   Global Step: 204130   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:26:21,317-Speed 3362.13 samples/sec   Loss 1.3728   LearningRate 0.0032   Epoch: 16   Global Step: 204140   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:26:24,407-Speed 3314.75 samples/sec   Loss 1.3252   LearningRate 0.0032   Epoch: 16   Global Step: 204150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:26:27,535-Speed 3275.52 samples/sec   Loss 1.3942   LearningRate 0.0032   Epoch: 16   Global Step: 204160   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:26:30,653-Speed 3284.99 samples/sec   Loss 1.3847   LearningRate 0.0032   Epoch: 16   Global Step: 204170   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:26:33,717-Speed 3342.85 samples/sec   Loss 1.3955   LearningRate 0.0032   Epoch: 16   Global Step: 204180   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:26:36,831-Speed 3289.65 samples/sec   Loss 1.3736   LearningRate 0.0032   Epoch: 16   Global Step: 204190   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:26:39,930-Speed 3304.53 samples/sec   Loss 1.3745   LearningRate 0.0032   Epoch: 16   Global Step: 204200   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:26:43,046-Speed 3288.14 samples/sec   Loss 1.3782   LearningRate 0.0032   Epoch: 16   Global Step: 204210   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:26:46,184-Speed 3264.56 samples/sec   Loss 1.3825   LearningRate 0.0032   Epoch: 16   Global Step: 204220   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:26:49,356-Speed 3228.74 samples/sec   Loss 1.3233   LearningRate 0.0032   Epoch: 16   Global Step: 204230   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:26:52,508-Speed 3249.58 samples/sec   Loss 1.3347   LearningRate 0.0032   Epoch: 16   Global Step: 204240   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:26:55,612-Speed 3299.83 samples/sec   Loss 1.3922   LearningRate 0.0032   Epoch: 16   Global Step: 204250   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:26:58,696-Speed 3322.42 samples/sec   Loss 1.3345   LearningRate 0.0032   Epoch: 16   Global Step: 204260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:27:01,820-Speed 3278.46 samples/sec   Loss 1.3422   LearningRate 0.0032   Epoch: 16   Global Step: 204270   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:27:04,968-Speed 3253.54 samples/sec   Loss 1.3739   LearningRate 0.0032   Epoch: 16   Global Step: 204280   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:27:08,121-Speed 3248.49 samples/sec   Loss 1.3991   LearningRate 0.0032   Epoch: 16   Global Step: 204290   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:27:11,189-Speed 3339.33 samples/sec   Loss 1.3621   LearningRate 0.0032   Epoch: 16   Global Step: 204300   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:27:14,302-Speed 3290.38 samples/sec   Loss 1.3677   LearningRate 0.0032   Epoch: 16   Global Step: 204310   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:27:17,396-Speed 3309.69 samples/sec   Loss 1.3770   LearningRate 0.0032   Epoch: 16   Global Step: 204320   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:27:20,469-Speed 3334.30 samples/sec   Loss 1.4062   LearningRate 0.0031   Epoch: 16   Global Step: 204330   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:27:23,669-Speed 3200.59 samples/sec   Loss 1.3988   LearningRate 0.0031   Epoch: 16   Global Step: 204340   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:27:26,782-Speed 3290.61 samples/sec   Loss 1.3937   LearningRate 0.0031   Epoch: 16   Global Step: 204350   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:27:29,870-Speed 3316.94 samples/sec   Loss 1.3537   LearningRate 0.0031   Epoch: 16   Global Step: 204360   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:27:32,936-Speed 3340.48 samples/sec   Loss 1.3664   LearningRate 0.0031   Epoch: 16   Global Step: 204370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:27:36,021-Speed 3320.31 samples/sec   Loss 1.3581   LearningRate 0.0031   Epoch: 16   Global Step: 204380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:27:39,090-Speed 3337.60 samples/sec   Loss 1.3795   LearningRate 0.0031   Epoch: 16   Global Step: 204390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:27:42,137-Speed 3361.15 samples/sec   Loss 1.4101   LearningRate 0.0031   Epoch: 16   Global Step: 204400   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:27:45,217-Speed 3326.34 samples/sec   Loss 1.3196   LearningRate 0.0031   Epoch: 16   Global Step: 204410   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:27:48,328-Speed 3292.38 samples/sec   Loss 1.4079   LearningRate 0.0031   Epoch: 16   Global Step: 204420   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:27:51,516-Speed 3212.67 samples/sec   Loss 1.4010   LearningRate 0.0031   Epoch: 16   Global Step: 204430   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:27:54,630-Speed 3289.09 samples/sec   Loss 1.4278   LearningRate 0.0031   Epoch: 16   Global Step: 204440   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:27:57,747-Speed 3286.56 samples/sec   Loss 1.3800   LearningRate 0.0031   Epoch: 16   Global Step: 204450   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:28:00,836-Speed 3316.85 samples/sec   Loss 1.3613   LearningRate 0.0031   Epoch: 16   Global Step: 204460   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:28:03,928-Speed 3312.15 samples/sec   Loss 1.3753   LearningRate 0.0031   Epoch: 16   Global Step: 204470   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:28:07,067-Speed 3263.60 samples/sec   Loss 1.3458   LearningRate 0.0031   Epoch: 16   Global Step: 204480   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:28:10,132-Speed 3341.96 samples/sec   Loss 1.3816   LearningRate 0.0031   Epoch: 16   Global Step: 204490   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:28:13,318-Speed 3214.93 samples/sec   Loss 1.3768   LearningRate 0.0031   Epoch: 16   Global Step: 204500   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:28:16,403-Speed 3319.95 samples/sec   Loss 1.3639   LearningRate 0.0031   Epoch: 16   Global Step: 204510   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:28:19,554-Speed 3250.22 samples/sec   Loss 1.4085   LearningRate 0.0031   Epoch: 16   Global Step: 204520   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:28:22,627-Speed 3333.91 samples/sec   Loss 1.3823   LearningRate 0.0031   Epoch: 16   Global Step: 204530   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:28:25,720-Speed 3311.51 samples/sec   Loss 1.3514   LearningRate 0.0031   Epoch: 16   Global Step: 204540   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:28:28,832-Speed 3292.41 samples/sec   Loss 1.3878   LearningRate 0.0031   Epoch: 16   Global Step: 204550   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:28:31,946-Speed 3288.94 samples/sec   Loss 1.4053   LearningRate 0.0031   Epoch: 16   Global Step: 204560   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:28:35,055-Speed 3295.10 samples/sec   Loss 1.3846   LearningRate 0.0031   Epoch: 16   Global Step: 204570   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:28:38,207-Speed 3249.76 samples/sec   Loss 1.3638   LearningRate 0.0031   Epoch: 16   Global Step: 204580   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:28:41,356-Speed 3252.78 samples/sec   Loss 1.3754   LearningRate 0.0031   Epoch: 16   Global Step: 204590   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:28:44,422-Speed 3340.65 samples/sec   Loss 1.3575   LearningRate 0.0031   Epoch: 16   Global Step: 204600   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:28:47,516-Speed 3311.04 samples/sec   Loss 1.3538   LearningRate 0.0031   Epoch: 16   Global Step: 204610   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:28:50,597-Speed 3324.42 samples/sec   Loss 1.3344   LearningRate 0.0031   Epoch: 16   Global Step: 204620   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:28:53,702-Speed 3299.57 samples/sec   Loss 1.3443   LearningRate 0.0031   Epoch: 16   Global Step: 204630   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:28:56,831-Speed 3273.40 samples/sec   Loss 1.3800   LearningRate 0.0031   Epoch: 16   Global Step: 204640   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:28:59,911-Speed 3325.76 samples/sec   Loss 1.3311   LearningRate 0.0031   Epoch: 16   Global Step: 204650   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:29:03,010-Speed 3304.99 samples/sec   Loss 1.3657   LearningRate 0.0031   Epoch: 16   Global Step: 204660   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:29:06,177-Speed 3234.44 samples/sec   Loss 1.3694   LearningRate 0.0031   Epoch: 16   Global Step: 204670   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:29:09,270-Speed 3312.09 samples/sec   Loss 1.4155   LearningRate 0.0031   Epoch: 16   Global Step: 204680   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:29:12,389-Speed 3284.09 samples/sec   Loss 1.4084   LearningRate 0.0031   Epoch: 16   Global Step: 204690   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:29:15,471-Speed 3323.33 samples/sec   Loss 1.3862   LearningRate 0.0031   Epoch: 16   Global Step: 204700   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:29:18,613-Speed 3260.57 samples/sec   Loss 1.3589   LearningRate 0.0031   Epoch: 16   Global Step: 204710   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:29:21,681-Speed 3338.74 samples/sec   Loss 1.3771   LearningRate 0.0031   Epoch: 16   Global Step: 204720   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:29:24,788-Speed 3296.91 samples/sec   Loss 1.3639   LearningRate 0.0031   Epoch: 16   Global Step: 204730   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:29:27,855-Speed 3339.46 samples/sec   Loss 1.3554   LearningRate 0.0031   Epoch: 16   Global Step: 204740   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:29:31,058-Speed 3197.66 samples/sec   Loss 1.3927   LearningRate 0.0031   Epoch: 16   Global Step: 204750   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:29:34,173-Speed 3288.29 samples/sec   Loss 1.3889   LearningRate 0.0031   Epoch: 16   Global Step: 204760   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:29:37,330-Speed 3244.67 samples/sec   Loss 1.3874   LearningRate 0.0031   Epoch: 16   Global Step: 204770   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:29:40,470-Speed 3261.56 samples/sec   Loss 1.3764   LearningRate 0.0031   Epoch: 16   Global Step: 204780   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:29:43,595-Speed 3277.98 samples/sec   Loss 1.4246   LearningRate 0.0031   Epoch: 16   Global Step: 204790   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:29:46,754-Speed 3242.59 samples/sec   Loss 1.3535   LearningRate 0.0031   Epoch: 16   Global Step: 204800   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:29:49,863-Speed 3294.85 samples/sec   Loss 1.3805   LearningRate 0.0031   Epoch: 16   Global Step: 204810   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:29:53,045-Speed 3219.38 samples/sec   Loss 1.4080   LearningRate 0.0031   Epoch: 16   Global Step: 204820   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:29:56,141-Speed 3308.46 samples/sec   Loss 1.3994   LearningRate 0.0031   Epoch: 16   Global Step: 204830   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:29:59,245-Speed 3299.48 samples/sec   Loss 1.3415   LearningRate 0.0031   Epoch: 16   Global Step: 204840   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:30:02,353-Speed 3296.03 samples/sec   Loss 1.4251   LearningRate 0.0031   Epoch: 16   Global Step: 204850   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:30:05,472-Speed 3284.77 samples/sec   Loss 1.3690   LearningRate 0.0031   Epoch: 16   Global Step: 204860   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:30:08,598-Speed 3276.79 samples/sec   Loss 1.3863   LearningRate 0.0031   Epoch: 16   Global Step: 204870   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:30:11,693-Speed 3308.51 samples/sec   Loss 1.3344   LearningRate 0.0031   Epoch: 16   Global Step: 204880   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:30:14,786-Speed 3312.50 samples/sec   Loss 1.3871   LearningRate 0.0031   Epoch: 16   Global Step: 204890   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:30:17,923-Speed 3265.93 samples/sec   Loss 1.3533   LearningRate 0.0031   Epoch: 16   Global Step: 204900   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:30:21,005-Speed 3322.44 samples/sec   Loss 1.3126   LearningRate 0.0031   Epoch: 16   Global Step: 204910   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:30:24,108-Speed 3301.33 samples/sec   Loss 1.3833   LearningRate 0.0031   Epoch: 16   Global Step: 204920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:30:27,260-Speed 3249.98 samples/sec   Loss 1.4358   LearningRate 0.0031   Epoch: 16   Global Step: 204930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:30:30,398-Speed 3264.39 samples/sec   Loss 1.4338   LearningRate 0.0031   Epoch: 16   Global Step: 204940   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:30:33,486-Speed 3316.95 samples/sec   Loss 1.3405   LearningRate 0.0031   Epoch: 16   Global Step: 204950   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:30:36,562-Speed 3330.27 samples/sec   Loss 1.3894   LearningRate 0.0031   Epoch: 16   Global Step: 204960   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:30:39,675-Speed 3289.96 samples/sec   Loss 1.3374   LearningRate 0.0031   Epoch: 16   Global Step: 204970   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:30:42,795-Speed 3282.99 samples/sec   Loss 1.3849   LearningRate 0.0031   Epoch: 16   Global Step: 204980   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:30:45,901-Speed 3298.22 samples/sec   Loss 1.4211   LearningRate 0.0031   Epoch: 16   Global Step: 204990   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:30:49,016-Speed 3288.18 samples/sec   Loss 1.4149   LearningRate 0.0031   Epoch: 16   Global Step: 205000   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:30:52,187-Speed 3230.28 samples/sec   Loss 1.3348   LearningRate 0.0031   Epoch: 16   Global Step: 205010   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:30:55,291-Speed 3300.37 samples/sec   Loss 1.3921   LearningRate 0.0031   Epoch: 16   Global Step: 205020   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:30:58,460-Speed 3232.17 samples/sec   Loss 1.3825   LearningRate 0.0031   Epoch: 16   Global Step: 205030   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:31:01,572-Speed 3291.42 samples/sec   Loss 1.4395   LearningRate 0.0030   Epoch: 16   Global Step: 205040   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:31:04,628-Speed 3351.10 samples/sec   Loss 1.3981   LearningRate 0.0030   Epoch: 16   Global Step: 205050   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:31:07,701-Speed 3333.64 samples/sec   Loss 1.3134   LearningRate 0.0030   Epoch: 16   Global Step: 205060   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:31:10,787-Speed 3319.08 samples/sec   Loss 1.4401   LearningRate 0.0030   Epoch: 16   Global Step: 205070   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:31:13,896-Speed 3295.32 samples/sec   Loss 1.3616   LearningRate 0.0030   Epoch: 16   Global Step: 205080   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:31:17,054-Speed 3243.10 samples/sec   Loss 1.3926   LearningRate 0.0030   Epoch: 16   Global Step: 205090   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:31:20,159-Speed 3298.27 samples/sec   Loss 1.3793   LearningRate 0.0030   Epoch: 16   Global Step: 205100   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:31:23,360-Speed 3200.12 samples/sec   Loss 1.3886   LearningRate 0.0030   Epoch: 16   Global Step: 205110   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:31:26,478-Speed 3285.98 samples/sec   Loss 1.3344   LearningRate 0.0030   Epoch: 16   Global Step: 205120   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:31:29,660-Speed 3218.43 samples/sec   Loss 1.3570   LearningRate 0.0030   Epoch: 16   Global Step: 205130   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:31:32,793-Speed 3270.39 samples/sec   Loss 1.4096   LearningRate 0.0030   Epoch: 16   Global Step: 205140   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:31:35,866-Speed 3332.85 samples/sec   Loss 1.4071   LearningRate 0.0030   Epoch: 16   Global Step: 205150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:31:38,964-Speed 3306.25 samples/sec   Loss 1.3959   LearningRate 0.0030   Epoch: 16   Global Step: 205160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:31:42,100-Speed 3266.34 samples/sec   Loss 1.3876   LearningRate 0.0030   Epoch: 16   Global Step: 205170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:31:45,214-Speed 3289.68 samples/sec   Loss 1.3243   LearningRate 0.0030   Epoch: 16   Global Step: 205180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:31:48,379-Speed 3236.24 samples/sec   Loss 1.3766   LearningRate 0.0030   Epoch: 16   Global Step: 205190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:31:51,515-Speed 3265.52 samples/sec   Loss 1.3469   LearningRate 0.0030   Epoch: 16   Global Step: 205200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:31:54,598-Speed 3323.16 samples/sec   Loss 1.3137   LearningRate 0.0030   Epoch: 16   Global Step: 205210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:31:57,691-Speed 3312.10 samples/sec   Loss 1.3639   LearningRate 0.0030   Epoch: 16   Global Step: 205220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:32:00,784-Speed 3311.35 samples/sec   Loss 1.3721   LearningRate 0.0030   Epoch: 16   Global Step: 205230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:32:03,921-Speed 3265.31 samples/sec   Loss 1.3536   LearningRate 0.0030   Epoch: 16   Global Step: 205240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:32:07,066-Speed 3257.26 samples/sec   Loss 1.3917   LearningRate 0.0030   Epoch: 16   Global Step: 205250   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:32:10,159-Speed 3311.20 samples/sec   Loss 1.3454   LearningRate 0.0030   Epoch: 16   Global Step: 205260   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:32:13,285-Speed 3277.16 samples/sec   Loss 1.3789   LearningRate 0.0030   Epoch: 16   Global Step: 205270   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:32:16,348-Speed 3344.64 samples/sec   Loss 1.4084   LearningRate 0.0030   Epoch: 16   Global Step: 205280   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:32:19,493-Speed 3256.20 samples/sec   Loss 1.3583   LearningRate 0.0030   Epoch: 16   Global Step: 205290   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:32:22,570-Speed 3329.54 samples/sec   Loss 1.4281   LearningRate 0.0030   Epoch: 16   Global Step: 205300   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:32:25,726-Speed 3244.97 samples/sec   Loss 1.4000   LearningRate 0.0030   Epoch: 16   Global Step: 205310   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:32:28,885-Speed 3242.65 samples/sec   Loss 1.3780   LearningRate 0.0030   Epoch: 16   Global Step: 205320   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:32:31,961-Speed 3330.26 samples/sec   Loss 1.3427   LearningRate 0.0030   Epoch: 16   Global Step: 205330   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:32:35,025-Speed 3343.42 samples/sec   Loss 1.3258   LearningRate 0.0030   Epoch: 16   Global Step: 205340   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:32:38,158-Speed 3269.36 samples/sec   Loss 1.4051   LearningRate 0.0030   Epoch: 16   Global Step: 205350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:32:41,235-Speed 3328.12 samples/sec   Loss 1.3456   LearningRate 0.0030   Epoch: 16   Global Step: 205360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:32:44,354-Speed 3284.27 samples/sec   Loss 1.3893   LearningRate 0.0030   Epoch: 16   Global Step: 205370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:32:47,421-Speed 3340.20 samples/sec   Loss 1.3618   LearningRate 0.0030   Epoch: 16   Global Step: 205380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:32:50,563-Speed 3260.30 samples/sec   Loss 1.3525   LearningRate 0.0030   Epoch: 16   Global Step: 205390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:32:53,744-Speed 3219.75 samples/sec   Loss 1.3666   LearningRate 0.0030   Epoch: 16   Global Step: 205400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:32:56,820-Speed 3329.92 samples/sec   Loss 1.3703   LearningRate 0.0030   Epoch: 16   Global Step: 205410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:32:59,935-Speed 3288.66 samples/sec   Loss 1.3767   LearningRate 0.0030   Epoch: 16   Global Step: 205420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:33:03,105-Speed 3231.19 samples/sec   Loss 1.4040   LearningRate 0.0030   Epoch: 16   Global Step: 205430   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:33:06,282-Speed 3223.91 samples/sec   Loss 1.3590   LearningRate 0.0030   Epoch: 16   Global Step: 205440   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:33:09,371-Speed 3316.35 samples/sec   Loss 1.3717   LearningRate 0.0030   Epoch: 16   Global Step: 205450   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:33:12,490-Speed 3283.51 samples/sec   Loss 1.3895   LearningRate 0.0030   Epoch: 16   Global Step: 205460   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:33:15,608-Speed 3286.12 samples/sec   Loss 1.4132   LearningRate 0.0030   Epoch: 16   Global Step: 205470   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:33:18,815-Speed 3193.49 samples/sec   Loss 1.3926   LearningRate 0.0030   Epoch: 16   Global Step: 205480   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:33:21,922-Speed 3297.07 samples/sec   Loss 1.3426   LearningRate 0.0030   Epoch: 16   Global Step: 205490   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:33:24,995-Speed 3332.48 samples/sec   Loss 1.3842   LearningRate 0.0030   Epoch: 16   Global Step: 205500   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:33:28,189-Speed 3207.05 samples/sec   Loss 1.3880   LearningRate 0.0030   Epoch: 16   Global Step: 205510   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:33:31,281-Speed 3312.52 samples/sec   Loss 1.3554   LearningRate 0.0030   Epoch: 16   Global Step: 205520   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:33:34,365-Speed 3322.33 samples/sec   Loss 1.3301   LearningRate 0.0030   Epoch: 16   Global Step: 205530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:33:37,519-Speed 3246.97 samples/sec   Loss 1.3918   LearningRate 0.0030   Epoch: 16   Global Step: 205540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:33:40,633-Speed 3289.26 samples/sec   Loss 1.3530   LearningRate 0.0030   Epoch: 16   Global Step: 205550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:33:43,815-Speed 3219.26 samples/sec   Loss 1.3434   LearningRate 0.0030   Epoch: 16   Global Step: 205560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:33:46,926-Speed 3293.33 samples/sec   Loss 1.3516   LearningRate 0.0030   Epoch: 16   Global Step: 205570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:33:50,070-Speed 3257.52 samples/sec   Loss 1.3668   LearningRate 0.0030   Epoch: 16   Global Step: 205580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:33:53,274-Speed 3197.27 samples/sec   Loss 1.3514   LearningRate 0.0030   Epoch: 16   Global Step: 205590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:33:56,347-Speed 3332.91 samples/sec   Loss 1.3769   LearningRate 0.0030   Epoch: 16   Global Step: 205600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:33:59,469-Speed 3281.03 samples/sec   Loss 1.3883   LearningRate 0.0030   Epoch: 16   Global Step: 205610   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:34:02,626-Speed 3244.86 samples/sec   Loss 1.3637   LearningRate 0.0030   Epoch: 16   Global Step: 205620   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:34:05,724-Speed 3306.37 samples/sec   Loss 1.3726   LearningRate 0.0030   Epoch: 16   Global Step: 205630   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:34:08,816-Speed 3313.35 samples/sec   Loss 1.4309   LearningRate 0.0030   Epoch: 16   Global Step: 205640   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:34:11,978-Speed 3239.81 samples/sec   Loss 1.3691   LearningRate 0.0030   Epoch: 16   Global Step: 205650   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:34:15,071-Speed 3310.88 samples/sec   Loss 1.3783   LearningRate 0.0030   Epoch: 16   Global Step: 205660   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:34:18,219-Speed 3254.31 samples/sec   Loss 1.3339   LearningRate 0.0030   Epoch: 16   Global Step: 205670   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:34:21,354-Speed 3267.61 samples/sec   Loss 1.3619   LearningRate 0.0030   Epoch: 16   Global Step: 205680   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:34:24,497-Speed 3258.76 samples/sec   Loss 1.3701   LearningRate 0.0030   Epoch: 16   Global Step: 205690   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:34:27,635-Speed 3263.51 samples/sec   Loss 1.3515   LearningRate 0.0030   Epoch: 16   Global Step: 205700   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:34:30,739-Speed 3300.52 samples/sec   Loss 1.4006   LearningRate 0.0030   Epoch: 16   Global Step: 205710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:34:33,857-Speed 3285.23 samples/sec   Loss 1.3954   LearningRate 0.0030   Epoch: 16   Global Step: 205720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:34:36,946-Speed 3316.51 samples/sec   Loss 1.3568   LearningRate 0.0030   Epoch: 16   Global Step: 205730   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:34:40,041-Speed 3308.61 samples/sec   Loss 1.4052   LearningRate 0.0030   Epoch: 16   Global Step: 205740   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:34:43,163-Speed 3281.80 samples/sec   Loss 1.3524   LearningRate 0.0030   Epoch: 16   Global Step: 205750   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:34:46,257-Speed 3310.11 samples/sec   Loss 1.3716   LearningRate 0.0029   Epoch: 16   Global Step: 205760   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:34:49,376-Speed 3284.17 samples/sec   Loss 1.3969   LearningRate 0.0029   Epoch: 16   Global Step: 205770   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:34:52,491-Speed 3287.92 samples/sec   Loss 1.4042   LearningRate 0.0029   Epoch: 16   Global Step: 205780   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:34:55,604-Speed 3290.77 samples/sec   Loss 1.3606   LearningRate 0.0029   Epoch: 16   Global Step: 205790   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:34:58,692-Speed 3316.97 samples/sec   Loss 1.3362   LearningRate 0.0029   Epoch: 16   Global Step: 205800   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:35:01,819-Speed 3276.43 samples/sec   Loss 1.3540   LearningRate 0.0029   Epoch: 16   Global Step: 205810   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:35:04,934-Speed 3287.83 samples/sec   Loss 1.3490   LearningRate 0.0029   Epoch: 16   Global Step: 205820   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:35:08,091-Speed 3244.74 samples/sec   Loss 1.3573   LearningRate 0.0029   Epoch: 16   Global Step: 205830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:35:11,180-Speed 3315.63 samples/sec   Loss 1.3720   LearningRate 0.0029   Epoch: 16   Global Step: 205840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:35:14,253-Speed 3333.36 samples/sec   Loss 1.4333   LearningRate 0.0029   Epoch: 16   Global Step: 205850   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:35:17,394-Speed 3261.30 samples/sec   Loss 1.3712   LearningRate 0.0029   Epoch: 16   Global Step: 205860   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:35:20,482-Speed 3317.58 samples/sec   Loss 1.4188   LearningRate 0.0029   Epoch: 16   Global Step: 205870   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:35:23,606-Speed 3278.81 samples/sec   Loss 1.3598   LearningRate 0.0029   Epoch: 16   Global Step: 205880   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:35:26,757-Speed 3251.00 samples/sec   Loss 1.3793   LearningRate 0.0029   Epoch: 16   Global Step: 205890   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:35:29,924-Speed 3233.63 samples/sec   Loss 1.4166   LearningRate 0.0029   Epoch: 16   Global Step: 205900   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:35:33,049-Speed 3278.33 samples/sec   Loss 1.3423   LearningRate 0.0029   Epoch: 16   Global Step: 205910   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:35:36,217-Speed 3233.46 samples/sec   Loss 1.3467   LearningRate 0.0029   Epoch: 16   Global Step: 205920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:35:39,319-Speed 3301.46 samples/sec   Loss 1.3710   LearningRate 0.0029   Epoch: 16   Global Step: 205930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:35:42,397-Speed 3327.86 samples/sec   Loss 1.4208   LearningRate 0.0029   Epoch: 16   Global Step: 205940   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:35:45,471-Speed 3332.37 samples/sec   Loss 1.3856   LearningRate 0.0029   Epoch: 16   Global Step: 205950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:35:48,558-Speed 3318.39 samples/sec   Loss 1.3811   LearningRate 0.0029   Epoch: 16   Global Step: 205960   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:35:51,640-Speed 3323.02 samples/sec   Loss 1.3958   LearningRate 0.0029   Epoch: 16   Global Step: 205970   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:35:54,722-Speed 3323.53 samples/sec   Loss 1.3282   LearningRate 0.0029   Epoch: 16   Global Step: 205980   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:35:57,857-Speed 3267.93 samples/sec   Loss 1.4024   LearningRate 0.0029   Epoch: 16   Global Step: 205990   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:36:00,982-Speed 3278.19 samples/sec   Loss 1.3363   LearningRate 0.0029   Epoch: 16   Global Step: 206000   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:36:04,141-Speed 3241.84 samples/sec   Loss 1.3837   LearningRate 0.0029   Epoch: 16   Global Step: 206010   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:36:07,277-Speed 3266.64 samples/sec   Loss 1.3882   LearningRate 0.0029   Epoch: 16   Global Step: 206020   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:36:10,369-Speed 3313.12 samples/sec   Loss 1.3611   LearningRate 0.0029   Epoch: 16   Global Step: 206030   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:36:13,481-Speed 3291.86 samples/sec   Loss 1.3627   LearningRate 0.0029   Epoch: 16   Global Step: 206040   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:36:16,634-Speed 3248.44 samples/sec   Loss 1.3578   LearningRate 0.0029   Epoch: 16   Global Step: 206050   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:36:19,746-Speed 3291.23 samples/sec   Loss 1.4148   LearningRate 0.0029   Epoch: 16   Global Step: 206060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:36:22,893-Speed 3254.69 samples/sec   Loss 1.3399   LearningRate 0.0029   Epoch: 16   Global Step: 206070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:36:25,987-Speed 3311.03 samples/sec   Loss 1.3795   LearningRate 0.0029   Epoch: 16   Global Step: 206080   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:36:29,093-Speed 3298.02 samples/sec   Loss 1.3761   LearningRate 0.0029   Epoch: 16   Global Step: 206090   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:36:32,198-Speed 3299.45 samples/sec   Loss 1.3990   LearningRate 0.0029   Epoch: 16   Global Step: 206100   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:36:35,292-Speed 3309.97 samples/sec   Loss 1.3654   LearningRate 0.0029   Epoch: 16   Global Step: 206110   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:36:38,443-Speed 3250.95 samples/sec   Loss 1.3605   LearningRate 0.0029   Epoch: 16   Global Step: 206120   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:36:41,592-Speed 3252.95 samples/sec   Loss 1.3583   LearningRate 0.0029   Epoch: 16   Global Step: 206130   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:36:44,682-Speed 3314.80 samples/sec   Loss 1.4042   LearningRate 0.0029   Epoch: 16   Global Step: 206140   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:36:47,769-Speed 3318.03 samples/sec   Loss 1.3613   LearningRate 0.0029   Epoch: 16   Global Step: 206150   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:36:50,834-Speed 3342.27 samples/sec   Loss 1.3211   LearningRate 0.0029   Epoch: 16   Global Step: 206160   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:36:54,054-Speed 3181.00 samples/sec   Loss 1.3518   LearningRate 0.0029   Epoch: 16   Global Step: 206170   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:36:57,142-Speed 3317.50 samples/sec   Loss 1.4579   LearningRate 0.0029   Epoch: 16   Global Step: 206180   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:37:00,265-Speed 3280.13 samples/sec   Loss 1.3364   LearningRate 0.0029   Epoch: 16   Global Step: 206190   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:37:03,359-Speed 3310.06 samples/sec   Loss 1.3508   LearningRate 0.0029   Epoch: 16   Global Step: 206200   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:37:06,551-Speed 3209.71 samples/sec   Loss 1.3839   LearningRate 0.0029   Epoch: 16   Global Step: 206210   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:37:09,637-Speed 3318.63 samples/sec   Loss 1.3495   LearningRate 0.0029   Epoch: 16   Global Step: 206220   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:37:12,740-Speed 3301.72 samples/sec   Loss 1.3299   LearningRate 0.0029   Epoch: 16   Global Step: 206230   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:37:15,872-Speed 3269.92 samples/sec   Loss 1.3952   LearningRate 0.0029   Epoch: 16   Global Step: 206240   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:37:19,011-Speed 3263.92 samples/sec   Loss 1.4030   LearningRate 0.0029   Epoch: 16   Global Step: 206250   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:37:22,082-Speed 3334.91 samples/sec   Loss 1.3498   LearningRate 0.0029   Epoch: 16   Global Step: 206260   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:37:25,207-Speed 3278.61 samples/sec   Loss 1.3374   LearningRate 0.0029   Epoch: 16   Global Step: 206270   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:37:28,310-Speed 3300.48 samples/sec   Loss 1.3862   LearningRate 0.0029   Epoch: 16   Global Step: 206280   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:37:31,388-Speed 3327.40 samples/sec   Loss 1.3887   LearningRate 0.0029   Epoch: 16   Global Step: 206290   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:37:34,479-Speed 3314.11 samples/sec   Loss 1.3786   LearningRate 0.0029   Epoch: 16   Global Step: 206300   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:37:37,620-Speed 3260.70 samples/sec   Loss 1.3567   LearningRate 0.0029   Epoch: 16   Global Step: 206310   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:37:40,738-Speed 3285.04 samples/sec   Loss 1.3770   LearningRate 0.0029   Epoch: 16   Global Step: 206320   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:37:43,852-Speed 3289.60 samples/sec   Loss 1.4107   LearningRate 0.0029   Epoch: 16   Global Step: 206330   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:37:46,937-Speed 3320.58 samples/sec   Loss 1.3296   LearningRate 0.0029   Epoch: 16   Global Step: 206340   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:37:50,087-Speed 3251.99 samples/sec   Loss 1.3932   LearningRate 0.0029   Epoch: 16   Global Step: 206350   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:37:53,238-Speed 3250.83 samples/sec   Loss 1.3854   LearningRate 0.0029   Epoch: 16   Global Step: 206360   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:37:56,390-Speed 3249.74 samples/sec   Loss 1.3759   LearningRate 0.0029   Epoch: 16   Global Step: 206370   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:37:59,523-Speed 3268.58 samples/sec   Loss 1.4160   LearningRate 0.0029   Epoch: 16   Global Step: 206380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:38:02,629-Speed 3298.29 samples/sec   Loss 1.3699   LearningRate 0.0029   Epoch: 16   Global Step: 206390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:38:05,768-Speed 3262.49 samples/sec   Loss 1.4352   LearningRate 0.0029   Epoch: 16   Global Step: 206400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:38:08,857-Speed 3316.94 samples/sec   Loss 1.3891   LearningRate 0.0029   Epoch: 16   Global Step: 206410   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:38:11,962-Speed 3297.91 samples/sec   Loss 1.3576   LearningRate 0.0029   Epoch: 16   Global Step: 206420   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:38:15,053-Speed 3314.28 samples/sec   Loss 1.3448   LearningRate 0.0029   Epoch: 16   Global Step: 206430   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:38:18,171-Speed 3285.00 samples/sec   Loss 1.3546   LearningRate 0.0029   Epoch: 16   Global Step: 206440   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:38:21,251-Speed 3325.78 samples/sec   Loss 1.3887   LearningRate 0.0029   Epoch: 16   Global Step: 206450   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:38:24,379-Speed 3275.04 samples/sec   Loss 1.4042   LearningRate 0.0029   Epoch: 16   Global Step: 206460   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:38:27,484-Speed 3299.25 samples/sec   Loss 1.3875   LearningRate 0.0029   Epoch: 16   Global Step: 206470   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:38:30,577-Speed 3311.65 samples/sec   Loss 1.4069   LearningRate 0.0029   Epoch: 16   Global Step: 206480   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:38:33,670-Speed 3310.51 samples/sec   Loss 1.3366   LearningRate 0.0028   Epoch: 16   Global Step: 206490   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:38:36,822-Speed 3250.20 samples/sec   Loss 1.3424   LearningRate 0.0028   Epoch: 16   Global Step: 206500   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:38:39,973-Speed 3251.12 samples/sec   Loss 1.3578   LearningRate 0.0028   Epoch: 16   Global Step: 206510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:38:43,066-Speed 3311.99 samples/sec   Loss 1.3318   LearningRate 0.0028   Epoch: 16   Global Step: 206520   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:38:46,202-Speed 3266.30 samples/sec   Loss 1.3742   LearningRate 0.0028   Epoch: 16   Global Step: 206530   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:38:49,320-Speed 3284.59 samples/sec   Loss 1.3462   LearningRate 0.0028   Epoch: 16   Global Step: 206540   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:38:52,433-Speed 3291.10 samples/sec   Loss 1.3785   LearningRate 0.0028   Epoch: 16   Global Step: 206550   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:38:55,535-Speed 3301.67 samples/sec   Loss 1.4118   LearningRate 0.0028   Epoch: 16   Global Step: 206560   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:38:58,669-Speed 3268.89 samples/sec   Loss 1.3726   LearningRate 0.0028   Epoch: 16   Global Step: 206570   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:39:01,797-Speed 3274.61 samples/sec   Loss 1.4344   LearningRate 0.0028   Epoch: 16   Global Step: 206580   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:39:04,912-Speed 3287.56 samples/sec   Loss 1.3991   LearningRate 0.0028   Epoch: 16   Global Step: 206590   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:39:08,007-Speed 3310.11 samples/sec   Loss 1.4042   LearningRate 0.0028   Epoch: 16   Global Step: 206600   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:39:11,105-Speed 3306.55 samples/sec   Loss 1.3693   LearningRate 0.0028   Epoch: 16   Global Step: 206610   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:39:14,243-Speed 3264.63 samples/sec   Loss 1.3134   LearningRate 0.0028   Epoch: 16   Global Step: 206620   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:39:17,338-Speed 3309.28 samples/sec   Loss 1.4024   LearningRate 0.0028   Epoch: 16   Global Step: 206630   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:39:20,430-Speed 3312.58 samples/sec   Loss 1.4044   LearningRate 0.0028   Epoch: 16   Global Step: 206640   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:39:23,561-Speed 3271.52 samples/sec   Loss 1.3745   LearningRate 0.0028   Epoch: 16   Global Step: 206650   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:39:26,670-Speed 3295.82 samples/sec   Loss 1.3753   LearningRate 0.0028   Epoch: 16   Global Step: 206660   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:39:29,806-Speed 3266.03 samples/sec   Loss 1.4145   LearningRate 0.0028   Epoch: 16   Global Step: 206670   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:39:32,945-Speed 3262.48 samples/sec   Loss 1.3852   LearningRate 0.0028   Epoch: 16   Global Step: 206680   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:39:36,041-Speed 3308.15 samples/sec   Loss 1.3661   LearningRate 0.0028   Epoch: 16   Global Step: 206690   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:39:39,157-Speed 3287.68 samples/sec   Loss 1.4364   LearningRate 0.0028   Epoch: 16   Global Step: 206700   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:39:42,317-Speed 3241.31 samples/sec   Loss 1.3753   LearningRate 0.0028   Epoch: 16   Global Step: 206710   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:39:45,402-Speed 3321.25 samples/sec   Loss 1.3465   LearningRate 0.0028   Epoch: 16   Global Step: 206720   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:39:48,480-Speed 3327.66 samples/sec   Loss 1.4117   LearningRate 0.0028   Epoch: 16   Global Step: 206730   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:39:51,589-Speed 3294.59 samples/sec   Loss 1.3808   LearningRate 0.0028   Epoch: 16   Global Step: 206740   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:39:54,798-Speed 3205.74 samples/sec   Loss 1.3842   LearningRate 0.0028   Epoch: 16   Global Step: 206750   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:39:57,866-Speed 3338.72 samples/sec   Loss 1.4021   LearningRate 0.0028   Epoch: 16   Global Step: 206760   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:40:01,030-Speed 3237.63 samples/sec   Loss 1.4062   LearningRate 0.0028   Epoch: 16   Global Step: 206770   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:40:04,193-Speed 3237.93 samples/sec   Loss 1.3672   LearningRate 0.0028   Epoch: 16   Global Step: 206780   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:40:07,348-Speed 3247.54 samples/sec   Loss 1.3484   LearningRate 0.0028   Epoch: 16   Global Step: 206790   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:40:10,411-Speed 3343.67 samples/sec   Loss 1.3781   LearningRate 0.0028   Epoch: 16   Global Step: 206800   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:40:13,517-Speed 3298.17 samples/sec   Loss 1.3953   LearningRate 0.0028   Epoch: 16   Global Step: 206810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:40:16,645-Speed 3274.42 samples/sec   Loss 1.3974   LearningRate 0.0028   Epoch: 16   Global Step: 206820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:40:19,719-Speed 3332.52 samples/sec   Loss 1.3811   LearningRate 0.0028   Epoch: 16   Global Step: 206830   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:40:22,807-Speed 3316.94 samples/sec   Loss 1.4026   LearningRate 0.0028   Epoch: 16   Global Step: 206840   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:40:25,964-Speed 3244.50 samples/sec   Loss 1.3736   LearningRate 0.0028   Epoch: 16   Global Step: 206850   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:40:29,121-Speed 3244.47 samples/sec   Loss 1.3281   LearningRate 0.0028   Epoch: 16   Global Step: 206860   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:40:32,244-Speed 3279.79 samples/sec   Loss 1.3831   LearningRate 0.0028   Epoch: 16   Global Step: 206870   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:40:35,356-Speed 3291.62 samples/sec   Loss 1.3979   LearningRate 0.0028   Epoch: 16   Global Step: 206880   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:40:38,483-Speed 3275.84 samples/sec   Loss 1.4293   LearningRate 0.0028   Epoch: 16   Global Step: 206890   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:40:41,623-Speed 3262.01 samples/sec   Loss 1.3344   LearningRate 0.0028   Epoch: 16   Global Step: 206900   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:40:44,721-Speed 3306.67 samples/sec   Loss 1.4220   LearningRate 0.0028   Epoch: 16   Global Step: 206910   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:40:47,788-Speed 3339.88 samples/sec   Loss 1.3727   LearningRate 0.0028   Epoch: 16   Global Step: 206920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:40:50,873-Speed 3320.12 samples/sec   Loss 1.3458   LearningRate 0.0028   Epoch: 16   Global Step: 206930   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:40:54,000-Speed 3275.90 samples/sec   Loss 1.3644   LearningRate 0.0028   Epoch: 16   Global Step: 206940   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:40:57,074-Speed 3332.30 samples/sec   Loss 1.3973   LearningRate 0.0028   Epoch: 16   Global Step: 206950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:41:00,134-Speed 3347.70 samples/sec   Loss 1.3602   LearningRate 0.0028   Epoch: 16   Global Step: 206960   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:41:03,207-Speed 3332.84 samples/sec   Loss 1.3674   LearningRate 0.0028   Epoch: 16   Global Step: 206970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:41:06,359-Speed 3249.90 samples/sec   Loss 1.3404   LearningRate 0.0028   Epoch: 16   Global Step: 206980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:41:09,436-Speed 3329.06 samples/sec   Loss 1.3695   LearningRate 0.0028   Epoch: 16   Global Step: 206990   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:41:12,559-Speed 3280.55 samples/sec   Loss 1.3643   LearningRate 0.0028   Epoch: 16   Global Step: 207000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:41:15,780-Speed 3179.63 samples/sec   Loss 1.3902   LearningRate 0.0028   Epoch: 16   Global Step: 207010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:41:18,882-Speed 3302.40 samples/sec   Loss 1.3527   LearningRate 0.0028   Epoch: 16   Global Step: 207020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:41:21,986-Speed 3298.95 samples/sec   Loss 1.4275   LearningRate 0.0028   Epoch: 16   Global Step: 207030   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-04-27 19:41:25,043-Speed 3352.13 samples/sec   Loss 1.3722   LearningRate 0.0028   Epoch: 16   Global Step: 207040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:41:28,207-Speed 3236.68 samples/sec   Loss 1.3621   LearningRate 0.0028   Epoch: 16   Global Step: 207050   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:41:31,282-Speed 3331.72 samples/sec   Loss 1.4244   LearningRate 0.0028   Epoch: 16   Global Step: 207060   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:41:34,399-Speed 3285.81 samples/sec   Loss 1.3626   LearningRate 0.0028   Epoch: 16   Global Step: 207070   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:41:37,559-Speed 3242.09 samples/sec   Loss 1.3345   LearningRate 0.0028   Epoch: 16   Global Step: 207080   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:41:40,749-Speed 3210.99 samples/sec   Loss 1.3761   LearningRate 0.0028   Epoch: 16   Global Step: 207090   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:41:43,855-Speed 3298.10 samples/sec   Loss 1.4183   LearningRate 0.0028   Epoch: 16   Global Step: 207100   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:41:46,954-Speed 3304.59 samples/sec   Loss 1.3485   LearningRate 0.0028   Epoch: 16   Global Step: 207110   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:41:50,070-Speed 3287.56 samples/sec   Loss 1.3866   LearningRate 0.0028   Epoch: 16   Global Step: 207120   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:41:53,182-Speed 3291.66 samples/sec   Loss 1.4350   LearningRate 0.0028   Epoch: 16   Global Step: 207130   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:41:56,268-Speed 3319.56 samples/sec   Loss 1.4441   LearningRate 0.0028   Epoch: 16   Global Step: 207140   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:41:59,357-Speed 3315.22 samples/sec   Loss 1.3956   LearningRate 0.0028   Epoch: 16   Global Step: 207150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:42:02,464-Speed 3297.53 samples/sec   Loss 1.3312   LearningRate 0.0028   Epoch: 16   Global Step: 207160   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:42:05,573-Speed 3294.29 samples/sec   Loss 1.3240   LearningRate 0.0028   Epoch: 16   Global Step: 207170   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:42:08,620-Speed 3361.98 samples/sec   Loss 1.3701   LearningRate 0.0028   Epoch: 16   Global Step: 207180   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:42:11,737-Speed 3286.77 samples/sec   Loss 1.3609   LearningRate 0.0028   Epoch: 16   Global Step: 207190   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:42:14,885-Speed 3254.14 samples/sec   Loss 1.3666   LearningRate 0.0028   Epoch: 16   Global Step: 207200   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:42:17,965-Speed 3325.41 samples/sec   Loss 1.3447   LearningRate 0.0028   Epoch: 16   Global Step: 207210   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:42:21,047-Speed 3323.17 samples/sec   Loss 1.4001   LearningRate 0.0028   Epoch: 16   Global Step: 207220   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:42:24,122-Speed 3331.95 samples/sec   Loss 1.4115   LearningRate 0.0027   Epoch: 16   Global Step: 207230   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:42:27,306-Speed 3216.96 samples/sec   Loss 1.3707   LearningRate 0.0027   Epoch: 16   Global Step: 207240   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:42:30,444-Speed 3263.66 samples/sec   Loss 1.3558   LearningRate 0.0027   Epoch: 16   Global Step: 207250   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:42:33,529-Speed 3320.30 samples/sec   Loss 1.3627   LearningRate 0.0027   Epoch: 16   Global Step: 207260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:42:36,663-Speed 3268.90 samples/sec   Loss 1.3759   LearningRate 0.0027   Epoch: 16   Global Step: 207270   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:42:39,744-Speed 3323.87 samples/sec   Loss 1.3359   LearningRate 0.0027   Epoch: 16   Global Step: 207280   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:42:42,846-Speed 3302.19 samples/sec   Loss 1.3704   LearningRate 0.0027   Epoch: 16   Global Step: 207290   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:42:45,932-Speed 3319.39 samples/sec   Loss 1.3779   LearningRate 0.0027   Epoch: 16   Global Step: 207300   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:42:49,044-Speed 3291.67 samples/sec   Loss 1.3703   LearningRate 0.0027   Epoch: 16   Global Step: 207310   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:42:52,247-Speed 3198.56 samples/sec   Loss 1.3613   LearningRate 0.0027   Epoch: 16   Global Step: 207320   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:42:55,352-Speed 3299.02 samples/sec   Loss 1.3856   LearningRate 0.0027   Epoch: 16   Global Step: 207330   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:42:58,469-Speed 3286.20 samples/sec   Loss 1.3675   LearningRate 0.0027   Epoch: 16   Global Step: 207340   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:43:01,662-Speed 3210.78 samples/sec   Loss 1.4030   LearningRate 0.0027   Epoch: 16   Global Step: 207350   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:43:04,746-Speed 3321.17 samples/sec   Loss 1.3482   LearningRate 0.0027   Epoch: 16   Global Step: 207360   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:43:07,819-Speed 3333.96 samples/sec   Loss 1.3853   LearningRate 0.0027   Epoch: 16   Global Step: 207370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:43:10,859-Speed 3368.32 samples/sec   Loss 1.3624   LearningRate 0.0027   Epoch: 16   Global Step: 207380   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:43:13,989-Speed 3273.80 samples/sec   Loss 1.3605   LearningRate 0.0027   Epoch: 16   Global Step: 207390   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:43:17,116-Speed 3275.18 samples/sec   Loss 1.3408   LearningRate 0.0027   Epoch: 16   Global Step: 207400   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:43:20,219-Speed 3300.93 samples/sec   Loss 1.3723   LearningRate 0.0027   Epoch: 16   Global Step: 207410   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:43:23,311-Speed 3312.68 samples/sec   Loss 1.3979   LearningRate 0.0027   Epoch: 16   Global Step: 207420   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:43:26,387-Speed 3330.44 samples/sec   Loss 1.3169   LearningRate 0.0027   Epoch: 16   Global Step: 207430   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:43:29,484-Speed 3307.37 samples/sec   Loss 1.3303   LearningRate 0.0027   Epoch: 16   Global Step: 207440   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:43:32,570-Speed 3319.60 samples/sec   Loss 1.3775   LearningRate 0.0027   Epoch: 16   Global Step: 207450   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:43:35,698-Speed 3273.97 samples/sec   Loss 1.4161   LearningRate 0.0027   Epoch: 16   Global Step: 207460   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:43:38,834-Speed 3266.51 samples/sec   Loss 1.3960   LearningRate 0.0027   Epoch: 16   Global Step: 207470   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:43:41,914-Speed 3325.88 samples/sec   Loss 1.3466   LearningRate 0.0027   Epoch: 16   Global Step: 207480   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:43:45,021-Speed 3297.13 samples/sec   Loss 1.3453   LearningRate 0.0027   Epoch: 16   Global Step: 207490   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:43:48,127-Speed 3298.01 samples/sec   Loss 1.4027   LearningRate 0.0027   Epoch: 16   Global Step: 207500   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:43:51,272-Speed 3256.16 samples/sec   Loss 1.3743   LearningRate 0.0027   Epoch: 16   Global Step: 207510   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:43:54,346-Speed 3332.63 samples/sec   Loss 1.3502   LearningRate 0.0027   Epoch: 16   Global Step: 207520   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:43:57,435-Speed 3316.42 samples/sec   Loss 1.3630   LearningRate 0.0027   Epoch: 16   Global Step: 207530   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:44:00,624-Speed 3211.99 samples/sec   Loss 1.3745   LearningRate 0.0027   Epoch: 16   Global Step: 207540   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:44:03,733-Speed 3294.57 samples/sec   Loss 1.3494   LearningRate 0.0027   Epoch: 16   Global Step: 207550   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:44:06,936-Speed 3197.93 samples/sec   Loss 1.4171   LearningRate 0.0027   Epoch: 16   Global Step: 207560   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:44:10,049-Speed 3289.75 samples/sec   Loss 1.3464   LearningRate 0.0027   Epoch: 16   Global Step: 207570   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:44:13,134-Speed 3320.99 samples/sec   Loss 1.3731   LearningRate 0.0027   Epoch: 16   Global Step: 207580   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:44:16,313-Speed 3222.48 samples/sec   Loss 1.3767   LearningRate 0.0027   Epoch: 16   Global Step: 207590   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:44:19,441-Speed 3274.94 samples/sec   Loss 1.3836   LearningRate 0.0027   Epoch: 16   Global Step: 207600   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:44:22,521-Speed 3325.52 samples/sec   Loss 1.3258   LearningRate 0.0027   Epoch: 16   Global Step: 207610   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:44:25,690-Speed 3232.41 samples/sec   Loss 1.3574   LearningRate 0.0027   Epoch: 16   Global Step: 207620   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:44:28,771-Speed 3324.64 samples/sec   Loss 1.3990   LearningRate 0.0027   Epoch: 16   Global Step: 207630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:44:31,865-Speed 3310.01 samples/sec   Loss 1.4055   LearningRate 0.0027   Epoch: 16   Global Step: 207640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:44:35,020-Speed 3246.86 samples/sec   Loss 1.3811   LearningRate 0.0027   Epoch: 16   Global Step: 207650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:44:38,114-Speed 3311.34 samples/sec   Loss 1.3934   LearningRate 0.0027   Epoch: 16   Global Step: 207660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:44:41,189-Speed 3330.45 samples/sec   Loss 1.3586   LearningRate 0.0027   Epoch: 16   Global Step: 207670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:44:44,295-Speed 3298.64 samples/sec   Loss 1.4191   LearningRate 0.0027   Epoch: 16   Global Step: 207680   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:44:47,418-Speed 3279.90 samples/sec   Loss 1.3557   LearningRate 0.0027   Epoch: 16   Global Step: 207690   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:44:50,519-Speed 3303.35 samples/sec   Loss 1.3799   LearningRate 0.0027   Epoch: 16   Global Step: 207700   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:44:53,618-Speed 3305.06 samples/sec   Loss 1.3806   LearningRate 0.0027   Epoch: 16   Global Step: 207710   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:44:56,722-Speed 3300.41 samples/sec   Loss 1.3366   LearningRate 0.0027   Epoch: 16   Global Step: 207720   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:44:59,838-Speed 3286.65 samples/sec   Loss 1.3501   LearningRate 0.0027   Epoch: 16   Global Step: 207730   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:45:02,950-Speed 3291.99 samples/sec   Loss 1.3995   LearningRate 0.0027   Epoch: 16   Global Step: 207740   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:45:06,032-Speed 3323.73 samples/sec   Loss 1.3097   LearningRate 0.0027   Epoch: 16   Global Step: 207750   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:45:09,135-Speed 3301.13 samples/sec   Loss 1.3563   LearningRate 0.0027   Epoch: 16   Global Step: 207760   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:45:12,293-Speed 3243.76 samples/sec   Loss 1.3799   LearningRate 0.0027   Epoch: 16   Global Step: 207770   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:45:15,416-Speed 3279.28 samples/sec   Loss 1.4311   LearningRate 0.0027   Epoch: 16   Global Step: 207780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:45:18,559-Speed 3260.28 samples/sec   Loss 1.3951   LearningRate 0.0027   Epoch: 16   Global Step: 207790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:45:21,649-Speed 3314.05 samples/sec   Loss 1.3526   LearningRate 0.0027   Epoch: 16   Global Step: 207800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-04-27 19:45:24,798-Speed 3252.43 samples/sec   Loss 1.3697   LearningRate 0.0027   Epoch: 16   Global Step: 207810   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:45:27,976-Speed 3224.21 samples/sec   Loss 1.4126   LearningRate 0.0027   Epoch: 16   Global Step: 207820   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:45:31,153-Speed 3223.77 samples/sec   Loss 1.4042   LearningRate 0.0027   Epoch: 16   Global Step: 207830   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:45:34,279-Speed 3276.55 samples/sec   Loss 1.3723   LearningRate 0.0027   Epoch: 16   Global Step: 207840   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:45:37,433-Speed 3248.36 samples/sec   Loss 1.3636   LearningRate 0.0027   Epoch: 16   Global Step: 207850   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:45:40,536-Speed 3300.18 samples/sec   Loss 1.3851   LearningRate 0.0027   Epoch: 16   Global Step: 207860   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:45:43,631-Speed 3310.62 samples/sec   Loss 1.3701   LearningRate 0.0027   Epoch: 16   Global Step: 207870   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:45:46,755-Speed 3278.49 samples/sec   Loss 1.3695   LearningRate 0.0027   Epoch: 16   Global Step: 207880   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:45:49,866-Speed 3293.19 samples/sec   Loss 1.3712   LearningRate 0.0027   Epoch: 16   Global Step: 207890   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:45:53,039-Speed 3227.70 samples/sec   Loss 1.3637   LearningRate 0.0027   Epoch: 16   Global Step: 207900   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:45:56,135-Speed 3308.30 samples/sec   Loss 1.3756   LearningRate 0.0027   Epoch: 16   Global Step: 207910   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:45:59,263-Speed 3275.80 samples/sec   Loss 1.3197   LearningRate 0.0027   Epoch: 16   Global Step: 207920   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:46:02,396-Speed 3268.66 samples/sec   Loss 1.4447   LearningRate 0.0027   Epoch: 16   Global Step: 207930   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-04-27 19:46:05,502-Speed 3298.22 samples/sec   Loss 1.3701   LearningRate 0.0027   Epoch: 16   Global Step: 207940   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:46:08,635-Speed 3269.05 samples/sec   Loss 1.3723   LearningRate 0.0027   Epoch: 16   Global Step: 207950   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-04-27 19:46:11,710-Speed 3331.40 samples/sec   Loss 1.3718   LearningRate 0.0027   Epoch: 16   Global Step: 207960   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:46:14,842-Speed 3270.75 samples/sec   Loss 1.4333   LearningRate 0.0027   Epoch: 16   Global Step: 207970   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:46:17,954-Speed 3292.01 samples/sec   Loss 1.4008   LearningRate 0.0027   Epoch: 16   Global Step: 207980   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:46:21,051-Speed 3306.35 samples/sec   Loss 1.3697   LearningRate 0.0026   Epoch: 16   Global Step: 207990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:46:24,150-Speed 3306.11 samples/sec   Loss 1.3438   LearningRate 0.0026   Epoch: 16   Global Step: 208000   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:46:27,254-Speed 3300.32 samples/sec   Loss 1.3945   LearningRate 0.0026   Epoch: 16   Global Step: 208010   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:46:30,384-Speed 3271.86 samples/sec   Loss 1.3888   LearningRate 0.0026   Epoch: 16   Global Step: 208020   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:46:33,450-Speed 3341.01 samples/sec   Loss 1.4277   LearningRate 0.0026   Epoch: 16   Global Step: 208030   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:46:36,527-Speed 3329.69 samples/sec   Loss 1.4158   LearningRate 0.0026   Epoch: 16   Global Step: 208040   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:46:39,586-Speed 3348.20 samples/sec   Loss 1.3938   LearningRate 0.0026   Epoch: 16   Global Step: 208050   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:46:42,699-Speed 3291.21 samples/sec   Loss 1.3369   LearningRate 0.0026   Epoch: 16   Global Step: 208060   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:46:45,758-Speed 3348.08 samples/sec   Loss 1.3931   LearningRate 0.0026   Epoch: 16   Global Step: 208070   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:46:48,861-Speed 3300.89 samples/sec   Loss 1.3386   LearningRate 0.0026   Epoch: 16   Global Step: 208080   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:46:51,978-Speed 3286.13 samples/sec   Loss 1.3461   LearningRate 0.0026   Epoch: 16   Global Step: 208090   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:46:55,112-Speed 3268.30 samples/sec   Loss 1.3083   LearningRate 0.0026   Epoch: 16   Global Step: 208100   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:46:58,170-Speed 3350.32 samples/sec   Loss 1.3696   LearningRate 0.0026   Epoch: 16   Global Step: 208110   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:47:01,232-Speed 3344.75 samples/sec   Loss 1.3425   LearningRate 0.0026   Epoch: 16   Global Step: 208120   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:47:04,334-Speed 3302.86 samples/sec   Loss 1.3942   LearningRate 0.0026   Epoch: 16   Global Step: 208130   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:47:07,449-Speed 3287.71 samples/sec   Loss 1.3582   LearningRate 0.0026   Epoch: 16   Global Step: 208140   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:47:10,604-Speed 3246.98 samples/sec   Loss 1.3362   LearningRate 0.0026   Epoch: 16   Global Step: 208150   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:47:13,660-Speed 3351.52 samples/sec   Loss 1.3753   LearningRate 0.0026   Epoch: 16   Global Step: 208160   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:47:16,804-Speed 3258.42 samples/sec   Loss 1.3964   LearningRate 0.0026   Epoch: 16   Global Step: 208170   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:47:19,904-Speed 3303.66 samples/sec   Loss 1.3835   LearningRate 0.0026   Epoch: 16   Global Step: 208180   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:47:23,005-Speed 3304.07 samples/sec   Loss 1.3756   LearningRate 0.0026   Epoch: 16   Global Step: 208190   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:47:26,151-Speed 3255.90 samples/sec   Loss 1.3864   LearningRate 0.0026   Epoch: 16   Global Step: 208200   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:47:29,328-Speed 3223.71 samples/sec   Loss 1.4201   LearningRate 0.0026   Epoch: 16   Global Step: 208210   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:47:32,426-Speed 3305.95 samples/sec   Loss 1.3882   LearningRate 0.0026   Epoch: 16   Global Step: 208220   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:47:35,516-Speed 3314.96 samples/sec   Loss 1.4143   LearningRate 0.0026   Epoch: 16   Global Step: 208230   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:47:38,625-Speed 3294.80 samples/sec   Loss 1.3648   LearningRate 0.0026   Epoch: 16   Global Step: 208240   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:47:41,695-Speed 3336.76 samples/sec   Loss 1.3643   LearningRate 0.0026   Epoch: 16   Global Step: 208250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 19:47:44,809-Speed 3289.05 samples/sec   Loss 1.3879   LearningRate 0.0026   Epoch: 16   Global Step: 208260   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:47:47,973-Speed 3238.22 samples/sec   Loss 1.3695   LearningRate 0.0026   Epoch: 16   Global Step: 208270   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:47:51,140-Speed 3233.26 samples/sec   Loss 1.3694   LearningRate 0.0026   Epoch: 16   Global Step: 208280   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:47:54,241-Speed 3304.29 samples/sec   Loss 1.3962   LearningRate 0.0026   Epoch: 16   Global Step: 208290   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:47:57,328-Speed 3317.68 samples/sec   Loss 1.4391   LearningRate 0.0026   Epoch: 16   Global Step: 208300   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:48:00,535-Speed 3194.28 samples/sec   Loss 1.3106   LearningRate 0.0026   Epoch: 16   Global Step: 208310   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:48:03,616-Speed 3324.44 samples/sec   Loss 1.4052   LearningRate 0.0026   Epoch: 16   Global Step: 208320   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:48:06,731-Speed 3288.31 samples/sec   Loss 1.3866   LearningRate 0.0026   Epoch: 16   Global Step: 208330   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:48:09,803-Speed 3334.73 samples/sec   Loss 1.3828   LearningRate 0.0026   Epoch: 16   Global Step: 208340   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:48:12,936-Speed 3270.01 samples/sec   Loss 1.3225   LearningRate 0.0026   Epoch: 16   Global Step: 208350   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:48:16,028-Speed 3312.45 samples/sec   Loss 1.3782   LearningRate 0.0026   Epoch: 16   Global Step: 208360   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:48:19,169-Speed 3261.57 samples/sec   Loss 1.3321   LearningRate 0.0026   Epoch: 16   Global Step: 208370   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:48:22,233-Speed 3342.74 samples/sec   Loss 1.3738   LearningRate 0.0026   Epoch: 16   Global Step: 208380   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:48:25,350-Speed 3286.42 samples/sec   Loss 1.4254   LearningRate 0.0026   Epoch: 16   Global Step: 208390   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:48:28,572-Speed 3179.88 samples/sec   Loss 1.3793   LearningRate 0.0026   Epoch: 16   Global Step: 208400   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:48:31,709-Speed 3264.58 samples/sec   Loss 1.3395   LearningRate 0.0026   Epoch: 16   Global Step: 208410   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:48:34,857-Speed 3254.10 samples/sec   Loss 1.3467   LearningRate 0.0026   Epoch: 16   Global Step: 208420   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:48:37,983-Speed 3276.37 samples/sec   Loss 1.4075   LearningRate 0.0026   Epoch: 16   Global Step: 208430   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:48:41,070-Speed 3318.74 samples/sec   Loss 1.3999   LearningRate 0.0026   Epoch: 16   Global Step: 208440   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:48:44,189-Speed 3283.57 samples/sec   Loss 1.3470   LearningRate 0.0026   Epoch: 16   Global Step: 208450   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:48:47,268-Speed 3327.72 samples/sec   Loss 1.3563   LearningRate 0.0026   Epoch: 16   Global Step: 208460   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:48:50,463-Speed 3205.10 samples/sec   Loss 1.3452   LearningRate 0.0026   Epoch: 16   Global Step: 208470   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:48:53,544-Speed 3324.92 samples/sec   Loss 1.3708   LearningRate 0.0026   Epoch: 16   Global Step: 208480   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:48:56,594-Speed 3359.14 samples/sec   Loss 1.4171   LearningRate 0.0026   Epoch: 16   Global Step: 208490   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:48:59,659-Speed 3341.39 samples/sec   Loss 1.3261   LearningRate 0.0026   Epoch: 16   Global Step: 208500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 19:49:02,695-Speed 3374.12 samples/sec   Loss 1.3819   LearningRate 0.0026   Epoch: 16   Global Step: 208510   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:49:05,770-Speed 3331.38 samples/sec   Loss 1.3804   LearningRate 0.0026   Epoch: 16   Global Step: 208520   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:49:08,856-Speed 3318.47 samples/sec   Loss 1.4246   LearningRate 0.0026   Epoch: 16   Global Step: 208530   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:49:12,041-Speed 3216.76 samples/sec   Loss 1.4267   LearningRate 0.0026   Epoch: 16   Global Step: 208540   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:49:15,121-Speed 3325.75 samples/sec   Loss 1.4536   LearningRate 0.0026   Epoch: 16   Global Step: 208550   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:49:18,194-Speed 3333.24 samples/sec   Loss 1.3707   LearningRate 0.0026   Epoch: 16   Global Step: 208560   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:49:21,289-Speed 3309.60 samples/sec   Loss 1.3757   LearningRate 0.0026   Epoch: 16   Global Step: 208570   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:49:24,470-Speed 3219.84 samples/sec   Loss 1.4341   LearningRate 0.0026   Epoch: 16   Global Step: 208580   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:49:27,622-Speed 3250.26 samples/sec   Loss 1.3703   LearningRate 0.0026   Epoch: 16   Global Step: 208590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:49:30,699-Speed 3328.28 samples/sec   Loss 1.3302   LearningRate 0.0026   Epoch: 16   Global Step: 208600   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:49:33,776-Speed 3329.92 samples/sec   Loss 1.3373   LearningRate 0.0026   Epoch: 16   Global Step: 208610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 19:49:36,863-Speed 3318.25 samples/sec   Loss 1.3567   LearningRate 0.0026   Epoch: 16   Global Step: 208620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 19:49:39,941-Speed 3327.60 samples/sec   Loss 1.4237   LearningRate 0.0026   Epoch: 16   Global Step: 208630   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:49:43,087-Speed 3255.58 samples/sec   Loss 1.3378   LearningRate 0.0026   Epoch: 16   Global Step: 208640   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:49:46,189-Speed 3301.84 samples/sec   Loss 1.3639   LearningRate 0.0026   Epoch: 16   Global Step: 208650   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:49:49,320-Speed 3271.55 samples/sec   Loss 1.3610   LearningRate 0.0026   Epoch: 16   Global Step: 208660   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:49:52,449-Speed 3273.86 samples/sec   Loss 1.3432   LearningRate 0.0026   Epoch: 16   Global Step: 208670   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:49:55,589-Speed 3262.66 samples/sec   Loss 1.3908   LearningRate 0.0026   Epoch: 16   Global Step: 208680   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:49:58,717-Speed 3274.22 samples/sec   Loss 1.3595   LearningRate 0.0026   Epoch: 16   Global Step: 208690   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:50:01,805-Speed 3317.08 samples/sec   Loss 1.4001   LearningRate 0.0026   Epoch: 16   Global Step: 208700   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:50:04,871-Speed 3340.83 samples/sec   Loss 1.3581   LearningRate 0.0026   Epoch: 16   Global Step: 208710   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 19:50:07,941-Speed 3336.72 samples/sec   Loss 1.3966   LearningRate 0.0026   Epoch: 16   Global Step: 208720   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 19:50:11,008-Speed 3339.92 samples/sec   Loss 1.3942   LearningRate 0.0026   Epoch: 16   Global Step: 208730   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 19:50:14,156-Speed 3253.78 samples/sec   Loss 1.3262   LearningRate 0.0026   Epoch: 16   Global Step: 208740   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 19:50:17,310-Speed 3248.20 samples/sec   Loss 1.3894   LearningRate 0.0026   Epoch: 16   Global Step: 208750   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 19:50:20,373-Speed 3344.42 samples/sec   Loss 1.3674   LearningRate 0.0025   Epoch: 16   Global Step: 208760   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 19:50:23,550-Speed 3224.57 samples/sec   Loss 1.3860   LearningRate 0.0025   Epoch: 16   Global Step: 208770   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 19:50:26,676-Speed 3276.39 samples/sec   Loss 1.3781   LearningRate 0.0025   Epoch: 16   Global Step: 208780   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 19:50:29,734-Speed 3350.48 samples/sec   Loss 1.3401   LearningRate 0.0025   Epoch: 16   Global Step: 208790   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 19:50:32,786-Speed 3355.66 samples/sec   Loss 1.4181   LearningRate 0.0025   Epoch: 16   Global Step: 208800   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 19:50:35,856-Speed 3336.82 samples/sec   Loss 1.3762   LearningRate 0.0025   Epoch: 16   Global Step: 208810   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:50:38,949-Speed 3311.79 samples/sec   Loss 1.4054   LearningRate 0.0025   Epoch: 16   Global Step: 208820   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:50:42,069-Speed 3282.48 samples/sec   Loss 1.4028   LearningRate 0.0025   Epoch: 16   Global Step: 208830   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:50:45,123-Speed 3354.24 samples/sec   Loss 1.3744   LearningRate 0.0025   Epoch: 16   Global Step: 208840   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:50:48,185-Speed 3345.42 samples/sec   Loss 1.3279   LearningRate 0.0025   Epoch: 16   Global Step: 208850   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:50:51,334-Speed 3252.87 samples/sec   Loss 1.4134   LearningRate 0.0025   Epoch: 16   Global Step: 208860   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:50:54,473-Speed 3262.99 samples/sec   Loss 1.4125   LearningRate 0.0025   Epoch: 16   Global Step: 208870   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:50:57,565-Speed 3312.52 samples/sec   Loss 1.3895   LearningRate 0.0025   Epoch: 16   Global Step: 208880   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:51:00,694-Speed 3274.21 samples/sec   Loss 1.3418   LearningRate 0.0025   Epoch: 16   Global Step: 208890   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:51:03,825-Speed 3271.89 samples/sec   Loss 1.3946   LearningRate 0.0025   Epoch: 16   Global Step: 208900   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:51:06,909-Speed 3321.45 samples/sec   Loss 1.3280   LearningRate 0.0025   Epoch: 16   Global Step: 208910   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:51:09,952-Speed 3365.43 samples/sec   Loss 1.3498   LearningRate 0.0025   Epoch: 16   Global Step: 208920   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:51:13,081-Speed 3274.68 samples/sec   Loss 1.3034   LearningRate 0.0025   Epoch: 16   Global Step: 208930   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:51:16,204-Speed 3280.12 samples/sec   Loss 1.3898   LearningRate 0.0025   Epoch: 16   Global Step: 208940   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:51:19,296-Speed 3312.22 samples/sec   Loss 1.3972   LearningRate 0.0025   Epoch: 16   Global Step: 208950   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:51:22,360-Speed 3342.81 samples/sec   Loss 1.3949   LearningRate 0.0025   Epoch: 16   Global Step: 208960   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:51:25,644-Speed 3119.95 samples/sec   Loss 1.3627   LearningRate 0.0025   Epoch: 16   Global Step: 208970   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:51:28,837-Speed 3207.67 samples/sec   Loss 1.3972   LearningRate 0.0025   Epoch: 16   Global Step: 208980   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:51:31,978-Speed 3261.42 samples/sec   Loss 1.3949   LearningRate 0.0025   Epoch: 16   Global Step: 208990   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:51:35,147-Speed 3232.34 samples/sec   Loss 1.3767   LearningRate 0.0025   Epoch: 16   Global Step: 209000   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:51:38,258-Speed 3292.96 samples/sec   Loss 1.3744   LearningRate 0.0025   Epoch: 16   Global Step: 209010   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:51:41,425-Speed 3234.49 samples/sec   Loss 1.4114   LearningRate 0.0025   Epoch: 16   Global Step: 209020   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:51:44,570-Speed 3257.31 samples/sec   Loss 1.3775   LearningRate 0.0025   Epoch: 16   Global Step: 209030   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:51:47,713-Speed 3258.46 samples/sec   Loss 1.3232   LearningRate 0.0025   Epoch: 16   Global Step: 209040   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:51:50,850-Speed 3265.51 samples/sec   Loss 1.3702   LearningRate 0.0025   Epoch: 16   Global Step: 209050   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:51:54,018-Speed 3233.39 samples/sec   Loss 1.3829   LearningRate 0.0025   Epoch: 16   Global Step: 209060   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:51:57,091-Speed 3333.27 samples/sec   Loss 1.3880   LearningRate 0.0025   Epoch: 16   Global Step: 209070   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:52:00,217-Speed 3276.35 samples/sec   Loss 1.4159   LearningRate 0.0025   Epoch: 16   Global Step: 209080   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:52:03,364-Speed 3255.37 samples/sec   Loss 1.3764   LearningRate 0.0025   Epoch: 16   Global Step: 209090   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:52:06,558-Speed 3206.39 samples/sec   Loss 1.3406   LearningRate 0.0025   Epoch: 16   Global Step: 209100   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:52:09,645-Speed 3318.75 samples/sec   Loss 1.3390   LearningRate 0.0025   Epoch: 16   Global Step: 209110   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:52:12,769-Speed 3278.80 samples/sec   Loss 1.3829   LearningRate 0.0025   Epoch: 16   Global Step: 209120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 19:52:15,924-Speed 3246.87 samples/sec   Loss 1.3516   LearningRate 0.0025   Epoch: 16   Global Step: 209130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 19:52:19,080-Speed 3245.63 samples/sec   Loss 1.3536   LearningRate 0.0025   Epoch: 16   Global Step: 209140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 19:52:22,202-Speed 3281.07 samples/sec   Loss 1.4125   LearningRate 0.0025   Epoch: 16   Global Step: 209150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 19:52:25,345-Speed 3258.77 samples/sec   Loss 1.3747   LearningRate 0.0025   Epoch: 16   Global Step: 209160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 19:52:28,423-Speed 3327.51 samples/sec   Loss 1.3885   LearningRate 0.0025   Epoch: 16   Global Step: 209170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 19:52:31,518-Speed 3309.89 samples/sec   Loss 1.3711   LearningRate 0.0025   Epoch: 16   Global Step: 209180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 19:52:34,605-Speed 3318.83 samples/sec   Loss 1.3910   LearningRate 0.0025   Epoch: 16   Global Step: 209190   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:52:37,718-Speed 3290.36 samples/sec   Loss 1.3430   LearningRate 0.0025   Epoch: 16   Global Step: 209200   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:52:40,846-Speed 3274.27 samples/sec   Loss 1.4279   LearningRate 0.0025   Epoch: 16   Global Step: 209210   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:52:43,958-Speed 3291.93 samples/sec   Loss 1.3243   LearningRate 0.0025   Epoch: 16   Global Step: 209220   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:52:47,052-Speed 3310.98 samples/sec   Loss 1.3644   LearningRate 0.0025   Epoch: 16   Global Step: 209230   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:52:50,150-Speed 3305.78 samples/sec   Loss 1.3893   LearningRate 0.0025   Epoch: 16   Global Step: 209240   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:52:53,273-Speed 3280.01 samples/sec   Loss 1.3852   LearningRate 0.0025   Epoch: 16   Global Step: 209250   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:52:56,393-Speed 3283.64 samples/sec   Loss 1.3490   LearningRate 0.0025   Epoch: 16   Global Step: 209260   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:52:59,546-Speed 3248.57 samples/sec   Loss 1.3922   LearningRate 0.0025   Epoch: 16   Global Step: 209270   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:53:02,664-Speed 3285.42 samples/sec   Loss 1.3950   LearningRate 0.0025   Epoch: 16   Global Step: 209280   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:53:05,850-Speed 3214.69 samples/sec   Loss 1.3013   LearningRate 0.0025   Epoch: 16   Global Step: 209290   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:53:08,967-Speed 3287.35 samples/sec   Loss 1.3566   LearningRate 0.0025   Epoch: 16   Global Step: 209300   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:53:12,055-Speed 3316.82 samples/sec   Loss 1.3652   LearningRate 0.0025   Epoch: 16   Global Step: 209310   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:53:15,174-Speed 3284.65 samples/sec   Loss 1.3543   LearningRate 0.0025   Epoch: 16   Global Step: 209320   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:53:18,253-Speed 3325.98 samples/sec   Loss 1.3652   LearningRate 0.0025   Epoch: 16   Global Step: 209330   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:53:21,362-Speed 3294.84 samples/sec   Loss 1.3633   LearningRate 0.0025   Epoch: 16   Global Step: 209340   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:53:24,466-Speed 3300.55 samples/sec   Loss 1.3784   LearningRate 0.0025   Epoch: 16   Global Step: 209350   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:53:27,574-Speed 3295.15 samples/sec   Loss 1.3172   LearningRate 0.0025   Epoch: 16   Global Step: 209360   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:53:30,698-Speed 3278.90 samples/sec   Loss 1.3579   LearningRate 0.0025   Epoch: 16   Global Step: 209370   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:53:33,769-Speed 3335.55 samples/sec   Loss 1.4175   LearningRate 0.0025   Epoch: 16   Global Step: 209380   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:53:36,911-Speed 3259.86 samples/sec   Loss 1.3082   LearningRate 0.0025   Epoch: 16   Global Step: 209390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 19:53:40,063-Speed 3249.82 samples/sec   Loss 1.3856   LearningRate 0.0025   Epoch: 16   Global Step: 209400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 19:53:43,176-Speed 3290.38 samples/sec   Loss 1.3790   LearningRate 0.0025   Epoch: 16   Global Step: 209410   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:53:46,256-Speed 3325.75 samples/sec   Loss 1.3658   LearningRate 0.0025   Epoch: 16   Global Step: 209420   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:53:49,348-Speed 3312.93 samples/sec   Loss 1.4021   LearningRate 0.0025   Epoch: 16   Global Step: 209430   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:53:52,544-Speed 3205.69 samples/sec   Loss 1.3930   LearningRate 0.0025   Epoch: 16   Global Step: 209440   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:53:55,611-Speed 3339.30 samples/sec   Loss 1.3685   LearningRate 0.0025   Epoch: 16   Global Step: 209450   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:53:58,707-Speed 3308.09 samples/sec   Loss 1.3341   LearningRate 0.0025   Epoch: 16   Global Step: 209460   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:54:01,798-Speed 3314.25 samples/sec   Loss 1.3956   LearningRate 0.0025   Epoch: 16   Global Step: 209470   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:54:04,870-Speed 3335.29 samples/sec   Loss 1.3493   LearningRate 0.0025   Epoch: 16   Global Step: 209480   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:54:07,977-Speed 3296.80 samples/sec   Loss 1.3693   LearningRate 0.0025   Epoch: 16   Global Step: 209490   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:54:11,080-Speed 3301.30 samples/sec   Loss 1.3683   LearningRate 0.0025   Epoch: 16   Global Step: 209500   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:54:14,153-Speed 3332.98 samples/sec   Loss 1.4294   LearningRate 0.0025   Epoch: 16   Global Step: 209510   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:54:17,318-Speed 3236.58 samples/sec   Loss 1.3502   LearningRate 0.0025   Epoch: 16   Global Step: 209520   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:54:20,359-Speed 3368.31 samples/sec   Loss 1.3576   LearningRate 0.0025   Epoch: 16   Global Step: 209530   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:54:23,556-Speed 3204.93 samples/sec   Loss 1.3697   LearningRate 0.0024   Epoch: 16   Global Step: 209540   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:54:26,717-Speed 3240.35 samples/sec   Loss 1.3241   LearningRate 0.0024   Epoch: 16   Global Step: 209550   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:54:29,805-Speed 3316.45 samples/sec   Loss 1.4060   LearningRate 0.0024   Epoch: 16   Global Step: 209560   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:54:32,906-Speed 3304.10 samples/sec   Loss 1.4041   LearningRate 0.0024   Epoch: 16   Global Step: 209570   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:54:35,969-Speed 3343.53 samples/sec   Loss 1.3713   LearningRate 0.0024   Epoch: 16   Global Step: 209580   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:54:39,075-Speed 3297.85 samples/sec   Loss 1.3272   LearningRate 0.0024   Epoch: 16   Global Step: 209590   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:54:42,211-Speed 3266.96 samples/sec   Loss 1.3711   LearningRate 0.0024   Epoch: 16   Global Step: 209600   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:54:45,325-Speed 3288.76 samples/sec   Loss 1.3668   LearningRate 0.0024   Epoch: 16   Global Step: 209610   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:54:48,417-Speed 3312.52 samples/sec   Loss 1.3827   LearningRate 0.0024   Epoch: 16   Global Step: 209620   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:54:51,515-Speed 3306.82 samples/sec   Loss 1.3273   LearningRate 0.0024   Epoch: 16   Global Step: 209630   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:54:54,605-Speed 3314.89 samples/sec   Loss 1.3606   LearningRate 0.0024   Epoch: 16   Global Step: 209640   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:54:57,660-Speed 3352.83 samples/sec   Loss 1.3771   LearningRate 0.0024   Epoch: 16   Global Step: 209650   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:55:00,735-Speed 3332.33 samples/sec   Loss 1.3680   LearningRate 0.0024   Epoch: 16   Global Step: 209660   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:55:03,831-Speed 3308.42 samples/sec   Loss 1.4440   LearningRate 0.0024   Epoch: 16   Global Step: 209670   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:55:06,965-Speed 3268.20 samples/sec   Loss 1.3508   LearningRate 0.0024   Epoch: 16   Global Step: 209680   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:55:10,049-Speed 3321.93 samples/sec   Loss 1.3977   LearningRate 0.0024   Epoch: 16   Global Step: 209690   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:55:13,188-Speed 3262.30 samples/sec   Loss 1.3448   LearningRate 0.0024   Epoch: 16   Global Step: 209700   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:55:16,302-Speed 3289.90 samples/sec   Loss 1.3695   LearningRate 0.0024   Epoch: 16   Global Step: 209710   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:55:19,411-Speed 3294.47 samples/sec   Loss 1.3372   LearningRate 0.0024   Epoch: 16   Global Step: 209720   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:55:22,492-Speed 3324.55 samples/sec   Loss 1.3649   LearningRate 0.0024   Epoch: 16   Global Step: 209730   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:55:25,555-Speed 3344.34 samples/sec   Loss 1.3404   LearningRate 0.0024   Epoch: 16   Global Step: 209740   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:55:28,607-Speed 3356.97 samples/sec   Loss 1.3278   LearningRate 0.0024   Epoch: 16   Global Step: 209750   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:55:31,656-Speed 3358.63 samples/sec   Loss 1.3745   LearningRate 0.0024   Epoch: 16   Global Step: 209760   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:55:34,703-Speed 3362.19 samples/sec   Loss 1.3930   LearningRate 0.0024   Epoch: 16   Global Step: 209770   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:55:37,867-Speed 3238.07 samples/sec   Loss 1.3916   LearningRate 0.0024   Epoch: 16   Global Step: 209780   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:55:41,055-Speed 3212.31 samples/sec   Loss 1.3867   LearningRate 0.0024   Epoch: 16   Global Step: 209790   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:55:44,146-Speed 3313.98 samples/sec   Loss 1.3550   LearningRate 0.0024   Epoch: 16   Global Step: 209800   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:55:47,253-Speed 3297.53 samples/sec   Loss 1.3886   LearningRate 0.0024   Epoch: 16   Global Step: 209810   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:55:50,391-Speed 3264.05 samples/sec   Loss 1.3708   LearningRate 0.0024   Epoch: 16   Global Step: 209820   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:55:53,504-Speed 3290.57 samples/sec   Loss 1.3675   LearningRate 0.0024   Epoch: 16   Global Step: 209830   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:55:56,642-Speed 3264.12 samples/sec   Loss 1.3306   LearningRate 0.0024   Epoch: 16   Global Step: 209840   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:55:59,733-Speed 3314.02 samples/sec   Loss 1.3721   LearningRate 0.0024   Epoch: 16   Global Step: 209850   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:56:02,827-Speed 3310.68 samples/sec   Loss 1.3850   LearningRate 0.0024   Epoch: 16   Global Step: 209860   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:56:05,959-Speed 3269.85 samples/sec   Loss 1.3107   LearningRate 0.0024   Epoch: 16   Global Step: 209870   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:56:09,077-Speed 3285.68 samples/sec   Loss 1.3686   LearningRate 0.0024   Epoch: 16   Global Step: 209880   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:56:12,175-Speed 3305.84 samples/sec   Loss 1.3925   LearningRate 0.0024   Epoch: 16   Global Step: 209890   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:56:15,290-Speed 3288.42 samples/sec   Loss 1.3879   LearningRate 0.0024   Epoch: 16   Global Step: 209900   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:56:18,400-Speed 3293.80 samples/sec   Loss 1.3374   LearningRate 0.0024   Epoch: 16   Global Step: 209910   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:56:21,475-Speed 3331.46 samples/sec   Loss 1.3876   LearningRate 0.0024   Epoch: 16   Global Step: 209920   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:56:24,614-Speed 3262.73 samples/sec   Loss 1.3667   LearningRate 0.0024   Epoch: 16   Global Step: 209930   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:56:27,820-Speed 3195.63 samples/sec   Loss 1.3676   LearningRate 0.0024   Epoch: 16   Global Step: 209940   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:56:30,905-Speed 3319.70 samples/sec   Loss 1.3676   LearningRate 0.0024   Epoch: 16   Global Step: 209950   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:56:33,969-Speed 3342.95 samples/sec   Loss 1.4659   LearningRate 0.0024   Epoch: 16   Global Step: 209960   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:56:37,079-Speed 3293.60 samples/sec   Loss 1.3709   LearningRate 0.0024   Epoch: 16   Global Step: 209970   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 19:56:40,250-Speed 3230.81 samples/sec   Loss 1.3512   LearningRate 0.0024   Epoch: 16   Global Step: 209980   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 19:56:43,452-Speed 3199.73 samples/sec   Loss 1.3644   LearningRate 0.0024   Epoch: 16   Global Step: 209990   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 19:56:46,546-Speed 3310.69 samples/sec   Loss 1.3936   LearningRate 0.0024   Epoch: 16   Global Step: 210000   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 19:56:49,647-Speed 3303.27 samples/sec   Loss 1.3052   LearningRate 0.0024   Epoch: 16   Global Step: 210010   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 19:56:52,827-Speed 3221.02 samples/sec   Loss 1.4020   LearningRate 0.0024   Epoch: 16   Global Step: 210020   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 19:56:55,897-Speed 3336.15 samples/sec   Loss 1.3936   LearningRate 0.0024   Epoch: 16   Global Step: 210030   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 19:56:58,971-Speed 3332.54 samples/sec   Loss 1.3728   LearningRate 0.0024   Epoch: 16   Global Step: 210040   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 19:57:02,090-Speed 3284.15 samples/sec   Loss 1.3725   LearningRate 0.0024   Epoch: 16   Global Step: 210050   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 19:57:05,185-Speed 3309.39 samples/sec   Loss 1.4073   LearningRate 0.0024   Epoch: 16   Global Step: 210060   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 19:57:08,239-Speed 3355.10 samples/sec   Loss 1.3350   LearningRate 0.0024   Epoch: 16   Global Step: 210070   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:57:11,328-Speed 3315.62 samples/sec   Loss 1.3330   LearningRate 0.0024   Epoch: 16   Global Step: 210080   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:57:14,448-Speed 3283.19 samples/sec   Loss 1.4138   LearningRate 0.0024   Epoch: 16   Global Step: 210090   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:57:17,563-Speed 3288.23 samples/sec   Loss 1.3337   LearningRate 0.0024   Epoch: 16   Global Step: 210100   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:57:20,660-Speed 3307.53 samples/sec   Loss 1.3420   LearningRate 0.0024   Epoch: 16   Global Step: 210110   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:57:23,783-Speed 3279.82 samples/sec   Loss 1.3548   LearningRate 0.0024   Epoch: 16   Global Step: 210120   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:57:26,937-Speed 3248.24 samples/sec   Loss 1.3355   LearningRate 0.0024   Epoch: 16   Global Step: 210130   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:57:30,070-Speed 3268.96 samples/sec   Loss 1.4200   LearningRate 0.0024   Epoch: 16   Global Step: 210140   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:57:33,146-Speed 3330.60 samples/sec   Loss 1.3328   LearningRate 0.0024   Epoch: 16   Global Step: 210150   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:57:36,329-Speed 3218.16 samples/sec   Loss 1.3518   LearningRate 0.0024   Epoch: 16   Global Step: 210160   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:57:39,429-Speed 3304.68 samples/sec   Loss 1.3511   LearningRate 0.0024   Epoch: 16   Global Step: 210170   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:57:42,555-Speed 3276.40 samples/sec   Loss 1.3583   LearningRate 0.0024   Epoch: 16   Global Step: 210180   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:57:45,630-Speed 3331.84 samples/sec   Loss 1.3645   LearningRate 0.0024   Epoch: 16   Global Step: 210190   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:57:48,755-Speed 3277.70 samples/sec   Loss 1.3720   LearningRate 0.0024   Epoch: 16   Global Step: 210200   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:57:51,852-Speed 3307.30 samples/sec   Loss 1.3622   LearningRate 0.0024   Epoch: 16   Global Step: 210210   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:57:54,921-Speed 3337.77 samples/sec   Loss 1.3444   LearningRate 0.0024   Epoch: 16   Global Step: 210220   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:57:57,983-Speed 3345.73 samples/sec   Loss 1.3682   LearningRate 0.0024   Epoch: 16   Global Step: 210230   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:58:01,108-Speed 3277.49 samples/sec   Loss 1.3852   LearningRate 0.0024   Epoch: 16   Global Step: 210240   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:58:04,202-Speed 3310.53 samples/sec   Loss 1.3469   LearningRate 0.0024   Epoch: 16   Global Step: 210250   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:58:07,397-Speed 3206.36 samples/sec   Loss 1.3814   LearningRate 0.0024   Epoch: 16   Global Step: 210260   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:58:10,540-Speed 3259.66 samples/sec   Loss 1.4169   LearningRate 0.0024   Epoch: 16   Global Step: 210270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 19:58:13,627-Speed 3317.48 samples/sec   Loss 1.3488   LearningRate 0.0024   Epoch: 16   Global Step: 210280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 19:58:16,765-Speed 3264.55 samples/sec   Loss 1.3277   LearningRate 0.0024   Epoch: 16   Global Step: 210290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 19:58:19,841-Speed 3329.60 samples/sec   Loss 1.3432   LearningRate 0.0024   Epoch: 16   Global Step: 210300   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:58:22,938-Speed 3308.36 samples/sec   Loss 1.3687   LearningRate 0.0024   Epoch: 16   Global Step: 210310   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:58:26,041-Speed 3300.65 samples/sec   Loss 1.3751   LearningRate 0.0024   Epoch: 16   Global Step: 210320   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:58:29,126-Speed 3320.79 samples/sec   Loss 1.3405   LearningRate 0.0024   Epoch: 16   Global Step: 210330   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:58:32,200-Speed 3332.04 samples/sec   Loss 1.3770   LearningRate 0.0023   Epoch: 16   Global Step: 210340   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:58:35,311-Speed 3292.31 samples/sec   Loss 1.3989   LearningRate 0.0023   Epoch: 16   Global Step: 210350   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:58:38,431-Speed 3283.85 samples/sec   Loss 1.3624   LearningRate 0.0023   Epoch: 16   Global Step: 210360   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:58:41,574-Speed 3258.87 samples/sec   Loss 1.3951   LearningRate 0.0023   Epoch: 16   Global Step: 210370   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:58:44,642-Speed 3338.62 samples/sec   Loss 1.3912   LearningRate 0.0023   Epoch: 16   Global Step: 210380   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:58:47,791-Speed 3253.54 samples/sec   Loss 1.3165   LearningRate 0.0023   Epoch: 16   Global Step: 210390   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:58:50,954-Speed 3238.02 samples/sec   Loss 1.3701   LearningRate 0.0023   Epoch: 16   Global Step: 210400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 19:58:54,159-Speed 3195.89 samples/sec   Loss 1.3594   LearningRate 0.0023   Epoch: 16   Global Step: 210410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 19:58:57,244-Speed 3320.29 samples/sec   Loss 1.3168   LearningRate 0.0023   Epoch: 16   Global Step: 210420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 19:59:00,374-Speed 3272.88 samples/sec   Loss 1.3800   LearningRate 0.0023   Epoch: 16   Global Step: 210430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 19:59:03,518-Speed 3257.75 samples/sec   Loss 1.3555   LearningRate 0.0023   Epoch: 16   Global Step: 210440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 19:59:06,634-Speed 3287.86 samples/sec   Loss 1.3699   LearningRate 0.0023   Epoch: 16   Global Step: 210450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 19:59:09,704-Speed 3336.57 samples/sec   Loss 1.4021   LearningRate 0.0023   Epoch: 16   Global Step: 210460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 19:59:12,773-Speed 3337.06 samples/sec   Loss 1.3737   LearningRate 0.0023   Epoch: 16   Global Step: 210470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 19:59:15,861-Speed 3317.96 samples/sec   Loss 1.3799   LearningRate 0.0023   Epoch: 16   Global Step: 210480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 19:59:19,045-Speed 3217.26 samples/sec   Loss 1.3499   LearningRate 0.0023   Epoch: 16   Global Step: 210490   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:59:22,126-Speed 3324.08 samples/sec   Loss 1.3661   LearningRate 0.0023   Epoch: 16   Global Step: 210500   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:59:25,232-Speed 3298.43 samples/sec   Loss 1.3825   LearningRate 0.0023   Epoch: 16   Global Step: 210510   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:59:28,316-Speed 3321.35 samples/sec   Loss 1.3754   LearningRate 0.0023   Epoch: 16   Global Step: 210520   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:59:31,414-Speed 3306.81 samples/sec   Loss 1.3725   LearningRate 0.0023   Epoch: 16   Global Step: 210530   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:59:34,515-Speed 3302.36 samples/sec   Loss 1.3440   LearningRate 0.0023   Epoch: 16   Global Step: 210540   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:59:37,610-Speed 3309.53 samples/sec   Loss 1.3238   LearningRate 0.0023   Epoch: 16   Global Step: 210550   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:59:40,740-Speed 3272.99 samples/sec   Loss 1.3375   LearningRate 0.0023   Epoch: 16   Global Step: 210560   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:59:43,813-Speed 3333.35 samples/sec   Loss 1.3967   LearningRate 0.0023   Epoch: 16   Global Step: 210570   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 19:59:46,892-Speed 3326.68 samples/sec   Loss 1.3672   LearningRate 0.0023   Epoch: 16   Global Step: 210580   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:59:49,990-Speed 3306.30 samples/sec   Loss 1.3558   LearningRate 0.0023   Epoch: 16   Global Step: 210590   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:59:53,103-Speed 3291.06 samples/sec   Loss 1.3251   LearningRate 0.0023   Epoch: 16   Global Step: 210600   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:59:56,228-Speed 3278.29 samples/sec   Loss 1.4159   LearningRate 0.0023   Epoch: 16   Global Step: 210610   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 19:59:59,305-Speed 3329.01 samples/sec   Loss 1.3664   LearningRate 0.0023   Epoch: 16   Global Step: 210620   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:00:02,395-Speed 3314.31 samples/sec   Loss 1.3719   LearningRate 0.0023   Epoch: 16   Global Step: 210630   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:00:05,464-Speed 3338.51 samples/sec   Loss 1.3266   LearningRate 0.0023   Epoch: 16   Global Step: 210640   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:00:08,536-Speed 3334.21 samples/sec   Loss 1.3791   LearningRate 0.0023   Epoch: 16   Global Step: 210650   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:00:11,625-Speed 3316.06 samples/sec   Loss 1.3600   LearningRate 0.0023   Epoch: 16   Global Step: 210660   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:00:14,785-Speed 3242.05 samples/sec   Loss 1.3242   LearningRate 0.0023   Epoch: 16   Global Step: 210670   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:00:17,893-Speed 3295.87 samples/sec   Loss 1.3707   LearningRate 0.0023   Epoch: 16   Global Step: 210680   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:00:20,947-Speed 3353.87 samples/sec   Loss 1.4555   LearningRate 0.0023   Epoch: 16   Global Step: 210690   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:00:24,075-Speed 3275.06 samples/sec   Loss 1.3253   LearningRate 0.0023   Epoch: 16   Global Step: 210700   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:00:27,205-Speed 3272.03 samples/sec   Loss 1.3417   LearningRate 0.0023   Epoch: 16   Global Step: 210710   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:00:30,333-Speed 3275.34 samples/sec   Loss 1.3694   LearningRate 0.0023   Epoch: 16   Global Step: 210720   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:00:33,416-Speed 3321.97 samples/sec   Loss 1.3316   LearningRate 0.0023   Epoch: 16   Global Step: 210730   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:00:36,543-Speed 3275.70 samples/sec   Loss 1.3456   LearningRate 0.0023   Epoch: 16   Global Step: 210740   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:00:39,652-Speed 3295.38 samples/sec   Loss 1.3584   LearningRate 0.0023   Epoch: 16   Global Step: 210750   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:00:42,787-Speed 3266.74 samples/sec   Loss 1.3476   LearningRate 0.0023   Epoch: 16   Global Step: 210760   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:00:45,855-Speed 3339.12 samples/sec   Loss 1.3394   LearningRate 0.0023   Epoch: 16   Global Step: 210770   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:00:48,952-Speed 3307.06 samples/sec   Loss 1.4037   LearningRate 0.0023   Epoch: 16   Global Step: 210780   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:00:52,115-Speed 3239.31 samples/sec   Loss 1.4054   LearningRate 0.0023   Epoch: 16   Global Step: 210790   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:00:55,198-Speed 3322.37 samples/sec   Loss 1.3253   LearningRate 0.0023   Epoch: 16   Global Step: 210800   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:00:58,287-Speed 3315.96 samples/sec   Loss 1.3554   LearningRate 0.0023   Epoch: 16   Global Step: 210810   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:01:01,420-Speed 3270.32 samples/sec   Loss 1.3547   LearningRate 0.0023   Epoch: 16   Global Step: 210820   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:01:04,542-Speed 3280.08 samples/sec   Loss 1.3331   LearningRate 0.0023   Epoch: 16   Global Step: 210830   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:01:07,650-Speed 3296.15 samples/sec   Loss 1.4020   LearningRate 0.0023   Epoch: 16   Global Step: 210840   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:01:10,763-Speed 3290.62 samples/sec   Loss 1.3626   LearningRate 0.0023   Epoch: 16   Global Step: 210850   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:01:13,952-Speed 3212.00 samples/sec   Loss 1.3596   LearningRate 0.0023   Epoch: 16   Global Step: 210860   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:01:17,112-Speed 3241.91 samples/sec   Loss 1.3600   LearningRate 0.0023   Epoch: 16   Global Step: 210870   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:01:20,199-Speed 3318.25 samples/sec   Loss 1.3861   LearningRate 0.0023   Epoch: 16   Global Step: 210880   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:01:23,275-Speed 3329.21 samples/sec   Loss 1.3400   LearningRate 0.0023   Epoch: 16   Global Step: 210890   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:01:26,406-Speed 3271.80 samples/sec   Loss 1.3597   LearningRate 0.0023   Epoch: 16   Global Step: 210900   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:01:29,502-Speed 3308.34 samples/sec   Loss 1.3753   LearningRate 0.0023   Epoch: 16   Global Step: 210910   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:01:32,595-Speed 3311.62 samples/sec   Loss 1.3463   LearningRate 0.0023   Epoch: 16   Global Step: 210920   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:01:35,689-Speed 3310.48 samples/sec   Loss 1.3811   LearningRate 0.0023   Epoch: 16   Global Step: 210930   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:01:38,847-Speed 3244.44 samples/sec   Loss 1.3921   LearningRate 0.0023   Epoch: 16   Global Step: 210940   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:01:42,017-Speed 3230.91 samples/sec   Loss 1.3692   LearningRate 0.0023   Epoch: 16   Global Step: 210950   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:01:45,109-Speed 3312.49 samples/sec   Loss 1.3658   LearningRate 0.0023   Epoch: 16   Global Step: 210960   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:01:48,309-Speed 3201.23 samples/sec   Loss 1.4011   LearningRate 0.0023   Epoch: 16   Global Step: 210970   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:01:51,409-Speed 3304.99 samples/sec   Loss 1.3509   LearningRate 0.0023   Epoch: 16   Global Step: 210980   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:01:54,553-Speed 3257.49 samples/sec   Loss 1.3598   LearningRate 0.0023   Epoch: 16   Global Step: 210990   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:01:57,624-Speed 3335.68 samples/sec   Loss 1.3043   LearningRate 0.0023   Epoch: 16   Global Step: 211000   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:02:00,777-Speed 3248.91 samples/sec   Loss 1.4073   LearningRate 0.0023   Epoch: 16   Global Step: 211010   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:02:03,866-Speed 3315.34 samples/sec   Loss 1.3871   LearningRate 0.0023   Epoch: 16   Global Step: 211020   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:02:06,959-Speed 3311.93 samples/sec   Loss 1.3290   LearningRate 0.0023   Epoch: 16   Global Step: 211030   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:02:10,053-Speed 3310.70 samples/sec   Loss 1.4145   LearningRate 0.0023   Epoch: 16   Global Step: 211040   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:02:13,181-Speed 3274.36 samples/sec   Loss 1.3689   LearningRate 0.0023   Epoch: 16   Global Step: 211050   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:02:16,306-Speed 3278.63 samples/sec   Loss 1.3302   LearningRate 0.0023   Epoch: 16   Global Step: 211060   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:02:19,457-Speed 3250.61 samples/sec   Loss 1.3197   LearningRate 0.0023   Epoch: 16   Global Step: 211070   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:02:22,548-Speed 3313.30 samples/sec   Loss 1.3649   LearningRate 0.0023   Epoch: 16   Global Step: 211080   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:02:25,671-Speed 3280.08 samples/sec   Loss 1.4359   LearningRate 0.0023   Epoch: 16   Global Step: 211090   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:02:28,796-Speed 3278.10 samples/sec   Loss 1.3681   LearningRate 0.0023   Epoch: 16   Global Step: 211100   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:02:31,946-Speed 3251.51 samples/sec   Loss 1.3724   LearningRate 0.0023   Epoch: 16   Global Step: 211110   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:02:35,636-Speed 2775.96 samples/sec   Loss 1.3336   LearningRate 0.0023   Epoch: 16   Global Step: 211120   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:02:38,801-Speed 3235.71 samples/sec   Loss 1.3859   LearningRate 0.0023   Epoch: 16   Global Step: 211130   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:02:41,963-Speed 3239.51 samples/sec   Loss 1.3713   LearningRate 0.0023   Epoch: 16   Global Step: 211140   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:02:45,223-Speed 3142.95 samples/sec   Loss 1.4018   LearningRate 0.0023   Epoch: 16   Global Step: 211150   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:03:17,341-Speed 318.83 samples/sec   Loss 1.2997   LearningRate 0.0022   Epoch: 17   Global Step: 211160   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:03:20,538-Speed 3205.46 samples/sec   Loss 1.0522   LearningRate 0.0022   Epoch: 17   Global Step: 211170   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:03:24,049-Speed 2916.82 samples/sec   Loss 1.0520   LearningRate 0.0022   Epoch: 17   Global Step: 211180   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:03:27,150-Speed 3303.79 samples/sec   Loss 1.1083   LearningRate 0.0022   Epoch: 17   Global Step: 211190   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:03:30,239-Speed 3315.99 samples/sec   Loss 1.0343   LearningRate 0.0022   Epoch: 17   Global Step: 211200   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:03:33,298-Speed 3348.58 samples/sec   Loss 1.0310   LearningRate 0.0022   Epoch: 17   Global Step: 211210   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:03:36,425-Speed 3276.46 samples/sec   Loss 1.0416   LearningRate 0.0022   Epoch: 17   Global Step: 211220   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:03:39,564-Speed 3262.47 samples/sec   Loss 1.0024   LearningRate 0.0022   Epoch: 17   Global Step: 211230   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:03:42,679-Speed 3289.19 samples/sec   Loss 1.0339   LearningRate 0.0022   Epoch: 17   Global Step: 211240   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:03:45,736-Speed 3350.67 samples/sec   Loss 1.0568   LearningRate 0.0022   Epoch: 17   Global Step: 211250   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:03:48,831-Speed 3309.21 samples/sec   Loss 1.0421   LearningRate 0.0022   Epoch: 17   Global Step: 211260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:03:51,969-Speed 3264.38 samples/sec   Loss 0.9979   LearningRate 0.0022   Epoch: 17   Global Step: 211270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:03:55,244-Speed 3127.19 samples/sec   Loss 1.0100   LearningRate 0.0022   Epoch: 17   Global Step: 211280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:03:58,280-Speed 3374.63 samples/sec   Loss 0.9908   LearningRate 0.0022   Epoch: 17   Global Step: 211290   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:04:01,439-Speed 3242.16 samples/sec   Loss 0.9972   LearningRate 0.0022   Epoch: 17   Global Step: 211300   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:04:04,586-Speed 3254.51 samples/sec   Loss 1.0896   LearningRate 0.0022   Epoch: 17   Global Step: 211310   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:04:07,737-Speed 3251.18 samples/sec   Loss 1.0100   LearningRate 0.0022   Epoch: 17   Global Step: 211320   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:04:10,808-Speed 3335.67 samples/sec   Loss 1.0365   LearningRate 0.0022   Epoch: 17   Global Step: 211330   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:04:13,964-Speed 3245.48 samples/sec   Loss 1.0561   LearningRate 0.0022   Epoch: 17   Global Step: 211340   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:04:17,080-Speed 3287.53 samples/sec   Loss 1.0005   LearningRate 0.0022   Epoch: 17   Global Step: 211350   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:04:20,202-Speed 3281.31 samples/sec   Loss 1.0199   LearningRate 0.0022   Epoch: 17   Global Step: 211360   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:04:23,316-Speed 3289.75 samples/sec   Loss 1.0507   LearningRate 0.0022   Epoch: 17   Global Step: 211370   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:04:26,523-Speed 3194.93 samples/sec   Loss 0.9960   LearningRate 0.0022   Epoch: 17   Global Step: 211380   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:04:29,640-Speed 3286.05 samples/sec   Loss 1.0265   LearningRate 0.0022   Epoch: 17   Global Step: 211390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:04:32,832-Speed 3208.71 samples/sec   Loss 1.0338   LearningRate 0.0022   Epoch: 17   Global Step: 211400   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:04:36,882-Speed 2529.41 samples/sec   Loss 1.0029   LearningRate 0.0022   Epoch: 17   Global Step: 211410   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:04:40,122-Speed 3161.60 samples/sec   Loss 1.0361   LearningRate 0.0022   Epoch: 17   Global Step: 211420   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:04:43,359-Speed 3164.40 samples/sec   Loss 1.0356   LearningRate 0.0022   Epoch: 17   Global Step: 211430   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:04:48,335-Speed 2058.37 samples/sec   Loss 1.0205   LearningRate 0.0022   Epoch: 17   Global Step: 211440   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:04:51,934-Speed 2846.25 samples/sec   Loss 1.0380   LearningRate 0.0022   Epoch: 17   Global Step: 211450   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:04:55,104-Speed 3230.83 samples/sec   Loss 1.0479   LearningRate 0.0022   Epoch: 17   Global Step: 211460   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:04:58,214-Speed 3293.73 samples/sec   Loss 1.0870   LearningRate 0.0022   Epoch: 17   Global Step: 211470   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:05:01,297-Speed 3324.38 samples/sec   Loss 1.0298   LearningRate 0.0022   Epoch: 17   Global Step: 211480   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:05:04,400-Speed 3300.73 samples/sec   Loss 1.0343   LearningRate 0.0022   Epoch: 17   Global Step: 211490   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:05:07,495-Speed 3310.54 samples/sec   Loss 1.0380   LearningRate 0.0022   Epoch: 17   Global Step: 211500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:05:10,555-Speed 3347.11 samples/sec   Loss 1.0280   LearningRate 0.0022   Epoch: 17   Global Step: 211510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:05:13,667-Speed 3291.04 samples/sec   Loss 1.0187   LearningRate 0.0022   Epoch: 17   Global Step: 211520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:05:16,780-Speed 3291.48 samples/sec   Loss 1.0788   LearningRate 0.0022   Epoch: 17   Global Step: 211530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:05:19,890-Speed 3292.62 samples/sec   Loss 1.0560   LearningRate 0.0022   Epoch: 17   Global Step: 211540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:05:22,984-Speed 3310.60 samples/sec   Loss 1.0694   LearningRate 0.0022   Epoch: 17   Global Step: 211550   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:05:26,093-Speed 3295.13 samples/sec   Loss 1.0440   LearningRate 0.0022   Epoch: 17   Global Step: 211560   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:05:29,246-Speed 3248.40 samples/sec   Loss 1.0411   LearningRate 0.0022   Epoch: 17   Global Step: 211570   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:05:32,335-Speed 3318.05 samples/sec   Loss 1.0500   LearningRate 0.0022   Epoch: 17   Global Step: 211580   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:05:35,414-Speed 3327.28 samples/sec   Loss 1.0654   LearningRate 0.0022   Epoch: 17   Global Step: 211590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:05:38,560-Speed 3255.39 samples/sec   Loss 0.9856   LearningRate 0.0022   Epoch: 17   Global Step: 211600   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:05:41,759-Speed 3201.93 samples/sec   Loss 1.0640   LearningRate 0.0022   Epoch: 17   Global Step: 211610   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:05:44,847-Speed 3317.59 samples/sec   Loss 1.0468   LearningRate 0.0022   Epoch: 17   Global Step: 211620   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:05:47,917-Speed 3336.71 samples/sec   Loss 1.0285   LearningRate 0.0022   Epoch: 17   Global Step: 211630   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:05:51,078-Speed 3240.35 samples/sec   Loss 1.0563   LearningRate 0.0022   Epoch: 17   Global Step: 211640   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:05:54,229-Speed 3251.24 samples/sec   Loss 1.0155   LearningRate 0.0022   Epoch: 17   Global Step: 211650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:05:57,306-Speed 3327.94 samples/sec   Loss 1.0722   LearningRate 0.0022   Epoch: 17   Global Step: 211660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:06:00,406-Speed 3304.33 samples/sec   Loss 1.0437   LearningRate 0.0022   Epoch: 17   Global Step: 211670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:06:03,530-Speed 3278.81 samples/sec   Loss 1.0850   LearningRate 0.0022   Epoch: 17   Global Step: 211680   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:06:06,661-Speed 3271.93 samples/sec   Loss 1.0516   LearningRate 0.0022   Epoch: 17   Global Step: 211690   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:06:09,739-Speed 3327.31 samples/sec   Loss 1.0245   LearningRate 0.0022   Epoch: 17   Global Step: 211700   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:06:12,858-Speed 3285.06 samples/sec   Loss 1.0730   LearningRate 0.0022   Epoch: 17   Global Step: 211710   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:06:15,947-Speed 3316.27 samples/sec   Loss 1.0390   LearningRate 0.0022   Epoch: 17   Global Step: 211720   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:06:19,106-Speed 3241.99 samples/sec   Loss 0.9958   LearningRate 0.0022   Epoch: 17   Global Step: 211730   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:06:22,181-Speed 3331.28 samples/sec   Loss 1.0646   LearningRate 0.0022   Epoch: 17   Global Step: 211740   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:06:25,270-Speed 3316.16 samples/sec   Loss 1.0541   LearningRate 0.0022   Epoch: 17   Global Step: 211750   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:06:28,372-Speed 3301.45 samples/sec   Loss 1.0464   LearningRate 0.0022   Epoch: 17   Global Step: 211760   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:06:31,467-Speed 3309.18 samples/sec   Loss 1.0528   LearningRate 0.0022   Epoch: 17   Global Step: 211770   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:06:34,563-Speed 3309.71 samples/sec   Loss 1.0003   LearningRate 0.0022   Epoch: 17   Global Step: 211780   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:06:37,664-Speed 3302.76 samples/sec   Loss 0.9901   LearningRate 0.0022   Epoch: 17   Global Step: 211790   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:06:40,851-Speed 3213.95 samples/sec   Loss 1.0780   LearningRate 0.0022   Epoch: 17   Global Step: 211800   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:06:43,972-Speed 3282.36 samples/sec   Loss 1.0066   LearningRate 0.0022   Epoch: 17   Global Step: 211810   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:06:47,081-Speed 3294.83 samples/sec   Loss 1.0903   LearningRate 0.0022   Epoch: 17   Global Step: 211820   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:06:50,203-Speed 3281.07 samples/sec   Loss 1.0471   LearningRate 0.0022   Epoch: 17   Global Step: 211830   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:06:53,362-Speed 3242.09 samples/sec   Loss 1.0509   LearningRate 0.0022   Epoch: 17   Global Step: 211840   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:06:56,465-Speed 3301.09 samples/sec   Loss 1.0484   LearningRate 0.0022   Epoch: 17   Global Step: 211850   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:06:59,581-Speed 3287.16 samples/sec   Loss 1.0147   LearningRate 0.0022   Epoch: 17   Global Step: 211860   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:07:02,687-Speed 3298.56 samples/sec   Loss 1.0619   LearningRate 0.0022   Epoch: 17   Global Step: 211870   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:07:05,809-Speed 3280.31 samples/sec   Loss 1.0667   LearningRate 0.0022   Epoch: 17   Global Step: 211880   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:07:08,923-Speed 3289.04 samples/sec   Loss 1.0567   LearningRate 0.0022   Epoch: 17   Global Step: 211890   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:07:12,001-Speed 3328.42 samples/sec   Loss 1.0393   LearningRate 0.0022   Epoch: 17   Global Step: 211900   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:07:15,072-Speed 3335.88 samples/sec   Loss 1.0729   LearningRate 0.0022   Epoch: 17   Global Step: 211910   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:07:18,192-Speed 3282.63 samples/sec   Loss 1.0200   LearningRate 0.0022   Epoch: 17   Global Step: 211920   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:07:21,262-Speed 3337.01 samples/sec   Loss 1.0520   LearningRate 0.0022   Epoch: 17   Global Step: 211930   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:07:24,351-Speed 3315.91 samples/sec   Loss 1.0278   LearningRate 0.0022   Epoch: 17   Global Step: 211940   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:07:27,480-Speed 3273.52 samples/sec   Loss 1.0653   LearningRate 0.0022   Epoch: 17   Global Step: 211950   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:07:30,533-Speed 3356.02 samples/sec   Loss 1.0203   LearningRate 0.0022   Epoch: 17   Global Step: 211960   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:07:33,597-Speed 3342.63 samples/sec   Loss 1.0340   LearningRate 0.0022   Epoch: 17   Global Step: 211970   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:07:36,702-Speed 3299.34 samples/sec   Loss 1.0717   LearningRate 0.0022   Epoch: 17   Global Step: 211980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:07:39,837-Speed 3267.41 samples/sec   Loss 0.9902   LearningRate 0.0022   Epoch: 17   Global Step: 211990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:07:42,931-Speed 3310.65 samples/sec   Loss 1.0264   LearningRate 0.0021   Epoch: 17   Global Step: 212000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:07:45,997-Speed 3340.36 samples/sec   Loss 1.0668   LearningRate 0.0021   Epoch: 17   Global Step: 212010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:07:49,135-Speed 3264.56 samples/sec   Loss 1.0264   LearningRate 0.0021   Epoch: 17   Global Step: 212020   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:07:52,223-Speed 3317.28 samples/sec   Loss 1.0135   LearningRate 0.0021   Epoch: 17   Global Step: 212030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:07:55,354-Speed 3270.96 samples/sec   Loss 1.0087   LearningRate 0.0021   Epoch: 17   Global Step: 212040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:07:58,459-Speed 3299.45 samples/sec   Loss 1.0643   LearningRate 0.0021   Epoch: 17   Global Step: 212050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:08:01,597-Speed 3263.90 samples/sec   Loss 1.0672   LearningRate 0.0021   Epoch: 17   Global Step: 212060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:08:04,797-Speed 3201.20 samples/sec   Loss 1.0739   LearningRate 0.0021   Epoch: 17   Global Step: 212070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:08:07,864-Speed 3339.75 samples/sec   Loss 1.0369   LearningRate 0.0021   Epoch: 17   Global Step: 212080   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-04-27 20:08:10,949-Speed 3320.07 samples/sec   Loss 1.0525   LearningRate 0.0021   Epoch: 17   Global Step: 212090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:08:14,084-Speed 3267.28 samples/sec   Loss 1.0797   LearningRate 0.0021   Epoch: 17   Global Step: 212100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:08:17,239-Speed 3246.43 samples/sec   Loss 1.0694   LearningRate 0.0021   Epoch: 17   Global Step: 212110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:08:20,311-Speed 3334.86 samples/sec   Loss 1.0372   LearningRate 0.0021   Epoch: 17   Global Step: 212120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:08:23,413-Speed 3302.82 samples/sec   Loss 1.0434   LearningRate 0.0021   Epoch: 17   Global Step: 212130   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:08:26,498-Speed 3320.11 samples/sec   Loss 0.9973   LearningRate 0.0021   Epoch: 17   Global Step: 212140   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:08:29,647-Speed 3252.77 samples/sec   Loss 1.0374   LearningRate 0.0021   Epoch: 17   Global Step: 212150   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:08:32,716-Speed 3337.08 samples/sec   Loss 1.0923   LearningRate 0.0021   Epoch: 17   Global Step: 212160   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:08:35,835-Speed 3285.01 samples/sec   Loss 1.0472   LearningRate 0.0021   Epoch: 17   Global Step: 212170   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:08:38,935-Speed 3303.77 samples/sec   Loss 1.0974   LearningRate 0.0021   Epoch: 17   Global Step: 212180   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:08:42,021-Speed 3319.68 samples/sec   Loss 1.0416   LearningRate 0.0021   Epoch: 17   Global Step: 212190   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:08:45,097-Speed 3329.62 samples/sec   Loss 1.0771   LearningRate 0.0021   Epoch: 17   Global Step: 212200   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:08:48,228-Speed 3271.20 samples/sec   Loss 1.0243   LearningRate 0.0021   Epoch: 17   Global Step: 212210   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:08:51,333-Speed 3299.58 samples/sec   Loss 1.0014   LearningRate 0.0021   Epoch: 17   Global Step: 212220   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:08:54,433-Speed 3304.43 samples/sec   Loss 1.0524   LearningRate 0.0021   Epoch: 17   Global Step: 212230   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:08:57,558-Speed 3277.79 samples/sec   Loss 1.0132   LearningRate 0.0021   Epoch: 17   Global Step: 212240   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:09:00,635-Speed 3328.83 samples/sec   Loss 1.0724   LearningRate 0.0021   Epoch: 17   Global Step: 212250   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:09:03,828-Speed 3207.76 samples/sec   Loss 1.0793   LearningRate 0.0021   Epoch: 17   Global Step: 212260   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:09:07,067-Speed 3163.04 samples/sec   Loss 1.0089   LearningRate 0.0021   Epoch: 17   Global Step: 212270   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:09:10,118-Speed 3357.16 samples/sec   Loss 1.0553   LearningRate 0.0021   Epoch: 17   Global Step: 212280   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:09:13,165-Speed 3361.47 samples/sec   Loss 1.0279   LearningRate 0.0021   Epoch: 17   Global Step: 212290   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:09:16,248-Speed 3323.13 samples/sec   Loss 1.0818   LearningRate 0.0021   Epoch: 17   Global Step: 212300   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:09:19,352-Speed 3299.81 samples/sec   Loss 1.0595   LearningRate 0.0021   Epoch: 17   Global Step: 212310   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:09:22,449-Speed 3307.12 samples/sec   Loss 1.1018   LearningRate 0.0021   Epoch: 17   Global Step: 212320   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:09:25,542-Speed 3311.87 samples/sec   Loss 1.0890   LearningRate 0.0021   Epoch: 17   Global Step: 212330   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:09:28,690-Speed 3254.87 samples/sec   Loss 1.0460   LearningRate 0.0021   Epoch: 17   Global Step: 212340   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:09:31,828-Speed 3263.68 samples/sec   Loss 1.0695   LearningRate 0.0021   Epoch: 17   Global Step: 212350   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:09:34,916-Speed 3316.81 samples/sec   Loss 1.0683   LearningRate 0.0021   Epoch: 17   Global Step: 212360   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:09:38,072-Speed 3246.49 samples/sec   Loss 1.0865   LearningRate 0.0021   Epoch: 17   Global Step: 212370   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:09:41,184-Speed 3291.19 samples/sec   Loss 1.0196   LearningRate 0.0021   Epoch: 17   Global Step: 212380   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:09:44,357-Speed 3228.51 samples/sec   Loss 1.0694   LearningRate 0.0021   Epoch: 17   Global Step: 212390   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:09:47,493-Speed 3265.65 samples/sec   Loss 1.0133   LearningRate 0.0021   Epoch: 17   Global Step: 212400   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:09:50,664-Speed 3230.47 samples/sec   Loss 1.0643   LearningRate 0.0021   Epoch: 17   Global Step: 212410   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:09:53,751-Speed 3318.15 samples/sec   Loss 1.0617   LearningRate 0.0021   Epoch: 17   Global Step: 212420   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:09:56,819-Speed 3338.91 samples/sec   Loss 1.0521   LearningRate 0.0021   Epoch: 17   Global Step: 212430   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:09:59,890-Speed 3335.61 samples/sec   Loss 1.0541   LearningRate 0.0021   Epoch: 17   Global Step: 212440   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:10:02,992-Speed 3302.78 samples/sec   Loss 1.0094   LearningRate 0.0021   Epoch: 17   Global Step: 212450   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:10:06,097-Speed 3298.65 samples/sec   Loss 1.0882   LearningRate 0.0021   Epoch: 17   Global Step: 212460   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:10:09,192-Speed 3309.91 samples/sec   Loss 1.0551   LearningRate 0.0021   Epoch: 17   Global Step: 212470   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:10:12,334-Speed 3259.77 samples/sec   Loss 1.0380   LearningRate 0.0021   Epoch: 17   Global Step: 212480   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:10:15,453-Speed 3283.84 samples/sec   Loss 1.0318   LearningRate 0.0021   Epoch: 17   Global Step: 212490   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:10:18,561-Speed 3295.89 samples/sec   Loss 1.0533   LearningRate 0.0021   Epoch: 17   Global Step: 212500   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:10:21,608-Speed 3361.50 samples/sec   Loss 1.0980   LearningRate 0.0021   Epoch: 17   Global Step: 212510   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:10:24,727-Speed 3284.74 samples/sec   Loss 1.0428   LearningRate 0.0021   Epoch: 17   Global Step: 212520   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:10:27,829-Speed 3302.23 samples/sec   Loss 1.0422   LearningRate 0.0021   Epoch: 17   Global Step: 212530   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:10:30,891-Speed 3345.28 samples/sec   Loss 1.0484   LearningRate 0.0021   Epoch: 17   Global Step: 212540   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:10:33,995-Speed 3300.42 samples/sec   Loss 0.9991   LearningRate 0.0021   Epoch: 17   Global Step: 212550   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:10:37,083-Speed 3317.23 samples/sec   Loss 1.0598   LearningRate 0.0021   Epoch: 17   Global Step: 212560   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:10:40,220-Speed 3264.80 samples/sec   Loss 1.1023   LearningRate 0.0021   Epoch: 17   Global Step: 212570   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:10:43,305-Speed 3320.78 samples/sec   Loss 1.1062   LearningRate 0.0021   Epoch: 17   Global Step: 212580   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:10:46,402-Speed 3307.69 samples/sec   Loss 1.0357   LearningRate 0.0021   Epoch: 17   Global Step: 212590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:10:49,516-Speed 3288.93 samples/sec   Loss 1.0994   LearningRate 0.0021   Epoch: 17   Global Step: 212600   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:10:52,617-Speed 3302.72 samples/sec   Loss 1.0597   LearningRate 0.0021   Epoch: 17   Global Step: 212610   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:10:55,746-Speed 3274.63 samples/sec   Loss 1.0336   LearningRate 0.0021   Epoch: 17   Global Step: 212620   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:10:58,849-Speed 3301.42 samples/sec   Loss 1.0700   LearningRate 0.0021   Epoch: 17   Global Step: 212630   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:11:01,927-Speed 3327.34 samples/sec   Loss 1.0205   LearningRate 0.0021   Epoch: 17   Global Step: 212640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:11:05,047-Speed 3283.72 samples/sec   Loss 1.0390   LearningRate 0.0021   Epoch: 17   Global Step: 212650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:11:08,113-Speed 3340.19 samples/sec   Loss 1.0222   LearningRate 0.0021   Epoch: 17   Global Step: 212660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:11:11,175-Speed 3345.86 samples/sec   Loss 1.0344   LearningRate 0.0021   Epoch: 17   Global Step: 212670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:11:14,230-Speed 3352.70 samples/sec   Loss 1.0425   LearningRate 0.0021   Epoch: 17   Global Step: 212680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:11:17,356-Speed 3276.67 samples/sec   Loss 1.0586   LearningRate 0.0021   Epoch: 17   Global Step: 212690   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:11:20,457-Speed 3303.41 samples/sec   Loss 1.0480   LearningRate 0.0021   Epoch: 17   Global Step: 212700   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:11:23,619-Speed 3239.07 samples/sec   Loss 1.0358   LearningRate 0.0021   Epoch: 17   Global Step: 212710   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:11:26,739-Speed 3282.90 samples/sec   Loss 1.0141   LearningRate 0.0021   Epoch: 17   Global Step: 212720   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:11:29,843-Speed 3300.69 samples/sec   Loss 1.0667   LearningRate 0.0021   Epoch: 17   Global Step: 212730   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:11:32,952-Speed 3294.27 samples/sec   Loss 1.0730   LearningRate 0.0021   Epoch: 17   Global Step: 212740   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:11:36,067-Speed 3288.36 samples/sec   Loss 1.0764   LearningRate 0.0021   Epoch: 17   Global Step: 212750   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:11:39,191-Speed 3279.55 samples/sec   Loss 1.0540   LearningRate 0.0021   Epoch: 17   Global Step: 212760   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:11:42,301-Speed 3294.24 samples/sec   Loss 1.0636   LearningRate 0.0021   Epoch: 17   Global Step: 212770   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:11:45,384-Speed 3322.64 samples/sec   Loss 1.0768   LearningRate 0.0021   Epoch: 17   Global Step: 212780   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:11:48,476-Speed 3312.39 samples/sec   Loss 1.0683   LearningRate 0.0021   Epoch: 17   Global Step: 212790   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:11:51,607-Speed 3271.78 samples/sec   Loss 1.0623   LearningRate 0.0021   Epoch: 17   Global Step: 212800   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:11:54,697-Speed 3315.27 samples/sec   Loss 1.0105   LearningRate 0.0021   Epoch: 17   Global Step: 212810   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:11:57,756-Speed 3348.85 samples/sec   Loss 1.0426   LearningRate 0.0021   Epoch: 17   Global Step: 212820   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:12:00,925-Speed 3231.69 samples/sec   Loss 1.0843   LearningRate 0.0021   Epoch: 17   Global Step: 212830   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:12:04,088-Speed 3238.36 samples/sec   Loss 1.0343   LearningRate 0.0021   Epoch: 17   Global Step: 212840   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:12:07,241-Speed 3248.73 samples/sec   Loss 1.0515   LearningRate 0.0021   Epoch: 17   Global Step: 212850   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:12:10,365-Speed 3279.72 samples/sec   Loss 1.0449   LearningRate 0.0020   Epoch: 17   Global Step: 212860   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:12:13,576-Speed 3189.89 samples/sec   Loss 1.0774   LearningRate 0.0020   Epoch: 17   Global Step: 212870   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:12:16,740-Speed 3237.14 samples/sec   Loss 1.0783   LearningRate 0.0020   Epoch: 17   Global Step: 212880   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:12:19,880-Speed 3262.53 samples/sec   Loss 1.0091   LearningRate 0.0020   Epoch: 17   Global Step: 212890   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:12:23,026-Speed 3255.33 samples/sec   Loss 1.0953   LearningRate 0.0020   Epoch: 17   Global Step: 212900   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:12:26,203-Speed 3224.92 samples/sec   Loss 1.0325   LearningRate 0.0020   Epoch: 17   Global Step: 212910   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:12:29,425-Speed 3178.28 samples/sec   Loss 1.0757   LearningRate 0.0020   Epoch: 17   Global Step: 212920   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:12:32,541-Speed 3287.40 samples/sec   Loss 1.0492   LearningRate 0.0020   Epoch: 17   Global Step: 212930   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:12:35,759-Speed 3183.74 samples/sec   Loss 1.0643   LearningRate 0.0020   Epoch: 17   Global Step: 212940   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:12:38,924-Speed 3236.58 samples/sec   Loss 1.0967   LearningRate 0.0020   Epoch: 17   Global Step: 212950   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:12:42,079-Speed 3246.04 samples/sec   Loss 1.0255   LearningRate 0.0020   Epoch: 17   Global Step: 212960   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:12:45,169-Speed 3315.11 samples/sec   Loss 1.0684   LearningRate 0.0020   Epoch: 17   Global Step: 212970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:12:48,404-Speed 3166.40 samples/sec   Loss 1.0354   LearningRate 0.0020   Epoch: 17   Global Step: 212980   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:12:51,538-Speed 3268.88 samples/sec   Loss 1.0546   LearningRate 0.0020   Epoch: 17   Global Step: 212990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:12:54,726-Speed 3212.79 samples/sec   Loss 1.0452   LearningRate 0.0020   Epoch: 17   Global Step: 213000   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:12:57,853-Speed 3275.89 samples/sec   Loss 1.0229   LearningRate 0.0020   Epoch: 17   Global Step: 213010   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:13:00,935-Speed 3322.74 samples/sec   Loss 1.0357   LearningRate 0.0020   Epoch: 17   Global Step: 213020   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:13:04,085-Speed 3253.01 samples/sec   Loss 1.0513   LearningRate 0.0020   Epoch: 17   Global Step: 213030   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:13:07,206-Speed 3281.71 samples/sec   Loss 1.0432   LearningRate 0.0020   Epoch: 17   Global Step: 213040   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:13:10,319-Speed 3290.10 samples/sec   Loss 1.0652   LearningRate 0.0020   Epoch: 17   Global Step: 213050   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:13:13,427-Speed 3296.41 samples/sec   Loss 1.0542   LearningRate 0.0020   Epoch: 17   Global Step: 213060   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:13:16,605-Speed 3222.55 samples/sec   Loss 1.0392   LearningRate 0.0020   Epoch: 17   Global Step: 213070   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:13:19,704-Speed 3305.93 samples/sec   Loss 1.0824   LearningRate 0.0020   Epoch: 17   Global Step: 213080   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:13:22,790-Speed 3319.52 samples/sec   Loss 1.0907   LearningRate 0.0020   Epoch: 17   Global Step: 213090   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:13:25,912-Speed 3281.05 samples/sec   Loss 1.0101   LearningRate 0.0020   Epoch: 17   Global Step: 213100   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:13:29,071-Speed 3242.80 samples/sec   Loss 1.0675   LearningRate 0.0020   Epoch: 17   Global Step: 213110   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:13:32,197-Speed 3276.52 samples/sec   Loss 1.0463   LearningRate 0.0020   Epoch: 17   Global Step: 213120   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:13:35,305-Speed 3295.70 samples/sec   Loss 1.0622   LearningRate 0.0020   Epoch: 17   Global Step: 213130   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:13:38,449-Speed 3258.56 samples/sec   Loss 1.0168   LearningRate 0.0020   Epoch: 17   Global Step: 213140   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:13:41,578-Speed 3273.12 samples/sec   Loss 1.0616   LearningRate 0.0020   Epoch: 17   Global Step: 213150   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:13:44,678-Speed 3304.80 samples/sec   Loss 1.0800   LearningRate 0.0020   Epoch: 17   Global Step: 213160   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:13:47,763-Speed 3319.98 samples/sec   Loss 1.0422   LearningRate 0.0020   Epoch: 17   Global Step: 213170   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:13:50,985-Speed 3179.37 samples/sec   Loss 1.0447   LearningRate 0.0020   Epoch: 17   Global Step: 213180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:13:54,166-Speed 3220.07 samples/sec   Loss 1.0435   LearningRate 0.0020   Epoch: 17   Global Step: 213190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:13:57,228-Speed 3345.53 samples/sec   Loss 1.0612   LearningRate 0.0020   Epoch: 17   Global Step: 213200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:14:00,304-Speed 3330.12 samples/sec   Loss 1.0954   LearningRate 0.0020   Epoch: 17   Global Step: 213210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:14:03,393-Speed 3315.64 samples/sec   Loss 1.0890   LearningRate 0.0020   Epoch: 17   Global Step: 213220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:14:06,526-Speed 3269.09 samples/sec   Loss 1.0574   LearningRate 0.0020   Epoch: 17   Global Step: 213230   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:14:09,587-Speed 3346.37 samples/sec   Loss 1.0914   LearningRate 0.0020   Epoch: 17   Global Step: 213240   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:14:12,670-Speed 3323.45 samples/sec   Loss 1.0293   LearningRate 0.0020   Epoch: 17   Global Step: 213250   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:14:15,780-Speed 3293.27 samples/sec   Loss 1.0630   LearningRate 0.0020   Epoch: 17   Global Step: 213260   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:14:18,902-Speed 3280.84 samples/sec   Loss 1.0217   LearningRate 0.0020   Epoch: 17   Global Step: 213270   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:14:21,969-Speed 3339.92 samples/sec   Loss 1.0218   LearningRate 0.0020   Epoch: 17   Global Step: 213280   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:14:25,082-Speed 3291.24 samples/sec   Loss 1.0584   LearningRate 0.0020   Epoch: 17   Global Step: 213290   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:14:28,142-Speed 3347.30 samples/sec   Loss 1.0515   LearningRate 0.0020   Epoch: 17   Global Step: 213300   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:14:31,307-Speed 3235.83 samples/sec   Loss 1.0605   LearningRate 0.0020   Epoch: 17   Global Step: 213310   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:14:34,410-Speed 3301.89 samples/sec   Loss 1.0411   LearningRate 0.0020   Epoch: 17   Global Step: 213320   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:14:37,618-Speed 3192.18 samples/sec   Loss 1.0949   LearningRate 0.0020   Epoch: 17   Global Step: 213330   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:14:40,724-Speed 3298.09 samples/sec   Loss 1.0317   LearningRate 0.0020   Epoch: 17   Global Step: 213340   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:14:43,900-Speed 3225.31 samples/sec   Loss 1.0435   LearningRate 0.0020   Epoch: 17   Global Step: 213350   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:14:46,977-Speed 3329.28 samples/sec   Loss 1.0565   LearningRate 0.0020   Epoch: 17   Global Step: 213360   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:14:50,104-Speed 3276.18 samples/sec   Loss 1.0264   LearningRate 0.0020   Epoch: 17   Global Step: 213370   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:14:53,237-Speed 3269.27 samples/sec   Loss 1.0356   LearningRate 0.0020   Epoch: 17   Global Step: 213380   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:14:56,368-Speed 3271.52 samples/sec   Loss 1.0842   LearningRate 0.0020   Epoch: 17   Global Step: 213390   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:14:59,451-Speed 3322.63 samples/sec   Loss 1.0780   LearningRate 0.0020   Epoch: 17   Global Step: 213400   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:15:02,572-Speed 3281.61 samples/sec   Loss 1.0255   LearningRate 0.0020   Epoch: 17   Global Step: 213410   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:15:05,647-Speed 3331.65 samples/sec   Loss 1.0529   LearningRate 0.0020   Epoch: 17   Global Step: 213420   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:15:08,715-Speed 3338.54 samples/sec   Loss 1.0380   LearningRate 0.0020   Epoch: 17   Global Step: 213430   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:15:11,805-Speed 3315.31 samples/sec   Loss 1.0327   LearningRate 0.0020   Epoch: 17   Global Step: 213440   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:15:14,917-Speed 3291.28 samples/sec   Loss 1.0339   LearningRate 0.0020   Epoch: 17   Global Step: 213450   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:15:18,102-Speed 3216.48 samples/sec   Loss 1.0250   LearningRate 0.0020   Epoch: 17   Global Step: 213460   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:15:21,152-Speed 3358.05 samples/sec   Loss 1.0335   LearningRate 0.0020   Epoch: 17   Global Step: 213470   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:15:24,224-Speed 3334.71 samples/sec   Loss 1.0260   LearningRate 0.0020   Epoch: 17   Global Step: 213480   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:15:27,317-Speed 3312.14 samples/sec   Loss 1.0848   LearningRate 0.0020   Epoch: 17   Global Step: 213490   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:15:30,400-Speed 3321.93 samples/sec   Loss 1.0266   LearningRate 0.0020   Epoch: 17   Global Step: 213500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:15:33,486-Speed 3318.93 samples/sec   Loss 1.0789   LearningRate 0.0020   Epoch: 17   Global Step: 213510   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:15:36,644-Speed 3243.75 samples/sec   Loss 1.0711   LearningRate 0.0020   Epoch: 17   Global Step: 213520   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:15:39,801-Speed 3244.49 samples/sec   Loss 1.0332   LearningRate 0.0020   Epoch: 17   Global Step: 213530   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:15:42,890-Speed 3316.20 samples/sec   Loss 1.0739   LearningRate 0.0020   Epoch: 17   Global Step: 213540   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:15:45,974-Speed 3322.05 samples/sec   Loss 1.0861   LearningRate 0.0020   Epoch: 17   Global Step: 213550   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:15:49,042-Speed 3337.73 samples/sec   Loss 1.0740   LearningRate 0.0020   Epoch: 17   Global Step: 213560   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:15:52,209-Speed 3235.12 samples/sec   Loss 1.0424   LearningRate 0.0020   Epoch: 17   Global Step: 213570   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:15:55,342-Speed 3269.43 samples/sec   Loss 1.0331   LearningRate 0.0020   Epoch: 17   Global Step: 213580   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:15:58,395-Speed 3355.29 samples/sec   Loss 1.0461   LearningRate 0.0020   Epoch: 17   Global Step: 213590   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:16:01,501-Speed 3297.98 samples/sec   Loss 1.0930   LearningRate 0.0020   Epoch: 17   Global Step: 213600   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:16:04,651-Speed 3251.37 samples/sec   Loss 1.0702   LearningRate 0.0020   Epoch: 17   Global Step: 213610   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:16:07,799-Speed 3253.92 samples/sec   Loss 1.0919   LearningRate 0.0020   Epoch: 17   Global Step: 213620   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:16:10,883-Speed 3321.32 samples/sec   Loss 1.0490   LearningRate 0.0020   Epoch: 17   Global Step: 213630   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:16:14,045-Speed 3239.91 samples/sec   Loss 1.0480   LearningRate 0.0020   Epoch: 17   Global Step: 213640   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:16:17,142-Speed 3307.74 samples/sec   Loss 1.0679   LearningRate 0.0020   Epoch: 17   Global Step: 213650   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:16:20,202-Speed 3347.69 samples/sec   Loss 1.0600   LearningRate 0.0020   Epoch: 17   Global Step: 213660   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:16:23,268-Speed 3340.89 samples/sec   Loss 1.0882   LearningRate 0.0020   Epoch: 17   Global Step: 213670   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:16:26,360-Speed 3312.53 samples/sec   Loss 1.0850   LearningRate 0.0020   Epoch: 17   Global Step: 213680   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:16:29,469-Speed 3294.77 samples/sec   Loss 1.0966   LearningRate 0.0020   Epoch: 17   Global Step: 213690   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:16:32,547-Speed 3327.92 samples/sec   Loss 1.0447   LearningRate 0.0020   Epoch: 17   Global Step: 213700   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:16:35,650-Speed 3300.43 samples/sec   Loss 0.9988   LearningRate 0.0020   Epoch: 17   Global Step: 213710   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:16:38,810-Speed 3242.08 samples/sec   Loss 1.0814   LearningRate 0.0020   Epoch: 17   Global Step: 213720   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:16:41,901-Speed 3314.31 samples/sec   Loss 1.0547   LearningRate 0.0020   Epoch: 17   Global Step: 213730   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:16:44,954-Speed 3355.17 samples/sec   Loss 1.0627   LearningRate 0.0019   Epoch: 17   Global Step: 213740   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:16:48,042-Speed 3317.06 samples/sec   Loss 1.0360   LearningRate 0.0019   Epoch: 17   Global Step: 213750   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:16:51,158-Speed 3287.09 samples/sec   Loss 1.0500   LearningRate 0.0019   Epoch: 17   Global Step: 213760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:16:54,304-Speed 3255.68 samples/sec   Loss 1.0807   LearningRate 0.0019   Epoch: 17   Global Step: 213770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:16:57,385-Speed 3326.65 samples/sec   Loss 1.0395   LearningRate 0.0019   Epoch: 17   Global Step: 213780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:17:00,428-Speed 3366.38 samples/sec   Loss 1.0551   LearningRate 0.0019   Epoch: 17   Global Step: 213790   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:17:03,557-Speed 3273.02 samples/sec   Loss 1.0023   LearningRate 0.0019   Epoch: 17   Global Step: 213800   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:17:06,656-Speed 3305.88 samples/sec   Loss 1.0403   LearningRate 0.0019   Epoch: 17   Global Step: 213810   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:17:09,724-Speed 3338.86 samples/sec   Loss 1.0911   LearningRate 0.0019   Epoch: 17   Global Step: 213820   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:17:12,844-Speed 3283.58 samples/sec   Loss 1.0496   LearningRate 0.0019   Epoch: 17   Global Step: 213830   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:17:15,978-Speed 3267.42 samples/sec   Loss 1.0739   LearningRate 0.0019   Epoch: 17   Global Step: 213840   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:17:19,064-Speed 3319.83 samples/sec   Loss 1.1080   LearningRate 0.0019   Epoch: 17   Global Step: 213850   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:17:22,155-Speed 3314.35 samples/sec   Loss 1.0947   LearningRate 0.0019   Epoch: 17   Global Step: 213860   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:17:25,271-Speed 3287.62 samples/sec   Loss 1.0358   LearningRate 0.0019   Epoch: 17   Global Step: 213870   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:17:28,432-Speed 3239.90 samples/sec   Loss 1.0453   LearningRate 0.0019   Epoch: 17   Global Step: 213880   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:17:31,469-Speed 3373.71 samples/sec   Loss 1.0245   LearningRate 0.0019   Epoch: 17   Global Step: 213890   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:17:34,570-Speed 3303.57 samples/sec   Loss 1.0935   LearningRate 0.0019   Epoch: 17   Global Step: 213900   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:17:37,765-Speed 3205.81 samples/sec   Loss 1.0450   LearningRate 0.0019   Epoch: 17   Global Step: 213910   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:17:40,854-Speed 3315.74 samples/sec   Loss 1.0943   LearningRate 0.0019   Epoch: 17   Global Step: 213920   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:17:43,947-Speed 3311.39 samples/sec   Loss 1.0629   LearningRate 0.0019   Epoch: 17   Global Step: 213930   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:17:47,002-Speed 3353.78 samples/sec   Loss 1.0607   LearningRate 0.0019   Epoch: 17   Global Step: 213940   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:17:50,106-Speed 3299.84 samples/sec   Loss 1.0631   LearningRate 0.0019   Epoch: 17   Global Step: 213950   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:17:53,236-Speed 3272.41 samples/sec   Loss 1.0742   LearningRate 0.0019   Epoch: 17   Global Step: 213960   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:17:56,322-Speed 3320.01 samples/sec   Loss 1.0592   LearningRate 0.0019   Epoch: 17   Global Step: 213970   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:17:59,439-Speed 3286.19 samples/sec   Loss 1.0730   LearningRate 0.0019   Epoch: 17   Global Step: 213980   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:18:02,530-Speed 3313.27 samples/sec   Loss 1.0519   LearningRate 0.0019   Epoch: 17   Global Step: 213990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:18:05,615-Speed 3320.97 samples/sec   Loss 1.0976   LearningRate 0.0019   Epoch: 17   Global Step: 214000   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:18:08,674-Speed 3348.54 samples/sec   Loss 1.0422   LearningRate 0.0019   Epoch: 17   Global Step: 214010   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:18:11,816-Speed 3260.40 samples/sec   Loss 1.0660   LearningRate 0.0019   Epoch: 17   Global Step: 214020   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:18:14,975-Speed 3242.37 samples/sec   Loss 1.0670   LearningRate 0.0019   Epoch: 17   Global Step: 214030   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:18:18,109-Speed 3268.07 samples/sec   Loss 1.0657   LearningRate 0.0019   Epoch: 17   Global Step: 214040   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:18:21,231-Speed 3281.95 samples/sec   Loss 1.0637   LearningRate 0.0019   Epoch: 17   Global Step: 214050   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:18:24,363-Speed 3269.40 samples/sec   Loss 1.0294   LearningRate 0.0019   Epoch: 17   Global Step: 214060   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:18:27,458-Speed 3310.62 samples/sec   Loss 1.0382   LearningRate 0.0019   Epoch: 17   Global Step: 214070   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:18:30,542-Speed 3321.45 samples/sec   Loss 1.0579   LearningRate 0.0019   Epoch: 17   Global Step: 214080   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:18:33,636-Speed 3309.67 samples/sec   Loss 1.0074   LearningRate 0.0019   Epoch: 17   Global Step: 214090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:18:36,714-Speed 3328.74 samples/sec   Loss 1.0615   LearningRate 0.0019   Epoch: 17   Global Step: 214100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:18:39,868-Speed 3247.23 samples/sec   Loss 1.0892   LearningRate 0.0019   Epoch: 17   Global Step: 214110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:18:42,965-Speed 3308.00 samples/sec   Loss 1.0602   LearningRate 0.0019   Epoch: 17   Global Step: 214120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:18:46,034-Speed 3337.99 samples/sec   Loss 1.0876   LearningRate 0.0019   Epoch: 17   Global Step: 214130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:18:49,134-Speed 3304.39 samples/sec   Loss 1.0688   LearningRate 0.0019   Epoch: 17   Global Step: 214140   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:18:52,225-Speed 3313.79 samples/sec   Loss 1.0489   LearningRate 0.0019   Epoch: 17   Global Step: 214150   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:18:55,335-Speed 3293.46 samples/sec   Loss 1.0095   LearningRate 0.0019   Epoch: 17   Global Step: 214160   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:18:58,400-Speed 3341.65 samples/sec   Loss 1.1243   LearningRate 0.0019   Epoch: 17   Global Step: 214170   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:19:01,540-Speed 3262.05 samples/sec   Loss 1.0220   LearningRate 0.0019   Epoch: 17   Global Step: 214180   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:19:04,627-Speed 3319.03 samples/sec   Loss 1.0423   LearningRate 0.0019   Epoch: 17   Global Step: 214190   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:19:07,758-Speed 3271.44 samples/sec   Loss 1.0672   LearningRate 0.0019   Epoch: 17   Global Step: 214200   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:19:10,829-Speed 3335.29 samples/sec   Loss 1.0567   LearningRate 0.0019   Epoch: 17   Global Step: 214210   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:19:13,899-Speed 3336.40 samples/sec   Loss 1.0546   LearningRate 0.0019   Epoch: 17   Global Step: 214220   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:19:16,954-Speed 3352.66 samples/sec   Loss 1.0646   LearningRate 0.0019   Epoch: 17   Global Step: 214230   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:19:20,041-Speed 3318.72 samples/sec   Loss 1.0582   LearningRate 0.0019   Epoch: 17   Global Step: 214240   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:19:23,115-Speed 3331.71 samples/sec   Loss 1.0631   LearningRate 0.0019   Epoch: 17   Global Step: 214250   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:19:26,180-Speed 3342.90 samples/sec   Loss 1.0375   LearningRate 0.0019   Epoch: 17   Global Step: 214260   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:19:29,284-Speed 3299.82 samples/sec   Loss 1.0591   LearningRate 0.0019   Epoch: 17   Global Step: 214270   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:19:32,375-Speed 3314.46 samples/sec   Loss 1.0375   LearningRate 0.0019   Epoch: 17   Global Step: 214280   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:19:35,473-Speed 3306.41 samples/sec   Loss 1.1019   LearningRate 0.0019   Epoch: 17   Global Step: 214290   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:19:38,575-Speed 3302.27 samples/sec   Loss 1.0573   LearningRate 0.0019   Epoch: 17   Global Step: 214300   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:19:41,709-Speed 3268.34 samples/sec   Loss 1.0659   LearningRate 0.0019   Epoch: 17   Global Step: 214310   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:19:44,779-Speed 3337.05 samples/sec   Loss 1.0135   LearningRate 0.0019   Epoch: 17   Global Step: 214320   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:19:47,846-Speed 3339.41 samples/sec   Loss 1.0488   LearningRate 0.0019   Epoch: 17   Global Step: 214330   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:19:50,940-Speed 3310.37 samples/sec   Loss 1.0144   LearningRate 0.0019   Epoch: 17   Global Step: 214340   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:19:54,044-Speed 3300.13 samples/sec   Loss 1.1141   LearningRate 0.0019   Epoch: 17   Global Step: 214350   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:19:57,095-Speed 3357.27 samples/sec   Loss 1.0264   LearningRate 0.0019   Epoch: 17   Global Step: 214360   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:20:00,237-Speed 3259.70 samples/sec   Loss 1.0202   LearningRate 0.0019   Epoch: 17   Global Step: 214370   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:20:03,363-Speed 3276.78 samples/sec   Loss 1.0869   LearningRate 0.0019   Epoch: 17   Global Step: 214380   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:20:06,455-Speed 3313.81 samples/sec   Loss 1.0513   LearningRate 0.0019   Epoch: 17   Global Step: 214390   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:20:09,556-Speed 3302.35 samples/sec   Loss 1.0568   LearningRate 0.0019   Epoch: 17   Global Step: 214400   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:20:12,624-Speed 3339.17 samples/sec   Loss 1.0942   LearningRate 0.0019   Epoch: 17   Global Step: 214410   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:20:15,732-Speed 3296.06 samples/sec   Loss 1.1017   LearningRate 0.0019   Epoch: 17   Global Step: 214420   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:20:18,826-Speed 3310.21 samples/sec   Loss 1.0804   LearningRate 0.0019   Epoch: 17   Global Step: 214430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:20:21,879-Speed 3355.47 samples/sec   Loss 1.0675   LearningRate 0.0019   Epoch: 17   Global Step: 214440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:20:25,022-Speed 3258.72 samples/sec   Loss 1.0288   LearningRate 0.0019   Epoch: 17   Global Step: 214450   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:20:28,157-Speed 3267.88 samples/sec   Loss 1.1071   LearningRate 0.0019   Epoch: 17   Global Step: 214460   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:20:31,295-Speed 3263.79 samples/sec   Loss 1.0870   LearningRate 0.0019   Epoch: 17   Global Step: 214470   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:20:34,367-Speed 3334.81 samples/sec   Loss 1.0842   LearningRate 0.0019   Epoch: 17   Global Step: 214480   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:20:37,513-Speed 3256.09 samples/sec   Loss 1.0465   LearningRate 0.0019   Epoch: 17   Global Step: 214490   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:20:40,612-Speed 3305.13 samples/sec   Loss 1.0827   LearningRate 0.0019   Epoch: 17   Global Step: 214500   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:20:43,808-Speed 3204.64 samples/sec   Loss 0.9996   LearningRate 0.0019   Epoch: 17   Global Step: 214510   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:20:46,910-Speed 3302.30 samples/sec   Loss 1.0428   LearningRate 0.0019   Epoch: 17   Global Step: 214520   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:20:50,046-Speed 3266.48 samples/sec   Loss 1.0686   LearningRate 0.0019   Epoch: 17   Global Step: 214530   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:20:53,207-Speed 3240.36 samples/sec   Loss 1.0916   LearningRate 0.0019   Epoch: 17   Global Step: 214540   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:20:56,303-Speed 3308.89 samples/sec   Loss 1.0585   LearningRate 0.0019   Epoch: 17   Global Step: 214550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:20:59,388-Speed 3320.29 samples/sec   Loss 1.0273   LearningRate 0.0019   Epoch: 17   Global Step: 214560   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:21:02,541-Speed 3248.38 samples/sec   Loss 1.0969   LearningRate 0.0019   Epoch: 17   Global Step: 214570   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:21:05,643-Speed 3302.39 samples/sec   Loss 1.0740   LearningRate 0.0019   Epoch: 17   Global Step: 214580   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:21:08,724-Speed 3324.77 samples/sec   Loss 1.0615   LearningRate 0.0019   Epoch: 17   Global Step: 214590   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:21:11,859-Speed 3267.21 samples/sec   Loss 1.0837   LearningRate 0.0019   Epoch: 17   Global Step: 214600   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:21:14,949-Speed 3315.35 samples/sec   Loss 1.0804   LearningRate 0.0019   Epoch: 17   Global Step: 214610   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:21:18,102-Speed 3248.00 samples/sec   Loss 1.0555   LearningRate 0.0019   Epoch: 17   Global Step: 214620   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:21:21,178-Speed 3330.05 samples/sec   Loss 1.0561   LearningRate 0.0019   Epoch: 17   Global Step: 214630   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:21:24,324-Speed 3255.80 samples/sec   Loss 1.0790   LearningRate 0.0018   Epoch: 17   Global Step: 214640   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:21:27,492-Speed 3233.31 samples/sec   Loss 1.0696   LearningRate 0.0018   Epoch: 17   Global Step: 214650   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:21:30,620-Speed 3275.07 samples/sec   Loss 1.0615   LearningRate 0.0018   Epoch: 17   Global Step: 214660   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:21:33,737-Speed 3286.51 samples/sec   Loss 1.0726   LearningRate 0.0018   Epoch: 17   Global Step: 214670   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:21:36,910-Speed 3227.79 samples/sec   Loss 1.0166   LearningRate 0.0018   Epoch: 17   Global Step: 214680   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:21:40,008-Speed 3307.26 samples/sec   Loss 1.0983   LearningRate 0.0018   Epoch: 17   Global Step: 214690   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:21:43,135-Speed 3275.65 samples/sec   Loss 1.0444   LearningRate 0.0018   Epoch: 17   Global Step: 214700   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:21:46,246-Speed 3292.10 samples/sec   Loss 1.0414   LearningRate 0.0018   Epoch: 17   Global Step: 214710   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:21:49,365-Speed 3283.88 samples/sec   Loss 1.0827   LearningRate 0.0018   Epoch: 17   Global Step: 214720   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:21:52,495-Speed 3273.38 samples/sec   Loss 1.0623   LearningRate 0.0018   Epoch: 17   Global Step: 214730   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:21:56,313-Speed 2682.68 samples/sec   Loss 1.0601   LearningRate 0.0018   Epoch: 17   Global Step: 214740   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:21:59,413-Speed 3303.37 samples/sec   Loss 1.0465   LearningRate 0.0018   Epoch: 17   Global Step: 214750   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:22:02,497-Speed 3321.67 samples/sec   Loss 1.0218   LearningRate 0.0018   Epoch: 17   Global Step: 214760   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:22:05,664-Speed 3234.66 samples/sec   Loss 1.0409   LearningRate 0.0018   Epoch: 17   Global Step: 214770   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:22:08,780-Speed 3287.02 samples/sec   Loss 1.0931   LearningRate 0.0018   Epoch: 17   Global Step: 214780   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:22:11,893-Speed 3290.90 samples/sec   Loss 1.0825   LearningRate 0.0018   Epoch: 17   Global Step: 214790   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:22:15,036-Speed 3258.95 samples/sec   Loss 1.0691   LearningRate 0.0018   Epoch: 17   Global Step: 214800   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:22:18,117-Speed 3324.98 samples/sec   Loss 1.0287   LearningRate 0.0018   Epoch: 17   Global Step: 214810   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:22:21,183-Speed 3340.97 samples/sec   Loss 1.0368   LearningRate 0.0018   Epoch: 17   Global Step: 214820   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:22:24,297-Speed 3289.18 samples/sec   Loss 1.0668   LearningRate 0.0018   Epoch: 17   Global Step: 214830   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:22:27,403-Speed 3298.55 samples/sec   Loss 1.0597   LearningRate 0.0018   Epoch: 17   Global Step: 214840   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:22:30,509-Speed 3297.49 samples/sec   Loss 1.0818   LearningRate 0.0018   Epoch: 17   Global Step: 214850   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:22:33,581-Speed 3334.66 samples/sec   Loss 1.0445   LearningRate 0.0018   Epoch: 17   Global Step: 214860   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:22:36,751-Speed 3230.64 samples/sec   Loss 1.0492   LearningRate 0.0018   Epoch: 17   Global Step: 214870   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:22:39,897-Speed 3256.30 samples/sec   Loss 1.0768   LearningRate 0.0018   Epoch: 17   Global Step: 214880   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:22:43,030-Speed 3269.59 samples/sec   Loss 1.0824   LearningRate 0.0018   Epoch: 17   Global Step: 214890   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:22:46,140-Speed 3293.97 samples/sec   Loss 1.0529   LearningRate 0.0018   Epoch: 17   Global Step: 214900   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:22:49,231-Speed 3314.12 samples/sec   Loss 1.0555   LearningRate 0.0018   Epoch: 17   Global Step: 214910   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:22:52,296-Speed 3342.78 samples/sec   Loss 1.0762   LearningRate 0.0018   Epoch: 17   Global Step: 214920   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:22:55,406-Speed 3293.74 samples/sec   Loss 1.0842   LearningRate 0.0018   Epoch: 17   Global Step: 214930   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:22:58,449-Speed 3365.86 samples/sec   Loss 1.0246   LearningRate 0.0018   Epoch: 17   Global Step: 214940   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:23:01,508-Speed 3348.02 samples/sec   Loss 1.0293   LearningRate 0.0018   Epoch: 17   Global Step: 214950   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:23:04,580-Speed 3334.56 samples/sec   Loss 1.0537   LearningRate 0.0018   Epoch: 17   Global Step: 214960   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:23:07,659-Speed 3326.87 samples/sec   Loss 1.0590   LearningRate 0.0018   Epoch: 17   Global Step: 214970   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:23:10,726-Speed 3339.98 samples/sec   Loss 1.0566   LearningRate 0.0018   Epoch: 17   Global Step: 214980   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:23:13,792-Speed 3341.53 samples/sec   Loss 1.1191   LearningRate 0.0018   Epoch: 17   Global Step: 214990   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:23:16,944-Speed 3250.24 samples/sec   Loss 1.0512   LearningRate 0.0018   Epoch: 17   Global Step: 215000   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:23:20,054-Speed 3293.47 samples/sec   Loss 1.0547   LearningRate 0.0018   Epoch: 17   Global Step: 215010   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:23:23,158-Speed 3300.07 samples/sec   Loss 1.0523   LearningRate 0.0018   Epoch: 17   Global Step: 215020   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:23:26,297-Speed 3262.32 samples/sec   Loss 1.1366   LearningRate 0.0018   Epoch: 17   Global Step: 215030   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:23:29,394-Speed 3307.75 samples/sec   Loss 1.0573   LearningRate 0.0018   Epoch: 17   Global Step: 215040   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:23:32,456-Speed 3346.12 samples/sec   Loss 1.0619   LearningRate 0.0018   Epoch: 17   Global Step: 215050   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:23:35,592-Speed 3265.45 samples/sec   Loss 1.0714   LearningRate 0.0018   Epoch: 17   Global Step: 215060   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:23:38,663-Speed 3335.45 samples/sec   Loss 1.1144   LearningRate 0.0018   Epoch: 17   Global Step: 215070   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:23:41,733-Speed 3336.91 samples/sec   Loss 1.0531   LearningRate 0.0018   Epoch: 17   Global Step: 215080   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:23:44,786-Speed 3354.84 samples/sec   Loss 1.0773   LearningRate 0.0018   Epoch: 17   Global Step: 215090   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:23:47,873-Speed 3318.60 samples/sec   Loss 1.0586   LearningRate 0.0018   Epoch: 17   Global Step: 215100   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:23:51,002-Speed 3273.51 samples/sec   Loss 1.0410   LearningRate 0.0018   Epoch: 17   Global Step: 215110   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:23:54,107-Speed 3299.10 samples/sec   Loss 1.0571   LearningRate 0.0018   Epoch: 17   Global Step: 215120   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:23:57,223-Speed 3287.48 samples/sec   Loss 1.0517   LearningRate 0.0018   Epoch: 17   Global Step: 215130   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:24:00,336-Speed 3289.96 samples/sec   Loss 1.0540   LearningRate 0.0018   Epoch: 17   Global Step: 215140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:24:03,443-Speed 3297.05 samples/sec   Loss 1.0627   LearningRate 0.0018   Epoch: 17   Global Step: 215150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:24:06,521-Speed 3328.09 samples/sec   Loss 1.0481   LearningRate 0.0018   Epoch: 17   Global Step: 215160   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:24:09,595-Speed 3332.09 samples/sec   Loss 1.0497   LearningRate 0.0018   Epoch: 17   Global Step: 215170   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:24:12,676-Speed 3324.54 samples/sec   Loss 1.0902   LearningRate 0.0018   Epoch: 17   Global Step: 215180   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:24:15,770-Speed 3311.34 samples/sec   Loss 1.0524   LearningRate 0.0018   Epoch: 17   Global Step: 215190   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:24:18,890-Speed 3282.36 samples/sec   Loss 1.1048   LearningRate 0.0018   Epoch: 17   Global Step: 215200   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:24:21,946-Speed 3352.20 samples/sec   Loss 1.0635   LearningRate 0.0018   Epoch: 17   Global Step: 215210   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:24:25,016-Speed 3335.99 samples/sec   Loss 1.0725   LearningRate 0.0018   Epoch: 17   Global Step: 215220   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:24:28,089-Speed 3334.62 samples/sec   Loss 1.0574   LearningRate 0.0018   Epoch: 17   Global Step: 215230   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:24:31,164-Speed 3331.66 samples/sec   Loss 1.0613   LearningRate 0.0018   Epoch: 17   Global Step: 215240   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:24:34,222-Speed 3349.25 samples/sec   Loss 1.0889   LearningRate 0.0018   Epoch: 17   Global Step: 215250   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:24:37,333-Speed 3292.37 samples/sec   Loss 1.0869   LearningRate 0.0018   Epoch: 17   Global Step: 215260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:24:40,474-Speed 3261.05 samples/sec   Loss 1.0340   LearningRate 0.0018   Epoch: 17   Global Step: 215270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:24:43,604-Speed 3272.30 samples/sec   Loss 1.0904   LearningRate 0.0018   Epoch: 17   Global Step: 215280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:24:46,712-Speed 3296.15 samples/sec   Loss 1.0650   LearningRate 0.0018   Epoch: 17   Global Step: 215290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:24:49,855-Speed 3258.82 samples/sec   Loss 1.1098   LearningRate 0.0018   Epoch: 17   Global Step: 215300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:24:52,925-Speed 3336.42 samples/sec   Loss 1.0509   LearningRate 0.0018   Epoch: 17   Global Step: 215310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:24:55,982-Speed 3350.74 samples/sec   Loss 1.0529   LearningRate 0.0018   Epoch: 17   Global Step: 215320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:24:59,068-Speed 3319.57 samples/sec   Loss 1.0211   LearningRate 0.0018   Epoch: 17   Global Step: 215330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:25:02,148-Speed 3325.47 samples/sec   Loss 1.0585   LearningRate 0.0018   Epoch: 17   Global Step: 215340   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:25:05,249-Speed 3303.62 samples/sec   Loss 1.0617   LearningRate 0.0018   Epoch: 17   Global Step: 215350   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:25:08,334-Speed 3320.69 samples/sec   Loss 1.1104   LearningRate 0.0018   Epoch: 17   Global Step: 215360   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:25:11,390-Speed 3351.99 samples/sec   Loss 1.0464   LearningRate 0.0018   Epoch: 17   Global Step: 215370   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:25:14,469-Speed 3326.22 samples/sec   Loss 1.0635   LearningRate 0.0018   Epoch: 17   Global Step: 215380   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:25:17,531-Speed 3345.76 samples/sec   Loss 1.0606   LearningRate 0.0018   Epoch: 17   Global Step: 215390   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:25:20,609-Speed 3327.74 samples/sec   Loss 1.1033   LearningRate 0.0018   Epoch: 17   Global Step: 215400   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:25:23,668-Speed 3348.91 samples/sec   Loss 1.0824   LearningRate 0.0018   Epoch: 17   Global Step: 215410   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:25:26,806-Speed 3264.16 samples/sec   Loss 1.0566   LearningRate 0.0018   Epoch: 17   Global Step: 215420   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:25:29,869-Speed 3344.75 samples/sec   Loss 1.0748   LearningRate 0.0018   Epoch: 17   Global Step: 215430   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:25:32,973-Speed 3299.36 samples/sec   Loss 1.0834   LearningRate 0.0018   Epoch: 17   Global Step: 215440   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:25:36,168-Speed 3206.14 samples/sec   Loss 1.0411   LearningRate 0.0018   Epoch: 17   Global Step: 215450   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:25:39,299-Speed 3271.39 samples/sec   Loss 1.0243   LearningRate 0.0018   Epoch: 17   Global Step: 215460   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:25:42,397-Speed 3306.46 samples/sec   Loss 1.0174   LearningRate 0.0018   Epoch: 17   Global Step: 215470   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:25:45,486-Speed 3316.66 samples/sec   Loss 1.0594   LearningRate 0.0018   Epoch: 17   Global Step: 215480   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:25:48,569-Speed 3322.92 samples/sec   Loss 1.0936   LearningRate 0.0018   Epoch: 17   Global Step: 215490   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:25:51,649-Speed 3325.56 samples/sec   Loss 1.0340   LearningRate 0.0018   Epoch: 17   Global Step: 215500   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:25:54,765-Speed 3286.63 samples/sec   Loss 1.0939   LearningRate 0.0018   Epoch: 17   Global Step: 215510   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:25:57,813-Speed 3360.73 samples/sec   Loss 1.0999   LearningRate 0.0018   Epoch: 17   Global Step: 215520   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:26:00,896-Speed 3322.32 samples/sec   Loss 1.0743   LearningRate 0.0018   Epoch: 17   Global Step: 215530   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:26:03,991-Speed 3309.15 samples/sec   Loss 1.0373   LearningRate 0.0018   Epoch: 17   Global Step: 215540   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:26:07,134-Speed 3259.18 samples/sec   Loss 1.1084   LearningRate 0.0018   Epoch: 17   Global Step: 215550   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:26:10,204-Speed 3336.69 samples/sec   Loss 1.0707   LearningRate 0.0017   Epoch: 17   Global Step: 215560   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:26:13,300-Speed 3308.40 samples/sec   Loss 1.0805   LearningRate 0.0017   Epoch: 17   Global Step: 215570   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:26:16,469-Speed 3232.53 samples/sec   Loss 1.0595   LearningRate 0.0017   Epoch: 17   Global Step: 215580   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:26:19,556-Speed 3318.62 samples/sec   Loss 1.0541   LearningRate 0.0017   Epoch: 17   Global Step: 215590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:26:22,687-Speed 3271.18 samples/sec   Loss 1.0555   LearningRate 0.0017   Epoch: 17   Global Step: 215600   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:26:25,755-Speed 3339.62 samples/sec   Loss 1.0672   LearningRate 0.0017   Epoch: 17   Global Step: 215610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:26:28,856-Speed 3303.13 samples/sec   Loss 1.0650   LearningRate 0.0017   Epoch: 17   Global Step: 215620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:26:31,943-Speed 3318.29 samples/sec   Loss 1.0408   LearningRate 0.0017   Epoch: 17   Global Step: 215630   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:26:34,996-Speed 3355.14 samples/sec   Loss 1.1016   LearningRate 0.0017   Epoch: 17   Global Step: 215640   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:26:38,047-Speed 3357.25 samples/sec   Loss 1.0939   LearningRate 0.0017   Epoch: 17   Global Step: 215650   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:26:41,185-Speed 3263.92 samples/sec   Loss 1.0477   LearningRate 0.0017   Epoch: 17   Global Step: 215660   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:26:44,274-Speed 3315.86 samples/sec   Loss 1.0842   LearningRate 0.0017   Epoch: 17   Global Step: 215670   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:26:47,429-Speed 3247.52 samples/sec   Loss 1.0798   LearningRate 0.0017   Epoch: 17   Global Step: 215680   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:26:50,500-Speed 3335.34 samples/sec   Loss 1.0151   LearningRate 0.0017   Epoch: 17   Global Step: 215690   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:26:53,585-Speed 3320.22 samples/sec   Loss 1.0544   LearningRate 0.0017   Epoch: 17   Global Step: 215700   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:26:56,702-Speed 3286.37 samples/sec   Loss 1.0955   LearningRate 0.0017   Epoch: 17   Global Step: 215710   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:26:59,860-Speed 3243.72 samples/sec   Loss 1.0878   LearningRate 0.0017   Epoch: 17   Global Step: 215720   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:27:03,029-Speed 3232.77 samples/sec   Loss 1.0684   LearningRate 0.0017   Epoch: 17   Global Step: 215730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:27:06,163-Speed 3267.99 samples/sec   Loss 1.1064   LearningRate 0.0017   Epoch: 17   Global Step: 215740   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:27:09,240-Speed 3329.59 samples/sec   Loss 1.1018   LearningRate 0.0017   Epoch: 17   Global Step: 215750   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:27:12,373-Speed 3269.22 samples/sec   Loss 1.0663   LearningRate 0.0017   Epoch: 17   Global Step: 215760   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:27:15,470-Speed 3307.29 samples/sec   Loss 1.0233   LearningRate 0.0017   Epoch: 17   Global Step: 215770   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:27:18,527-Speed 3351.29 samples/sec   Loss 1.0241   LearningRate 0.0017   Epoch: 17   Global Step: 215780   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:27:21,561-Speed 3375.23 samples/sec   Loss 1.0661   LearningRate 0.0017   Epoch: 17   Global Step: 215790   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:27:24,735-Speed 3227.51 samples/sec   Loss 1.0737   LearningRate 0.0017   Epoch: 17   Global Step: 215800   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:27:27,868-Speed 3269.37 samples/sec   Loss 1.1193   LearningRate 0.0017   Epoch: 17   Global Step: 215810   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:27:30,983-Speed 3288.90 samples/sec   Loss 1.0843   LearningRate 0.0017   Epoch: 17   Global Step: 215820   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:27:34,082-Speed 3305.26 samples/sec   Loss 1.0125   LearningRate 0.0017   Epoch: 17   Global Step: 215830   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:27:37,191-Speed 3294.56 samples/sec   Loss 1.0904   LearningRate 0.0017   Epoch: 17   Global Step: 215840   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:27:40,267-Speed 3331.06 samples/sec   Loss 1.0437   LearningRate 0.0017   Epoch: 17   Global Step: 215850   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:27:43,448-Speed 3219.57 samples/sec   Loss 1.0629   LearningRate 0.0017   Epoch: 17   Global Step: 215860   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:27:46,547-Speed 3305.74 samples/sec   Loss 1.0491   LearningRate 0.0017   Epoch: 17   Global Step: 215870   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:27:49,664-Speed 3286.18 samples/sec   Loss 1.0583   LearningRate 0.0017   Epoch: 17   Global Step: 215880   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:27:52,737-Speed 3333.66 samples/sec   Loss 1.0670   LearningRate 0.0017   Epoch: 17   Global Step: 215890   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:27:55,799-Speed 3344.90 samples/sec   Loss 1.0901   LearningRate 0.0017   Epoch: 17   Global Step: 215900   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:27:58,878-Speed 3326.61 samples/sec   Loss 1.0908   LearningRate 0.0017   Epoch: 17   Global Step: 215910   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:28:01,988-Speed 3294.51 samples/sec   Loss 1.0474   LearningRate 0.0017   Epoch: 17   Global Step: 215920   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:28:05,056-Speed 3338.96 samples/sec   Loss 1.0263   LearningRate 0.0017   Epoch: 17   Global Step: 215930   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:28:08,113-Speed 3350.04 samples/sec   Loss 1.0825   LearningRate 0.0017   Epoch: 17   Global Step: 215940   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:28:11,270-Speed 3244.89 samples/sec   Loss 1.0470   LearningRate 0.0017   Epoch: 17   Global Step: 215950   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:28:14,400-Speed 3273.10 samples/sec   Loss 1.0707   LearningRate 0.0017   Epoch: 17   Global Step: 215960   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:28:17,498-Speed 3306.74 samples/sec   Loss 1.0993   LearningRate 0.0017   Epoch: 17   Global Step: 215970   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:28:20,568-Speed 3336.34 samples/sec   Loss 1.0652   LearningRate 0.0017   Epoch: 17   Global Step: 215980   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:28:23,754-Speed 3214.27 samples/sec   Loss 1.0506   LearningRate 0.0017   Epoch: 17   Global Step: 215990   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:28:26,831-Speed 3329.29 samples/sec   Loss 1.0844   LearningRate 0.0017   Epoch: 17   Global Step: 216000   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:28:29,915-Speed 3321.93 samples/sec   Loss 1.0737   LearningRate 0.0017   Epoch: 17   Global Step: 216010   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:28:32,979-Speed 3342.98 samples/sec   Loss 1.0459   LearningRate 0.0017   Epoch: 17   Global Step: 216020   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:28:36,090-Speed 3293.35 samples/sec   Loss 1.0648   LearningRate 0.0017   Epoch: 17   Global Step: 216030   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:28:39,170-Speed 3325.59 samples/sec   Loss 1.1125   LearningRate 0.0017   Epoch: 17   Global Step: 216040   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:28:42,224-Speed 3353.06 samples/sec   Loss 1.0546   LearningRate 0.0017   Epoch: 17   Global Step: 216050   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:28:45,323-Speed 3305.67 samples/sec   Loss 1.0598   LearningRate 0.0017   Epoch: 17   Global Step: 216060   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:28:48,462-Speed 3262.83 samples/sec   Loss 1.0393   LearningRate 0.0017   Epoch: 17   Global Step: 216070   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:28:51,653-Speed 3210.30 samples/sec   Loss 1.0614   LearningRate 0.0017   Epoch: 17   Global Step: 216080   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:28:54,717-Speed 3342.92 samples/sec   Loss 1.0449   LearningRate 0.0017   Epoch: 17   Global Step: 216090   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:28:57,831-Speed 3290.28 samples/sec   Loss 1.0200   LearningRate 0.0017   Epoch: 17   Global Step: 216100   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:29:00,931-Speed 3303.42 samples/sec   Loss 1.0190   LearningRate 0.0017   Epoch: 17   Global Step: 216110   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:29:04,007-Speed 3330.07 samples/sec   Loss 1.0959   LearningRate 0.0017   Epoch: 17   Global Step: 216120   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:29:07,092-Speed 3320.89 samples/sec   Loss 1.0951   LearningRate 0.0017   Epoch: 17   Global Step: 216130   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:29:10,162-Speed 3336.45 samples/sec   Loss 1.0593   LearningRate 0.0017   Epoch: 17   Global Step: 216140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:29:13,239-Speed 3328.62 samples/sec   Loss 1.0798   LearningRate 0.0017   Epoch: 17   Global Step: 216150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:29:16,319-Speed 3325.40 samples/sec   Loss 1.0193   LearningRate 0.0017   Epoch: 17   Global Step: 216160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:29:19,395-Speed 3330.36 samples/sec   Loss 1.0689   LearningRate 0.0017   Epoch: 17   Global Step: 216170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:29:22,478-Speed 3322.29 samples/sec   Loss 1.0769   LearningRate 0.0017   Epoch: 17   Global Step: 216180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:29:25,652-Speed 3227.64 samples/sec   Loss 1.0586   LearningRate 0.0017   Epoch: 17   Global Step: 216190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:29:28,805-Speed 3248.55 samples/sec   Loss 1.0574   LearningRate 0.0017   Epoch: 17   Global Step: 216200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:29:31,893-Speed 3317.81 samples/sec   Loss 1.0304   LearningRate 0.0017   Epoch: 17   Global Step: 216210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:29:35,028-Speed 3266.64 samples/sec   Loss 1.0296   LearningRate 0.0017   Epoch: 17   Global Step: 216220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:29:38,132-Speed 3300.41 samples/sec   Loss 1.0702   LearningRate 0.0017   Epoch: 17   Global Step: 216230   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:29:41,294-Speed 3239.71 samples/sec   Loss 1.0933   LearningRate 0.0017   Epoch: 17   Global Step: 216240   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:29:44,366-Speed 3334.59 samples/sec   Loss 1.0580   LearningRate 0.0017   Epoch: 17   Global Step: 216250   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:29:47,454-Speed 3317.84 samples/sec   Loss 1.0627   LearningRate 0.0017   Epoch: 17   Global Step: 216260   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:29:50,506-Speed 3356.03 samples/sec   Loss 1.0970   LearningRate 0.0017   Epoch: 17   Global Step: 216270   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:29:53,603-Speed 3308.03 samples/sec   Loss 1.0357   LearningRate 0.0017   Epoch: 17   Global Step: 216280   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:29:56,735-Speed 3270.25 samples/sec   Loss 1.0211   LearningRate 0.0017   Epoch: 17   Global Step: 216290   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:29:59,862-Speed 3276.22 samples/sec   Loss 1.0872   LearningRate 0.0017   Epoch: 17   Global Step: 216300   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:30:03,005-Speed 3258.70 samples/sec   Loss 1.0714   LearningRate 0.0017   Epoch: 17   Global Step: 216310   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:30:06,173-Speed 3233.82 samples/sec   Loss 1.0598   LearningRate 0.0017   Epoch: 17   Global Step: 216320   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:30:09,263-Speed 3313.79 samples/sec   Loss 1.1064   LearningRate 0.0017   Epoch: 17   Global Step: 216330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:30:12,409-Speed 3256.39 samples/sec   Loss 1.1081   LearningRate 0.0017   Epoch: 17   Global Step: 216340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:30:15,510-Speed 3303.42 samples/sec   Loss 1.0349   LearningRate 0.0017   Epoch: 17   Global Step: 216350   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:30:18,593-Speed 3321.94 samples/sec   Loss 1.0370   LearningRate 0.0017   Epoch: 17   Global Step: 216360   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:30:21,662-Speed 3337.41 samples/sec   Loss 1.0265   LearningRate 0.0017   Epoch: 17   Global Step: 216370   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:30:24,758-Speed 3308.61 samples/sec   Loss 1.0898   LearningRate 0.0017   Epoch: 17   Global Step: 216380   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:30:27,849-Speed 3313.87 samples/sec   Loss 1.0509   LearningRate 0.0017   Epoch: 17   Global Step: 216390   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:30:30,968-Speed 3284.27 samples/sec   Loss 1.0868   LearningRate 0.0017   Epoch: 17   Global Step: 216400   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:30:34,039-Speed 3335.95 samples/sec   Loss 1.0574   LearningRate 0.0017   Epoch: 17   Global Step: 216410   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:30:37,139-Speed 3304.30 samples/sec   Loss 1.0992   LearningRate 0.0017   Epoch: 17   Global Step: 216420   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:30:40,260-Speed 3281.51 samples/sec   Loss 1.0866   LearningRate 0.0017   Epoch: 17   Global Step: 216430   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:30:43,361-Speed 3303.59 samples/sec   Loss 1.1031   LearningRate 0.0017   Epoch: 17   Global Step: 216440   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:30:46,496-Speed 3267.33 samples/sec   Loss 1.0546   LearningRate 0.0017   Epoch: 17   Global Step: 216450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:30:49,635-Speed 3263.17 samples/sec   Loss 1.0390   LearningRate 0.0017   Epoch: 17   Global Step: 216460   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:30:52,769-Speed 3268.17 samples/sec   Loss 1.0369   LearningRate 0.0017   Epoch: 17   Global Step: 216470   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:30:55,894-Speed 3277.86 samples/sec   Loss 1.0452   LearningRate 0.0017   Epoch: 17   Global Step: 216480   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:30:58,998-Speed 3300.40 samples/sec   Loss 1.0732   LearningRate 0.0017   Epoch: 17   Global Step: 216490   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:31:02,113-Speed 3288.35 samples/sec   Loss 1.0653   LearningRate 0.0017   Epoch: 17   Global Step: 216500   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:31:05,265-Speed 3249.87 samples/sec   Loss 1.0336   LearningRate 0.0016   Epoch: 17   Global Step: 216510   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:31:08,329-Speed 3342.58 samples/sec   Loss 1.0606   LearningRate 0.0016   Epoch: 17   Global Step: 216520   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:31:11,419-Speed 3315.34 samples/sec   Loss 1.0521   LearningRate 0.0016   Epoch: 17   Global Step: 216530   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:31:14,532-Speed 3290.75 samples/sec   Loss 1.0490   LearningRate 0.0016   Epoch: 17   Global Step: 216540   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:31:17,648-Speed 3292.39 samples/sec   Loss 1.0649   LearningRate 0.0016   Epoch: 17   Global Step: 216550   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:31:20,734-Speed 3318.71 samples/sec   Loss 1.0610   LearningRate 0.0016   Epoch: 17   Global Step: 216560   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:31:23,843-Speed 3295.21 samples/sec   Loss 1.1080   LearningRate 0.0016   Epoch: 17   Global Step: 216570   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:31:26,988-Speed 3257.02 samples/sec   Loss 1.0620   LearningRate 0.0016   Epoch: 17   Global Step: 216580   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:31:30,091-Speed 3301.02 samples/sec   Loss 1.0566   LearningRate 0.0016   Epoch: 17   Global Step: 216590   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:31:33,178-Speed 3318.71 samples/sec   Loss 1.1078   LearningRate 0.0016   Epoch: 17   Global Step: 216600   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:31:36,280-Speed 3301.38 samples/sec   Loss 1.0816   LearningRate 0.0016   Epoch: 17   Global Step: 216610   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:31:39,428-Speed 3253.67 samples/sec   Loss 1.0581   LearningRate 0.0016   Epoch: 17   Global Step: 216620   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:31:42,510-Speed 3323.96 samples/sec   Loss 1.0640   LearningRate 0.0016   Epoch: 17   Global Step: 216630   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:31:45,617-Speed 3296.63 samples/sec   Loss 1.0821   LearningRate 0.0016   Epoch: 17   Global Step: 216640   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:31:48,786-Speed 3232.73 samples/sec   Loss 1.0759   LearningRate 0.0016   Epoch: 17   Global Step: 216650   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:31:51,888-Speed 3301.40 samples/sec   Loss 1.0725   LearningRate 0.0016   Epoch: 17   Global Step: 216660   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:31:54,994-Speed 3298.31 samples/sec   Loss 1.1060   LearningRate 0.0016   Epoch: 17   Global Step: 216670   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:31:58,085-Speed 3313.68 samples/sec   Loss 1.0781   LearningRate 0.0016   Epoch: 17   Global Step: 216680   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:32:01,172-Speed 3318.41 samples/sec   Loss 1.0583   LearningRate 0.0016   Epoch: 17   Global Step: 216690   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:32:04,231-Speed 3348.88 samples/sec   Loss 1.0823   LearningRate 0.0016   Epoch: 17   Global Step: 216700   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 20:32:07,328-Speed 3306.90 samples/sec   Loss 1.0765   LearningRate 0.0016   Epoch: 17   Global Step: 216710   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 20:32:10,420-Speed 3313.29 samples/sec   Loss 1.0986   LearningRate 0.0016   Epoch: 17   Global Step: 216720   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 20:32:13,579-Speed 3242.21 samples/sec   Loss 1.0774   LearningRate 0.0016   Epoch: 17   Global Step: 216730   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 20:32:16,722-Speed 3259.43 samples/sec   Loss 1.0226   LearningRate 0.0016   Epoch: 17   Global Step: 216740   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 20:32:19,853-Speed 3271.03 samples/sec   Loss 1.0874   LearningRate 0.0016   Epoch: 17   Global Step: 216750   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 20:32:22,942-Speed 3316.95 samples/sec   Loss 1.0522   LearningRate 0.0016   Epoch: 17   Global Step: 216760   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 20:32:26,064-Speed 3280.63 samples/sec   Loss 1.0450   LearningRate 0.0016   Epoch: 17   Global Step: 216770   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 20:32:29,155-Speed 3314.03 samples/sec   Loss 1.0412   LearningRate 0.0016   Epoch: 17   Global Step: 216780   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 20:32:32,241-Speed 3319.03 samples/sec   Loss 1.0580   LearningRate 0.0016   Epoch: 17   Global Step: 216790   Fp16 Grad Scale: 4096   Required: 3 hours
Training: 2022-04-27 20:32:35,343-Speed 3302.69 samples/sec   Loss 1.0857   LearningRate 0.0016   Epoch: 17   Global Step: 216800   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:32:38,426-Speed 3322.45 samples/sec   Loss 1.0494   LearningRate 0.0016   Epoch: 17   Global Step: 216810   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:32:41,503-Speed 3329.03 samples/sec   Loss 1.0458   LearningRate 0.0016   Epoch: 17   Global Step: 216820   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:32:44,606-Speed 3301.10 samples/sec   Loss 1.0355   LearningRate 0.0016   Epoch: 17   Global Step: 216830   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:32:47,734-Speed 3274.29 samples/sec   Loss 1.0483   LearningRate 0.0016   Epoch: 17   Global Step: 216840   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:32:50,813-Speed 3326.87 samples/sec   Loss 1.0305   LearningRate 0.0016   Epoch: 17   Global Step: 216850   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:32:53,887-Speed 3332.75 samples/sec   Loss 1.1088   LearningRate 0.0016   Epoch: 17   Global Step: 216860   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:32:56,964-Speed 3328.35 samples/sec   Loss 1.1114   LearningRate 0.0016   Epoch: 17   Global Step: 216870   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:33:00,083-Speed 3284.81 samples/sec   Loss 1.0399   LearningRate 0.0016   Epoch: 17   Global Step: 216880   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:33:03,167-Speed 3321.06 samples/sec   Loss 1.0396   LearningRate 0.0016   Epoch: 17   Global Step: 216890   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:33:06,294-Speed 3275.97 samples/sec   Loss 1.0871   LearningRate 0.0016   Epoch: 17   Global Step: 216900   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:33:09,364-Speed 3336.61 samples/sec   Loss 1.0580   LearningRate 0.0016   Epoch: 17   Global Step: 216910   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:33:12,456-Speed 3313.76 samples/sec   Loss 1.0501   LearningRate 0.0016   Epoch: 17   Global Step: 216920   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:33:15,613-Speed 3244.28 samples/sec   Loss 1.0601   LearningRate 0.0016   Epoch: 17   Global Step: 216930   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:33:18,757-Speed 3257.51 samples/sec   Loss 1.0592   LearningRate 0.0016   Epoch: 17   Global Step: 216940   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:33:21,852-Speed 3310.08 samples/sec   Loss 1.0546   LearningRate 0.0016   Epoch: 17   Global Step: 216950   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:33:24,985-Speed 3269.10 samples/sec   Loss 1.0631   LearningRate 0.0016   Epoch: 17   Global Step: 216960   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:33:28,054-Speed 3337.51 samples/sec   Loss 1.1258   LearningRate 0.0016   Epoch: 17   Global Step: 216970   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:33:31,193-Speed 3262.39 samples/sec   Loss 1.0528   LearningRate 0.0016   Epoch: 17   Global Step: 216980   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:33:34,285-Speed 3313.27 samples/sec   Loss 1.0718   LearningRate 0.0016   Epoch: 17   Global Step: 216990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:33:37,434-Speed 3252.88 samples/sec   Loss 1.0410   LearningRate 0.0016   Epoch: 17   Global Step: 217000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:33:40,574-Speed 3262.55 samples/sec   Loss 1.0868   LearningRate 0.0016   Epoch: 17   Global Step: 217010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:33:43,644-Speed 3336.93 samples/sec   Loss 1.0977   LearningRate 0.0016   Epoch: 17   Global Step: 217020   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:33:46,774-Speed 3272.13 samples/sec   Loss 1.0620   LearningRate 0.0016   Epoch: 17   Global Step: 217030   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:33:49,885-Speed 3291.80 samples/sec   Loss 1.0852   LearningRate 0.0016   Epoch: 17   Global Step: 217040   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:33:53,009-Speed 3279.48 samples/sec   Loss 1.0929   LearningRate 0.0016   Epoch: 17   Global Step: 217050   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:33:56,171-Speed 3239.14 samples/sec   Loss 1.0901   LearningRate 0.0016   Epoch: 17   Global Step: 217060   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:33:59,290-Speed 3283.90 samples/sec   Loss 1.0493   LearningRate 0.0016   Epoch: 17   Global Step: 217070   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:34:02,520-Speed 3171.07 samples/sec   Loss 1.0599   LearningRate 0.0016   Epoch: 17   Global Step: 217080   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:34:05,602-Speed 3324.12 samples/sec   Loss 1.0745   LearningRate 0.0016   Epoch: 17   Global Step: 217090   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:34:08,720-Speed 3284.48 samples/sec   Loss 1.0634   LearningRate 0.0016   Epoch: 17   Global Step: 217100   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:34:11,826-Speed 3297.78 samples/sec   Loss 1.0746   LearningRate 0.0016   Epoch: 17   Global Step: 217110   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:34:14,978-Speed 3250.37 samples/sec   Loss 1.0946   LearningRate 0.0016   Epoch: 17   Global Step: 217120   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:34:18,157-Speed 3222.36 samples/sec   Loss 1.0273   LearningRate 0.0016   Epoch: 17   Global Step: 217130   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:34:21,255-Speed 3306.00 samples/sec   Loss 1.0572   LearningRate 0.0016   Epoch: 17   Global Step: 217140   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:34:24,346-Speed 3313.17 samples/sec   Loss 1.1000   LearningRate 0.0016   Epoch: 17   Global Step: 217150   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:34:27,546-Speed 3201.73 samples/sec   Loss 1.0681   LearningRate 0.0016   Epoch: 17   Global Step: 217160   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:34:30,618-Speed 3334.80 samples/sec   Loss 1.0362   LearningRate 0.0016   Epoch: 17   Global Step: 217170   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:34:33,720-Speed 3302.10 samples/sec   Loss 1.0521   LearningRate 0.0016   Epoch: 17   Global Step: 217180   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:34:36,876-Speed 3245.09 samples/sec   Loss 1.0528   LearningRate 0.0016   Epoch: 17   Global Step: 217190   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:34:39,991-Speed 3289.01 samples/sec   Loss 0.9864   LearningRate 0.0016   Epoch: 17   Global Step: 217200   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:34:43,142-Speed 3250.61 samples/sec   Loss 1.0661   LearningRate 0.0016   Epoch: 17   Global Step: 217210   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:34:46,232-Speed 3315.13 samples/sec   Loss 1.0734   LearningRate 0.0016   Epoch: 17   Global Step: 217220   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:34:49,327-Speed 3308.74 samples/sec   Loss 1.0127   LearningRate 0.0016   Epoch: 17   Global Step: 217230   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:34:52,514-Speed 3214.40 samples/sec   Loss 1.0535   LearningRate 0.0016   Epoch: 17   Global Step: 217240   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:34:55,608-Speed 3311.43 samples/sec   Loss 1.0689   LearningRate 0.0016   Epoch: 17   Global Step: 217250   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:34:58,718-Speed 3293.18 samples/sec   Loss 1.1043   LearningRate 0.0016   Epoch: 17   Global Step: 217260   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:35:01,882-Speed 3237.62 samples/sec   Loss 1.0240   LearningRate 0.0016   Epoch: 17   Global Step: 217270   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:35:05,043-Speed 3240.00 samples/sec   Loss 1.0764   LearningRate 0.0016   Epoch: 17   Global Step: 217280   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:35:08,209-Speed 3235.33 samples/sec   Loss 1.0673   LearningRate 0.0016   Epoch: 17   Global Step: 217290   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:35:11,332-Speed 3280.05 samples/sec   Loss 1.0830   LearningRate 0.0016   Epoch: 17   Global Step: 217300   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:35:14,458-Speed 3277.45 samples/sec   Loss 1.0402   LearningRate 0.0016   Epoch: 17   Global Step: 217310   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:35:17,566-Speed 3295.34 samples/sec   Loss 1.0419   LearningRate 0.0016   Epoch: 17   Global Step: 217320   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:35:20,630-Speed 3343.95 samples/sec   Loss 1.0801   LearningRate 0.0016   Epoch: 17   Global Step: 217330   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:35:23,729-Speed 3305.30 samples/sec   Loss 1.1005   LearningRate 0.0016   Epoch: 17   Global Step: 217340   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:35:26,854-Speed 3277.66 samples/sec   Loss 1.0841   LearningRate 0.0016   Epoch: 17   Global Step: 217350   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:35:29,918-Speed 3343.34 samples/sec   Loss 1.0798   LearningRate 0.0016   Epoch: 17   Global Step: 217360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:35:32,954-Speed 3373.00 samples/sec   Loss 1.0185   LearningRate 0.0016   Epoch: 17   Global Step: 217370   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:35:36,139-Speed 3216.11 samples/sec   Loss 1.0938   LearningRate 0.0016   Epoch: 17   Global Step: 217380   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:35:39,313-Speed 3227.13 samples/sec   Loss 1.0633   LearningRate 0.0016   Epoch: 17   Global Step: 217390   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:35:42,460-Speed 3255.45 samples/sec   Loss 1.0768   LearningRate 0.0016   Epoch: 17   Global Step: 217400   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:35:45,574-Speed 3289.09 samples/sec   Loss 1.0442   LearningRate 0.0016   Epoch: 17   Global Step: 217410   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:35:48,732-Speed 3243.74 samples/sec   Loss 1.0775   LearningRate 0.0016   Epoch: 17   Global Step: 217420   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:35:51,865-Speed 3269.54 samples/sec   Loss 1.0109   LearningRate 0.0016   Epoch: 17   Global Step: 217430   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:35:54,944-Speed 3326.78 samples/sec   Loss 1.0109   LearningRate 0.0016   Epoch: 17   Global Step: 217440   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:35:58,007-Speed 3344.04 samples/sec   Loss 1.0698   LearningRate 0.0016   Epoch: 17   Global Step: 217450   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:36:01,085-Speed 3328.33 samples/sec   Loss 1.0544   LearningRate 0.0016   Epoch: 17   Global Step: 217460   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:36:04,243-Speed 3243.58 samples/sec   Loss 1.0536   LearningRate 0.0016   Epoch: 17   Global Step: 217470   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:36:07,408-Speed 3236.13 samples/sec   Loss 1.0749   LearningRate 0.0016   Epoch: 17   Global Step: 217480   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:36:10,530-Speed 3281.04 samples/sec   Loss 1.0473   LearningRate 0.0016   Epoch: 17   Global Step: 217490   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:36:13,690-Speed 3242.07 samples/sec   Loss 1.0218   LearningRate 0.0015   Epoch: 17   Global Step: 217500   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:36:16,788-Speed 3306.42 samples/sec   Loss 1.0585   LearningRate 0.0015   Epoch: 17   Global Step: 217510   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:36:19,925-Speed 3265.35 samples/sec   Loss 1.0654   LearningRate 0.0015   Epoch: 17   Global Step: 217520   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:36:23,010-Speed 3319.93 samples/sec   Loss 1.0869   LearningRate 0.0015   Epoch: 17   Global Step: 217530   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:36:26,101-Speed 3314.39 samples/sec   Loss 1.0405   LearningRate 0.0015   Epoch: 17   Global Step: 217540   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:36:29,301-Speed 3200.87 samples/sec   Loss 1.0555   LearningRate 0.0015   Epoch: 17   Global Step: 217550   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:36:32,398-Speed 3307.39 samples/sec   Loss 1.0656   LearningRate 0.0015   Epoch: 17   Global Step: 217560   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:36:35,516-Speed 3285.01 samples/sec   Loss 1.0390   LearningRate 0.0015   Epoch: 17   Global Step: 217570   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:36:38,645-Speed 3272.95 samples/sec   Loss 1.0162   LearningRate 0.0015   Epoch: 17   Global Step: 217580   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:36:41,770-Speed 3278.13 samples/sec   Loss 1.0280   LearningRate 0.0015   Epoch: 17   Global Step: 217590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:36:44,861-Speed 3314.89 samples/sec   Loss 1.0744   LearningRate 0.0015   Epoch: 17   Global Step: 217600   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:36:48,010-Speed 3252.75 samples/sec   Loss 1.0340   LearningRate 0.0015   Epoch: 17   Global Step: 217610   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:36:51,190-Speed 3220.24 samples/sec   Loss 1.0482   LearningRate 0.0015   Epoch: 17   Global Step: 217620   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:36:54,318-Speed 3275.40 samples/sec   Loss 1.1041   LearningRate 0.0015   Epoch: 17   Global Step: 217630   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:36:57,425-Speed 3296.99 samples/sec   Loss 1.0714   LearningRate 0.0015   Epoch: 17   Global Step: 217640   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:37:00,556-Speed 3270.66 samples/sec   Loss 1.0555   LearningRate 0.0015   Epoch: 17   Global Step: 217650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:37:03,723-Speed 3234.96 samples/sec   Loss 1.0655   LearningRate 0.0015   Epoch: 17   Global Step: 217660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:37:06,873-Speed 3251.19 samples/sec   Loss 1.0998   LearningRate 0.0015   Epoch: 17   Global Step: 217670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:37:09,966-Speed 3311.98 samples/sec   Loss 1.0987   LearningRate 0.0015   Epoch: 17   Global Step: 217680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:37:13,106-Speed 3262.27 samples/sec   Loss 1.0522   LearningRate 0.0015   Epoch: 17   Global Step: 217690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:37:16,218-Speed 3291.29 samples/sec   Loss 1.0661   LearningRate 0.0015   Epoch: 17   Global Step: 217700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:37:19,339-Speed 3282.20 samples/sec   Loss 1.0693   LearningRate 0.0015   Epoch: 17   Global Step: 217710   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:37:22,495-Speed 3245.77 samples/sec   Loss 1.0324   LearningRate 0.0015   Epoch: 17   Global Step: 217720   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:37:25,576-Speed 3324.72 samples/sec   Loss 1.0334   LearningRate 0.0015   Epoch: 17   Global Step: 217730   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:37:28,655-Speed 3326.69 samples/sec   Loss 1.0439   LearningRate 0.0015   Epoch: 17   Global Step: 217740   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:37:31,776-Speed 3282.02 samples/sec   Loss 1.1175   LearningRate 0.0015   Epoch: 17   Global Step: 217750   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:37:34,889-Speed 3290.23 samples/sec   Loss 1.0748   LearningRate 0.0015   Epoch: 17   Global Step: 217760   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:37:38,009-Speed 3283.48 samples/sec   Loss 1.0997   LearningRate 0.0015   Epoch: 17   Global Step: 217770   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:37:41,141-Speed 3270.33 samples/sec   Loss 1.0723   LearningRate 0.0015   Epoch: 17   Global Step: 217780   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:37:44,282-Speed 3261.10 samples/sec   Loss 1.0706   LearningRate 0.0015   Epoch: 17   Global Step: 217790   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:37:47,351-Speed 3337.41 samples/sec   Loss 1.0592   LearningRate 0.0015   Epoch: 17   Global Step: 217800   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:37:50,506-Speed 3246.29 samples/sec   Loss 1.0467   LearningRate 0.0015   Epoch: 17   Global Step: 217810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:37:53,631-Speed 3278.57 samples/sec   Loss 1.0702   LearningRate 0.0015   Epoch: 17   Global Step: 217820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:37:56,685-Speed 3354.55 samples/sec   Loss 1.1117   LearningRate 0.0015   Epoch: 17   Global Step: 217830   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:37:59,754-Speed 3336.78 samples/sec   Loss 1.0351   LearningRate 0.0015   Epoch: 17   Global Step: 217840   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:38:02,901-Speed 3254.88 samples/sec   Loss 1.0658   LearningRate 0.0015   Epoch: 17   Global Step: 217850   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:38:06,049-Speed 3254.53 samples/sec   Loss 1.0571   LearningRate 0.0015   Epoch: 17   Global Step: 217860   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:38:09,121-Speed 3333.44 samples/sec   Loss 1.0246   LearningRate 0.0015   Epoch: 17   Global Step: 217870   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:38:12,237-Speed 3287.68 samples/sec   Loss 1.0581   LearningRate 0.0015   Epoch: 17   Global Step: 217880   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:38:15,313-Speed 3330.50 samples/sec   Loss 1.0708   LearningRate 0.0015   Epoch: 17   Global Step: 217890   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:38:18,470-Speed 3243.91 samples/sec   Loss 1.0386   LearningRate 0.0015   Epoch: 17   Global Step: 217900   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:38:21,586-Speed 3287.79 samples/sec   Loss 1.0725   LearningRate 0.0015   Epoch: 17   Global Step: 217910   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:38:24,661-Speed 3331.44 samples/sec   Loss 1.0815   LearningRate 0.0015   Epoch: 17   Global Step: 217920   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:38:27,759-Speed 3306.34 samples/sec   Loss 1.0302   LearningRate 0.0015   Epoch: 17   Global Step: 217930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:38:30,877-Speed 3284.44 samples/sec   Loss 1.0494   LearningRate 0.0015   Epoch: 17   Global Step: 217940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:38:33,980-Speed 3301.48 samples/sec   Loss 1.0885   LearningRate 0.0015   Epoch: 17   Global Step: 217950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:38:37,084-Speed 3299.94 samples/sec   Loss 1.0909   LearningRate 0.0015   Epoch: 17   Global Step: 217960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:38:40,178-Speed 3310.51 samples/sec   Loss 1.0237   LearningRate 0.0015   Epoch: 17   Global Step: 217970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:38:43,296-Speed 3285.06 samples/sec   Loss 1.0814   LearningRate 0.0015   Epoch: 17   Global Step: 217980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:38:46,343-Speed 3361.76 samples/sec   Loss 1.0688   LearningRate 0.0015   Epoch: 17   Global Step: 217990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:38:49,436-Speed 3312.39 samples/sec   Loss 1.0541   LearningRate 0.0015   Epoch: 17   Global Step: 218000   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:38:52,543-Speed 3297.11 samples/sec   Loss 1.0258   LearningRate 0.0015   Epoch: 17   Global Step: 218010   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:38:55,657-Speed 3289.10 samples/sec   Loss 1.0685   LearningRate 0.0015   Epoch: 17   Global Step: 218020   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:38:58,830-Speed 3227.58 samples/sec   Loss 1.0755   LearningRate 0.0015   Epoch: 17   Global Step: 218030   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:39:01,956-Speed 3276.93 samples/sec   Loss 1.0553   LearningRate 0.0015   Epoch: 17   Global Step: 218040   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:39:05,088-Speed 3271.03 samples/sec   Loss 1.0246   LearningRate 0.0015   Epoch: 17   Global Step: 218050   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:39:08,167-Speed 3326.10 samples/sec   Loss 1.0607   LearningRate 0.0015   Epoch: 17   Global Step: 218060   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:39:11,265-Speed 3306.35 samples/sec   Loss 1.0546   LearningRate 0.0015   Epoch: 17   Global Step: 218070   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:39:14,392-Speed 3276.90 samples/sec   Loss 1.1067   LearningRate 0.0015   Epoch: 17   Global Step: 218080   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:39:17,472-Speed 3325.66 samples/sec   Loss 1.0704   LearningRate 0.0015   Epoch: 17   Global Step: 218090   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:39:20,522-Speed 3358.22 samples/sec   Loss 1.0514   LearningRate 0.0015   Epoch: 17   Global Step: 218100   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:39:23,625-Speed 3300.99 samples/sec   Loss 1.0715   LearningRate 0.0015   Epoch: 17   Global Step: 218110   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:39:26,773-Speed 3253.90 samples/sec   Loss 1.0541   LearningRate 0.0015   Epoch: 17   Global Step: 218120   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:39:29,944-Speed 3230.44 samples/sec   Loss 1.0379   LearningRate 0.0015   Epoch: 17   Global Step: 218130   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:39:33,012-Speed 3338.25 samples/sec   Loss 1.0588   LearningRate 0.0015   Epoch: 17   Global Step: 218140   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:39:36,099-Speed 3318.56 samples/sec   Loss 1.0778   LearningRate 0.0015   Epoch: 17   Global Step: 218150   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:39:39,160-Speed 3346.22 samples/sec   Loss 1.0219   LearningRate 0.0015   Epoch: 17   Global Step: 218160   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:39:42,242-Speed 3323.98 samples/sec   Loss 1.0494   LearningRate 0.0015   Epoch: 17   Global Step: 218170   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:39:45,309-Speed 3340.23 samples/sec   Loss 1.0598   LearningRate 0.0015   Epoch: 17   Global Step: 218180   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:39:48,413-Speed 3299.89 samples/sec   Loss 1.0662   LearningRate 0.0015   Epoch: 17   Global Step: 218190   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:39:51,497-Speed 3321.08 samples/sec   Loss 1.0239   LearningRate 0.0015   Epoch: 17   Global Step: 218200   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:39:54,578-Speed 3325.49 samples/sec   Loss 1.0481   LearningRate 0.0015   Epoch: 17   Global Step: 218210   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:39:57,629-Speed 3356.36 samples/sec   Loss 1.0552   LearningRate 0.0015   Epoch: 17   Global Step: 218220   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:40:00,738-Speed 3295.17 samples/sec   Loss 1.0333   LearningRate 0.0015   Epoch: 17   Global Step: 218230   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:40:03,830-Speed 3312.22 samples/sec   Loss 1.0928   LearningRate 0.0015   Epoch: 17   Global Step: 218240   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:40:06,916-Speed 3320.33 samples/sec   Loss 1.0754   LearningRate 0.0015   Epoch: 17   Global Step: 218250   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:40:09,973-Speed 3349.90 samples/sec   Loss 1.0179   LearningRate 0.0015   Epoch: 17   Global Step: 218260   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:40:13,140-Speed 3234.88 samples/sec   Loss 1.0443   LearningRate 0.0015   Epoch: 17   Global Step: 218270   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:40:16,224-Speed 3321.20 samples/sec   Loss 1.0968   LearningRate 0.0015   Epoch: 17   Global Step: 218280   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:40:19,297-Speed 3333.59 samples/sec   Loss 1.0376   LearningRate 0.0015   Epoch: 17   Global Step: 218290   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:40:22,381-Speed 3320.57 samples/sec   Loss 1.0910   LearningRate 0.0015   Epoch: 17   Global Step: 218300   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:40:25,456-Speed 3331.89 samples/sec   Loss 1.0788   LearningRate 0.0015   Epoch: 17   Global Step: 218310   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:40:28,584-Speed 3274.52 samples/sec   Loss 1.0316   LearningRate 0.0015   Epoch: 17   Global Step: 218320   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:40:31,658-Speed 3331.67 samples/sec   Loss 1.0306   LearningRate 0.0015   Epoch: 17   Global Step: 218330   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:40:34,728-Speed 3336.81 samples/sec   Loss 1.0815   LearningRate 0.0015   Epoch: 17   Global Step: 218340   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:40:37,809-Speed 3324.45 samples/sec   Loss 1.0580   LearningRate 0.0015   Epoch: 17   Global Step: 218350   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:40:40,863-Speed 3353.97 samples/sec   Loss 1.0802   LearningRate 0.0015   Epoch: 17   Global Step: 218360   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:40:43,959-Speed 3308.76 samples/sec   Loss 1.0550   LearningRate 0.0015   Epoch: 17   Global Step: 218370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:40:47,059-Speed 3305.03 samples/sec   Loss 1.0409   LearningRate 0.0015   Epoch: 17   Global Step: 218380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:40:50,165-Speed 3296.99 samples/sec   Loss 1.1055   LearningRate 0.0015   Epoch: 17   Global Step: 218390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:40:53,313-Speed 3255.13 samples/sec   Loss 1.0763   LearningRate 0.0015   Epoch: 17   Global Step: 218400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:40:56,355-Speed 3367.23 samples/sec   Loss 1.0937   LearningRate 0.0015   Epoch: 17   Global Step: 218410   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:40:59,426-Speed 3335.80 samples/sec   Loss 1.0857   LearningRate 0.0015   Epoch: 17   Global Step: 218420   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:41:02,561-Speed 3266.40 samples/sec   Loss 1.0783   LearningRate 0.0015   Epoch: 17   Global Step: 218430   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:41:05,711-Speed 3252.43 samples/sec   Loss 1.1050   LearningRate 0.0015   Epoch: 17   Global Step: 218440   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:41:08,752-Speed 3368.02 samples/sec   Loss 1.0333   LearningRate 0.0015   Epoch: 17   Global Step: 218450   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:41:11,914-Speed 3239.02 samples/sec   Loss 1.0376   LearningRate 0.0015   Epoch: 17   Global Step: 218460   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:41:15,097-Speed 3218.43 samples/sec   Loss 1.0507   LearningRate 0.0015   Epoch: 17   Global Step: 218470   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:41:18,248-Speed 3250.41 samples/sec   Loss 1.0309   LearningRate 0.0015   Epoch: 17   Global Step: 218480   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:41:21,322-Speed 3332.74 samples/sec   Loss 1.0805   LearningRate 0.0015   Epoch: 17   Global Step: 218490   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:41:24,465-Speed 3259.18 samples/sec   Loss 1.0472   LearningRate 0.0015   Epoch: 17   Global Step: 218500   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:41:27,612-Speed 3254.35 samples/sec   Loss 1.0993   LearningRate 0.0014   Epoch: 17   Global Step: 218510   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:41:30,723-Speed 3293.08 samples/sec   Loss 1.0417   LearningRate 0.0014   Epoch: 17   Global Step: 218520   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:41:33,786-Speed 3344.41 samples/sec   Loss 0.9842   LearningRate 0.0014   Epoch: 17   Global Step: 218530   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:41:36,938-Speed 3249.95 samples/sec   Loss 1.0458   LearningRate 0.0014   Epoch: 17   Global Step: 218540   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:41:40,107-Speed 3231.94 samples/sec   Loss 1.0933   LearningRate 0.0014   Epoch: 17   Global Step: 218550   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:41:43,193-Speed 3319.02 samples/sec   Loss 1.0202   LearningRate 0.0014   Epoch: 17   Global Step: 218560   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:41:46,330-Speed 3266.19 samples/sec   Loss 1.0375   LearningRate 0.0014   Epoch: 17   Global Step: 218570   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:41:49,500-Speed 3231.42 samples/sec   Loss 1.0670   LearningRate 0.0014   Epoch: 17   Global Step: 218580   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:41:52,646-Speed 3255.89 samples/sec   Loss 1.0444   LearningRate 0.0014   Epoch: 17   Global Step: 218590   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:41:55,747-Speed 3303.41 samples/sec   Loss 1.0576   LearningRate 0.0014   Epoch: 17   Global Step: 218600   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:41:58,900-Speed 3248.74 samples/sec   Loss 1.0771   LearningRate 0.0014   Epoch: 17   Global Step: 218610   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:42:02,045-Speed 3257.49 samples/sec   Loss 1.0617   LearningRate 0.0014   Epoch: 17   Global Step: 218620   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:42:05,134-Speed 3316.04 samples/sec   Loss 1.1047   LearningRate 0.0014   Epoch: 17   Global Step: 218630   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:42:08,248-Speed 3288.95 samples/sec   Loss 1.0722   LearningRate 0.0014   Epoch: 17   Global Step: 218640   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:42:11,370-Speed 3281.32 samples/sec   Loss 1.1026   LearningRate 0.0014   Epoch: 17   Global Step: 218650   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:42:14,498-Speed 3274.75 samples/sec   Loss 1.0073   LearningRate 0.0014   Epoch: 17   Global Step: 218660   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:42:17,632-Speed 3268.11 samples/sec   Loss 1.0802   LearningRate 0.0014   Epoch: 17   Global Step: 218670   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:42:20,726-Speed 3310.36 samples/sec   Loss 1.0366   LearningRate 0.0014   Epoch: 17   Global Step: 218680   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:42:23,820-Speed 3310.42 samples/sec   Loss 1.0772   LearningRate 0.0014   Epoch: 17   Global Step: 218690   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:42:26,970-Speed 3252.18 samples/sec   Loss 1.0080   LearningRate 0.0014   Epoch: 17   Global Step: 218700   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:42:30,095-Speed 3278.09 samples/sec   Loss 1.0754   LearningRate 0.0014   Epoch: 17   Global Step: 218710   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:42:33,154-Speed 3348.97 samples/sec   Loss 1.0486   LearningRate 0.0014   Epoch: 17   Global Step: 218720   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:42:36,251-Speed 3306.98 samples/sec   Loss 1.0529   LearningRate 0.0014   Epoch: 17   Global Step: 218730   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:42:39,353-Speed 3302.15 samples/sec   Loss 1.0031   LearningRate 0.0014   Epoch: 17   Global Step: 218740   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:42:42,511-Speed 3243.70 samples/sec   Loss 1.0642   LearningRate 0.0014   Epoch: 17   Global Step: 218750   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:42:45,577-Speed 3341.35 samples/sec   Loss 1.0432   LearningRate 0.0014   Epoch: 17   Global Step: 218760   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:42:48,681-Speed 3299.34 samples/sec   Loss 1.0311   LearningRate 0.0014   Epoch: 17   Global Step: 218770   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:42:51,770-Speed 3316.51 samples/sec   Loss 1.0854   LearningRate 0.0014   Epoch: 17   Global Step: 218780   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:42:54,831-Speed 3345.77 samples/sec   Loss 1.0837   LearningRate 0.0014   Epoch: 17   Global Step: 218790   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:42:57,916-Speed 3320.74 samples/sec   Loss 1.0429   LearningRate 0.0014   Epoch: 17   Global Step: 218800   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:43:01,043-Speed 3275.28 samples/sec   Loss 1.0808   LearningRate 0.0014   Epoch: 17   Global Step: 218810   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:43:04,190-Speed 3255.93 samples/sec   Loss 1.1031   LearningRate 0.0014   Epoch: 17   Global Step: 218820   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:43:07,314-Speed 3278.44 samples/sec   Loss 1.0759   LearningRate 0.0014   Epoch: 17   Global Step: 218830   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:43:10,442-Speed 3275.01 samples/sec   Loss 1.0587   LearningRate 0.0014   Epoch: 17   Global Step: 218840   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:43:13,571-Speed 3273.17 samples/sec   Loss 1.0673   LearningRate 0.0014   Epoch: 17   Global Step: 218850   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:43:16,758-Speed 3214.69 samples/sec   Loss 1.0664   LearningRate 0.0014   Epoch: 17   Global Step: 218860   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:43:19,907-Speed 3252.17 samples/sec   Loss 1.0213   LearningRate 0.0014   Epoch: 17   Global Step: 218870   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:43:22,977-Speed 3336.99 samples/sec   Loss 1.0413   LearningRate 0.0014   Epoch: 17   Global Step: 218880   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:43:26,129-Speed 3250.09 samples/sec   Loss 1.0697   LearningRate 0.0014   Epoch: 17   Global Step: 218890   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:43:29,203-Speed 3332.02 samples/sec   Loss 1.0730   LearningRate 0.0014   Epoch: 17   Global Step: 218900   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:43:32,271-Speed 3338.72 samples/sec   Loss 1.0797   LearningRate 0.0014   Epoch: 17   Global Step: 218910   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:43:35,376-Speed 3298.74 samples/sec   Loss 1.1123   LearningRate 0.0014   Epoch: 17   Global Step: 218920   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:43:38,462-Speed 3319.80 samples/sec   Loss 1.0734   LearningRate 0.0014   Epoch: 17   Global Step: 218930   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:43:41,576-Speed 3288.86 samples/sec   Loss 1.0638   LearningRate 0.0014   Epoch: 17   Global Step: 218940   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:43:44,684-Speed 3295.61 samples/sec   Loss 1.0300   LearningRate 0.0014   Epoch: 17   Global Step: 218950   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:43:47,809-Speed 3278.18 samples/sec   Loss 1.0340   LearningRate 0.0014   Epoch: 17   Global Step: 218960   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:43:50,988-Speed 3221.92 samples/sec   Loss 1.0627   LearningRate 0.0014   Epoch: 17   Global Step: 218970   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:43:54,135-Speed 3255.38 samples/sec   Loss 1.0438   LearningRate 0.0014   Epoch: 17   Global Step: 218980   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:43:57,209-Speed 3331.96 samples/sec   Loss 1.0227   LearningRate 0.0014   Epoch: 17   Global Step: 218990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:44:00,363-Speed 3248.06 samples/sec   Loss 1.0297   LearningRate 0.0014   Epoch: 17   Global Step: 219000   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:44:03,536-Speed 3228.37 samples/sec   Loss 1.0695   LearningRate 0.0014   Epoch: 17   Global Step: 219010   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:44:06,598-Speed 3345.08 samples/sec   Loss 1.1051   LearningRate 0.0014   Epoch: 17   Global Step: 219020   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:44:09,649-Speed 3357.51 samples/sec   Loss 1.0605   LearningRate 0.0014   Epoch: 17   Global Step: 219030   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:44:12,777-Speed 3274.15 samples/sec   Loss 1.0555   LearningRate 0.0014   Epoch: 17   Global Step: 219040   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:44:15,922-Speed 3257.15 samples/sec   Loss 1.0440   LearningRate 0.0014   Epoch: 17   Global Step: 219050   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:44:19,077-Speed 3246.16 samples/sec   Loss 1.0528   LearningRate 0.0014   Epoch: 17   Global Step: 219060   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:44:22,145-Speed 3339.06 samples/sec   Loss 1.0780   LearningRate 0.0014   Epoch: 17   Global Step: 219070   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:44:25,267-Speed 3281.06 samples/sec   Loss 1.0942   LearningRate 0.0014   Epoch: 17   Global Step: 219080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:44:28,361-Speed 3310.29 samples/sec   Loss 1.0414   LearningRate 0.0014   Epoch: 17   Global Step: 219090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:44:31,451-Speed 3315.56 samples/sec   Loss 1.0656   LearningRate 0.0014   Epoch: 17   Global Step: 219100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:44:34,537-Speed 3319.13 samples/sec   Loss 1.0738   LearningRate 0.0014   Epoch: 17   Global Step: 219110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:44:37,672-Speed 3266.96 samples/sec   Loss 1.0520   LearningRate 0.0014   Epoch: 17   Global Step: 219120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:44:40,758-Speed 3319.33 samples/sec   Loss 1.0408   LearningRate 0.0014   Epoch: 17   Global Step: 219130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:44:43,825-Speed 3340.08 samples/sec   Loss 1.0966   LearningRate 0.0014   Epoch: 17   Global Step: 219140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:44:46,918-Speed 3312.54 samples/sec   Loss 1.0796   LearningRate 0.0014   Epoch: 17   Global Step: 219150   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:44:50,020-Speed 3301.26 samples/sec   Loss 1.0989   LearningRate 0.0014   Epoch: 17   Global Step: 219160   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:44:53,169-Speed 3253.27 samples/sec   Loss 1.0651   LearningRate 0.0014   Epoch: 17   Global Step: 219170   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:44:56,268-Speed 3305.59 samples/sec   Loss 1.0921   LearningRate 0.0014   Epoch: 17   Global Step: 219180   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:44:59,378-Speed 3293.17 samples/sec   Loss 1.0415   LearningRate 0.0014   Epoch: 17   Global Step: 219190   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:45:02,535-Speed 3244.93 samples/sec   Loss 1.0731   LearningRate 0.0014   Epoch: 17   Global Step: 219200   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:45:05,634-Speed 3305.07 samples/sec   Loss 1.0770   LearningRate 0.0014   Epoch: 17   Global Step: 219210   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:45:08,695-Speed 3346.79 samples/sec   Loss 1.0844   LearningRate 0.0014   Epoch: 17   Global Step: 219220   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:45:11,804-Speed 3293.93 samples/sec   Loss 1.0419   LearningRate 0.0014   Epoch: 17   Global Step: 219230   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:45:14,946-Speed 3260.82 samples/sec   Loss 1.0723   LearningRate 0.0014   Epoch: 17   Global Step: 219240   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:45:18,035-Speed 3316.23 samples/sec   Loss 1.0895   LearningRate 0.0014   Epoch: 17   Global Step: 219250   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:45:21,099-Speed 3342.83 samples/sec   Loss 1.0010   LearningRate 0.0014   Epoch: 17   Global Step: 219260   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:45:24,239-Speed 3261.46 samples/sec   Loss 1.0478   LearningRate 0.0014   Epoch: 17   Global Step: 219270   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:45:27,414-Speed 3226.56 samples/sec   Loss 1.0589   LearningRate 0.0014   Epoch: 17   Global Step: 219280   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:45:30,496-Speed 3323.65 samples/sec   Loss 1.0440   LearningRate 0.0014   Epoch: 17   Global Step: 219290   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:45:33,584-Speed 3316.37 samples/sec   Loss 1.0128   LearningRate 0.0014   Epoch: 17   Global Step: 219300   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:45:36,766-Speed 3220.30 samples/sec   Loss 1.0773   LearningRate 0.0014   Epoch: 17   Global Step: 219310   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-04-27 20:45:39,836-Speed 3336.59 samples/sec   Loss 1.0759   LearningRate 0.0014   Epoch: 17   Global Step: 219320   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:45:42,930-Speed 3309.81 samples/sec   Loss 1.0583   LearningRate 0.0014   Epoch: 17   Global Step: 219330   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:45:46,020-Speed 3314.94 samples/sec   Loss 1.0611   LearningRate 0.0014   Epoch: 17   Global Step: 219340   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:45:49,204-Speed 3217.52 samples/sec   Loss 1.0429   LearningRate 0.0014   Epoch: 17   Global Step: 219350   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:45:52,395-Speed 3209.48 samples/sec   Loss 1.1078   LearningRate 0.0014   Epoch: 17   Global Step: 219360   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:45:55,508-Speed 3290.85 samples/sec   Loss 1.0433   LearningRate 0.0014   Epoch: 17   Global Step: 219370   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:45:58,552-Speed 3364.95 samples/sec   Loss 1.0682   LearningRate 0.0014   Epoch: 17   Global Step: 219380   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:46:01,607-Speed 3353.28 samples/sec   Loss 1.0502   LearningRate 0.0014   Epoch: 17   Global Step: 219390   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:46:04,791-Speed 3216.62 samples/sec   Loss 1.0123   LearningRate 0.0014   Epoch: 17   Global Step: 219400   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:46:07,909-Speed 3286.22 samples/sec   Loss 1.0756   LearningRate 0.0014   Epoch: 17   Global Step: 219410   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:46:10,956-Speed 3361.71 samples/sec   Loss 1.0830   LearningRate 0.0014   Epoch: 17   Global Step: 219420   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:46:14,021-Speed 3341.60 samples/sec   Loss 1.0329   LearningRate 0.0014   Epoch: 17   Global Step: 219430   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:46:17,148-Speed 3276.00 samples/sec   Loss 1.0367   LearningRate 0.0014   Epoch: 17   Global Step: 219440   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:46:20,242-Speed 3310.73 samples/sec   Loss 1.0830   LearningRate 0.0014   Epoch: 17   Global Step: 219450   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:46:23,369-Speed 3274.83 samples/sec   Loss 1.0704   LearningRate 0.0014   Epoch: 17   Global Step: 219460   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:46:26,457-Speed 3317.72 samples/sec   Loss 1.0939   LearningRate 0.0014   Epoch: 17   Global Step: 219470   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:46:29,579-Speed 3281.25 samples/sec   Loss 1.0382   LearningRate 0.0014   Epoch: 17   Global Step: 219480   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:46:32,728-Speed 3252.69 samples/sec   Loss 1.0332   LearningRate 0.0014   Epoch: 17   Global Step: 219490   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:46:35,905-Speed 3224.34 samples/sec   Loss 1.0433   LearningRate 0.0014   Epoch: 17   Global Step: 219500   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:46:38,997-Speed 3312.39 samples/sec   Loss 1.0507   LearningRate 0.0014   Epoch: 17   Global Step: 219510   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-04-27 20:46:42,117-Speed 3282.65 samples/sec   Loss 1.0065   LearningRate 0.0014   Epoch: 17   Global Step: 219520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-04-27 20:46:45,196-Speed 3327.94 samples/sec   Loss 1.0639   LearningRate 0.0014   Epoch: 17   Global Step: 219530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 20:46:48,326-Speed 3272.22 samples/sec   Loss 1.0532   LearningRate 0.0014   Epoch: 17   Global Step: 219540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 20:46:51,464-Speed 3263.98 samples/sec   Loss 1.0518   LearningRate 0.0014   Epoch: 17   Global Step: 219550   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:46:54,623-Speed 3242.87 samples/sec   Loss 1.0936   LearningRate 0.0013   Epoch: 17   Global Step: 219560   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:46:57,703-Speed 3326.07 samples/sec   Loss 1.0164   LearningRate 0.0013   Epoch: 17   Global Step: 219570   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:47:00,781-Speed 3326.79 samples/sec   Loss 1.0373   LearningRate 0.0013   Epoch: 17   Global Step: 219580   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:47:03,868-Speed 3319.15 samples/sec   Loss 1.0849   LearningRate 0.0013   Epoch: 17   Global Step: 219590   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:47:06,977-Speed 3294.17 samples/sec   Loss 1.0283   LearningRate 0.0013   Epoch: 17   Global Step: 219600   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:47:10,038-Speed 3346.76 samples/sec   Loss 1.0138   LearningRate 0.0013   Epoch: 17   Global Step: 219610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:47:13,174-Speed 3266.45 samples/sec   Loss 1.0563   LearningRate 0.0013   Epoch: 17   Global Step: 219620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:47:16,299-Speed 3277.59 samples/sec   Loss 1.0747   LearningRate 0.0013   Epoch: 17   Global Step: 219630   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:47:19,445-Speed 3255.54 samples/sec   Loss 1.0364   LearningRate 0.0013   Epoch: 17   Global Step: 219640   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:47:22,558-Speed 3291.38 samples/sec   Loss 1.0361   LearningRate 0.0013   Epoch: 17   Global Step: 219650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 20:47:25,682-Speed 3278.02 samples/sec   Loss 1.0476   LearningRate 0.0013   Epoch: 17   Global Step: 219660   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:47:28,814-Speed 3271.05 samples/sec   Loss 1.0608   LearningRate 0.0013   Epoch: 17   Global Step: 219670   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:47:31,929-Speed 3288.25 samples/sec   Loss 1.0425   LearningRate 0.0013   Epoch: 17   Global Step: 219680   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:47:35,008-Speed 3327.36 samples/sec   Loss 1.0425   LearningRate 0.0013   Epoch: 17   Global Step: 219690   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:47:38,219-Speed 3189.53 samples/sec   Loss 1.0545   LearningRate 0.0013   Epoch: 17   Global Step: 219700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:47:41,312-Speed 3311.70 samples/sec   Loss 0.9943   LearningRate 0.0013   Epoch: 17   Global Step: 219710   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:47:44,422-Speed 3294.00 samples/sec   Loss 1.1096   LearningRate 0.0013   Epoch: 17   Global Step: 219720   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:47:47,544-Speed 3280.41 samples/sec   Loss 1.0438   LearningRate 0.0013   Epoch: 17   Global Step: 219730   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:47:50,609-Speed 3341.96 samples/sec   Loss 1.0901   LearningRate 0.0013   Epoch: 17   Global Step: 219740   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:47:53,740-Speed 3271.26 samples/sec   Loss 1.0566   LearningRate 0.0013   Epoch: 17   Global Step: 219750   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:47:56,796-Speed 3352.30 samples/sec   Loss 1.0711   LearningRate 0.0013   Epoch: 17   Global Step: 219760   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:47:59,890-Speed 3310.67 samples/sec   Loss 1.0441   LearningRate 0.0013   Epoch: 17   Global Step: 219770   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:48:02,984-Speed 3310.66 samples/sec   Loss 1.0825   LearningRate 0.0013   Epoch: 17   Global Step: 219780   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:48:06,142-Speed 3243.55 samples/sec   Loss 1.0693   LearningRate 0.0013   Epoch: 17   Global Step: 219790   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:48:09,219-Speed 3328.89 samples/sec   Loss 1.0764   LearningRate 0.0013   Epoch: 17   Global Step: 219800   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:48:12,316-Speed 3307.67 samples/sec   Loss 1.0358   LearningRate 0.0013   Epoch: 17   Global Step: 219810   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:48:15,471-Speed 3246.24 samples/sec   Loss 1.0925   LearningRate 0.0013   Epoch: 17   Global Step: 219820   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:48:18,664-Speed 3208.01 samples/sec   Loss 1.0726   LearningRate 0.0013   Epoch: 17   Global Step: 219830   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:48:21,740-Speed 3329.96 samples/sec   Loss 1.0556   LearningRate 0.0013   Epoch: 17   Global Step: 219840   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:48:24,944-Speed 3197.52 samples/sec   Loss 1.0932   LearningRate 0.0013   Epoch: 17   Global Step: 219850   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:48:28,064-Speed 3282.83 samples/sec   Loss 1.0567   LearningRate 0.0013   Epoch: 17   Global Step: 219860   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:48:31,198-Speed 3267.90 samples/sec   Loss 1.1014   LearningRate 0.0013   Epoch: 17   Global Step: 219870   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:48:34,277-Speed 3327.67 samples/sec   Loss 1.0560   LearningRate 0.0013   Epoch: 17   Global Step: 219880   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:48:37,395-Speed 3285.32 samples/sec   Loss 1.0514   LearningRate 0.0013   Epoch: 17   Global Step: 219890   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:48:40,552-Speed 3244.09 samples/sec   Loss 1.0821   LearningRate 0.0013   Epoch: 17   Global Step: 219900   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:48:43,667-Speed 3288.67 samples/sec   Loss 1.0803   LearningRate 0.0013   Epoch: 17   Global Step: 219910   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:48:46,739-Speed 3335.06 samples/sec   Loss 1.0568   LearningRate 0.0013   Epoch: 17   Global Step: 219920   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:48:49,865-Speed 3276.38 samples/sec   Loss 1.0895   LearningRate 0.0013   Epoch: 17   Global Step: 219930   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:48:53,037-Speed 3229.15 samples/sec   Loss 1.1182   LearningRate 0.0013   Epoch: 17   Global Step: 219940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 20:48:56,130-Speed 3311.42 samples/sec   Loss 1.0484   LearningRate 0.0013   Epoch: 17   Global Step: 219950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 20:48:59,350-Speed 3181.33 samples/sec   Loss 1.0192   LearningRate 0.0013   Epoch: 17   Global Step: 219960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 20:49:02,471-Speed 3281.85 samples/sec   Loss 1.0245   LearningRate 0.0013   Epoch: 17   Global Step: 219970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 20:49:05,603-Speed 3270.19 samples/sec   Loss 1.0422   LearningRate 0.0013   Epoch: 17   Global Step: 219980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 20:49:08,713-Speed 3294.27 samples/sec   Loss 1.0597   LearningRate 0.0013   Epoch: 17   Global Step: 219990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 20:49:11,845-Speed 3270.15 samples/sec   Loss 1.0698   LearningRate 0.0013   Epoch: 17   Global Step: 220000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:49:14,995-Speed 3252.28 samples/sec   Loss 1.0288   LearningRate 0.0013   Epoch: 17   Global Step: 220010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:49:18,179-Speed 3217.08 samples/sec   Loss 1.0697   LearningRate 0.0013   Epoch: 17   Global Step: 220020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:49:21,272-Speed 3311.11 samples/sec   Loss 1.0575   LearningRate 0.0013   Epoch: 17   Global Step: 220030   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:49:24,401-Speed 3273.89 samples/sec   Loss 1.1025   LearningRate 0.0013   Epoch: 17   Global Step: 220040   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:49:27,559-Speed 3243.71 samples/sec   Loss 1.0386   LearningRate 0.0013   Epoch: 17   Global Step: 220050   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:49:30,642-Speed 3322.76 samples/sec   Loss 1.0044   LearningRate 0.0013   Epoch: 17   Global Step: 220060   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:49:33,801-Speed 3242.27 samples/sec   Loss 0.9969   LearningRate 0.0013   Epoch: 17   Global Step: 220070   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:49:36,902-Speed 3303.51 samples/sec   Loss 1.0516   LearningRate 0.0013   Epoch: 17   Global Step: 220080   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:49:39,983-Speed 3324.40 samples/sec   Loss 1.0481   LearningRate 0.0013   Epoch: 17   Global Step: 220090   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:49:43,081-Speed 3305.97 samples/sec   Loss 1.0293   LearningRate 0.0013   Epoch: 17   Global Step: 220100   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:49:46,183-Speed 3302.59 samples/sec   Loss 1.0754   LearningRate 0.0013   Epoch: 17   Global Step: 220110   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:49:49,357-Speed 3227.18 samples/sec   Loss 1.0640   LearningRate 0.0013   Epoch: 17   Global Step: 220120   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:49:52,477-Speed 3283.56 samples/sec   Loss 1.0342   LearningRate 0.0013   Epoch: 17   Global Step: 220130   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:49:55,572-Speed 3309.31 samples/sec   Loss 1.0654   LearningRate 0.0013   Epoch: 17   Global Step: 220140   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:49:58,655-Speed 3322.73 samples/sec   Loss 0.9977   LearningRate 0.0013   Epoch: 17   Global Step: 220150   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:50:01,788-Speed 3269.28 samples/sec   Loss 1.0563   LearningRate 0.0013   Epoch: 17   Global Step: 220160   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:50:04,907-Speed 3284.49 samples/sec   Loss 1.0733   LearningRate 0.0013   Epoch: 17   Global Step: 220170   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:50:08,030-Speed 3279.24 samples/sec   Loss 1.0271   LearningRate 0.0013   Epoch: 17   Global Step: 220180   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:50:11,141-Speed 3292.19 samples/sec   Loss 1.0232   LearningRate 0.0013   Epoch: 17   Global Step: 220190   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:50:14,265-Speed 3279.33 samples/sec   Loss 1.0846   LearningRate 0.0013   Epoch: 17   Global Step: 220200   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:50:17,355-Speed 3314.98 samples/sec   Loss 1.0626   LearningRate 0.0013   Epoch: 17   Global Step: 220210   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:50:20,474-Speed 3284.12 samples/sec   Loss 1.0414   LearningRate 0.0013   Epoch: 17   Global Step: 220220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:50:23,575-Speed 3303.08 samples/sec   Loss 1.0406   LearningRate 0.0013   Epoch: 17   Global Step: 220230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:50:26,691-Speed 3287.31 samples/sec   Loss 1.0402   LearningRate 0.0013   Epoch: 17   Global Step: 220240   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:50:29,880-Speed 3212.11 samples/sec   Loss 1.0643   LearningRate 0.0013   Epoch: 17   Global Step: 220250   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:50:32,979-Speed 3304.74 samples/sec   Loss 1.0730   LearningRate 0.0013   Epoch: 17   Global Step: 220260   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:50:36,064-Speed 3320.83 samples/sec   Loss 1.0389   LearningRate 0.0013   Epoch: 17   Global Step: 220270   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:50:39,185-Speed 3282.76 samples/sec   Loss 1.0281   LearningRate 0.0013   Epoch: 17   Global Step: 220280   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:50:42,381-Speed 3204.74 samples/sec   Loss 1.0990   LearningRate 0.0013   Epoch: 17   Global Step: 220290   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:50:45,467-Speed 3318.62 samples/sec   Loss 1.0642   LearningRate 0.0013   Epoch: 17   Global Step: 220300   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:50:48,591-Speed 3281.10 samples/sec   Loss 1.0389   LearningRate 0.0013   Epoch: 17   Global Step: 220310   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:50:51,692-Speed 3303.45 samples/sec   Loss 1.0426   LearningRate 0.0013   Epoch: 17   Global Step: 220320   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:50:54,809-Speed 3285.51 samples/sec   Loss 1.0489   LearningRate 0.0013   Epoch: 17   Global Step: 220330   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:50:57,898-Speed 3316.80 samples/sec   Loss 1.0657   LearningRate 0.0013   Epoch: 17   Global Step: 220340   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:51:01,014-Speed 3287.40 samples/sec   Loss 1.0838   LearningRate 0.0013   Epoch: 17   Global Step: 220350   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:51:04,142-Speed 3274.31 samples/sec   Loss 1.0917   LearningRate 0.0013   Epoch: 17   Global Step: 220360   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:51:07,223-Speed 3325.91 samples/sec   Loss 1.0292   LearningRate 0.0013   Epoch: 17   Global Step: 220370   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:51:10,276-Speed 3354.66 samples/sec   Loss 1.0321   LearningRate 0.0013   Epoch: 17   Global Step: 220380   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:51:13,465-Speed 3211.73 samples/sec   Loss 1.0518   LearningRate 0.0013   Epoch: 17   Global Step: 220390   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:51:16,644-Speed 3222.30 samples/sec   Loss 1.0673   LearningRate 0.0013   Epoch: 17   Global Step: 220400   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:51:19,719-Speed 3331.51 samples/sec   Loss 1.0722   LearningRate 0.0013   Epoch: 17   Global Step: 220410   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:51:22,793-Speed 3332.80 samples/sec   Loss 1.0390   LearningRate 0.0013   Epoch: 17   Global Step: 220420   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:51:25,859-Speed 3340.41 samples/sec   Loss 1.0608   LearningRate 0.0013   Epoch: 17   Global Step: 220430   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:51:28,982-Speed 3280.22 samples/sec   Loss 1.0645   LearningRate 0.0013   Epoch: 17   Global Step: 220440   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:51:32,091-Speed 3294.59 samples/sec   Loss 1.0427   LearningRate 0.0013   Epoch: 17   Global Step: 220450   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:51:35,155-Speed 3342.61 samples/sec   Loss 1.0400   LearningRate 0.0013   Epoch: 17   Global Step: 220460   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:51:38,295-Speed 3262.58 samples/sec   Loss 1.0609   LearningRate 0.0013   Epoch: 17   Global Step: 220470   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:51:41,424-Speed 3273.10 samples/sec   Loss 1.0285   LearningRate 0.0013   Epoch: 17   Global Step: 220480   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:51:44,517-Speed 3311.62 samples/sec   Loss 1.0022   LearningRate 0.0013   Epoch: 17   Global Step: 220490   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:51:47,605-Speed 3317.85 samples/sec   Loss 1.0512   LearningRate 0.0013   Epoch: 17   Global Step: 220500   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:51:50,703-Speed 3306.42 samples/sec   Loss 1.0364   LearningRate 0.0013   Epoch: 17   Global Step: 220510   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:51:53,780-Speed 3328.39 samples/sec   Loss 1.0455   LearningRate 0.0013   Epoch: 17   Global Step: 220520   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:51:56,872-Speed 3313.08 samples/sec   Loss 1.0102   LearningRate 0.0013   Epoch: 17   Global Step: 220530   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:51:59,944-Speed 3333.96 samples/sec   Loss 1.0464   LearningRate 0.0013   Epoch: 17   Global Step: 220540   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:52:03,154-Speed 3191.23 samples/sec   Loss 1.0521   LearningRate 0.0013   Epoch: 17   Global Step: 220550   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:52:06,282-Speed 3274.95 samples/sec   Loss 1.0489   LearningRate 0.0013   Epoch: 17   Global Step: 220560   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:52:09,351-Speed 3337.50 samples/sec   Loss 1.1009   LearningRate 0.0013   Epoch: 17   Global Step: 220570   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:52:12,400-Speed 3359.99 samples/sec   Loss 1.0423   LearningRate 0.0013   Epoch: 17   Global Step: 220580   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:52:15,547-Speed 3254.03 samples/sec   Loss 1.0494   LearningRate 0.0013   Epoch: 17   Global Step: 220590   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:52:18,728-Speed 3220.84 samples/sec   Loss 1.0780   LearningRate 0.0013   Epoch: 17   Global Step: 220600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 20:52:21,807-Speed 3326.39 samples/sec   Loss 1.0599   LearningRate 0.0013   Epoch: 17   Global Step: 220610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 20:52:24,870-Speed 3344.85 samples/sec   Loss 1.0349   LearningRate 0.0013   Epoch: 17   Global Step: 220620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:52:27,958-Speed 3317.34 samples/sec   Loss 1.0177   LearningRate 0.0013   Epoch: 17   Global Step: 220630   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:52:31,040-Speed 3323.01 samples/sec   Loss 1.0365   LearningRate 0.0013   Epoch: 17   Global Step: 220640   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:52:34,122-Speed 3323.86 samples/sec   Loss 1.0528   LearningRate 0.0012   Epoch: 17   Global Step: 220650   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:52:37,275-Speed 3249.18 samples/sec   Loss 1.0238   LearningRate 0.0012   Epoch: 17   Global Step: 220660   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:52:40,469-Speed 3206.51 samples/sec   Loss 1.0211   LearningRate 0.0012   Epoch: 17   Global Step: 220670   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:52:43,561-Speed 3313.10 samples/sec   Loss 1.0745   LearningRate 0.0012   Epoch: 17   Global Step: 220680   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:52:46,646-Speed 3320.59 samples/sec   Loss 1.0525   LearningRate 0.0012   Epoch: 17   Global Step: 220690   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:52:49,818-Speed 3228.35 samples/sec   Loss 1.0365   LearningRate 0.0012   Epoch: 17   Global Step: 220700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:52:52,953-Speed 3268.23 samples/sec   Loss 1.0554   LearningRate 0.0012   Epoch: 17   Global Step: 220710   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:52:56,056-Speed 3300.18 samples/sec   Loss 1.0685   LearningRate 0.0012   Epoch: 17   Global Step: 220720   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:52:59,179-Speed 3280.51 samples/sec   Loss 1.0731   LearningRate 0.0012   Epoch: 17   Global Step: 220730   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:53:02,313-Speed 3268.32 samples/sec   Loss 1.0346   LearningRate 0.0012   Epoch: 17   Global Step: 220740   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:53:05,526-Speed 3188.52 samples/sec   Loss 1.1002   LearningRate 0.0012   Epoch: 17   Global Step: 220750   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:53:08,600-Speed 3331.58 samples/sec   Loss 1.0077   LearningRate 0.0012   Epoch: 17   Global Step: 220760   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:53:11,694-Speed 3311.06 samples/sec   Loss 1.0450   LearningRate 0.0012   Epoch: 17   Global Step: 220770   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:53:14,804-Speed 3294.02 samples/sec   Loss 1.0585   LearningRate 0.0012   Epoch: 17   Global Step: 220780   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:53:17,954-Speed 3252.11 samples/sec   Loss 1.1032   LearningRate 0.0012   Epoch: 17   Global Step: 220790   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:53:21,059-Speed 3298.39 samples/sec   Loss 1.0457   LearningRate 0.0012   Epoch: 17   Global Step: 220800   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:53:24,143-Speed 3321.43 samples/sec   Loss 1.0447   LearningRate 0.0012   Epoch: 17   Global Step: 220810   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:53:27,302-Speed 3243.17 samples/sec   Loss 1.0330   LearningRate 0.0012   Epoch: 17   Global Step: 220820   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:53:30,480-Speed 3223.08 samples/sec   Loss 1.0257   LearningRate 0.0012   Epoch: 17   Global Step: 220830   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:53:33,592-Speed 3293.79 samples/sec   Loss 1.0338   LearningRate 0.0012   Epoch: 17   Global Step: 220840   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:53:36,762-Speed 3230.95 samples/sec   Loss 1.0619   LearningRate 0.0012   Epoch: 17   Global Step: 220850   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:53:39,865-Speed 3301.20 samples/sec   Loss 1.0758   LearningRate 0.0012   Epoch: 17   Global Step: 220860   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:53:42,998-Speed 3268.93 samples/sec   Loss 1.0331   LearningRate 0.0012   Epoch: 17   Global Step: 220870   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:53:46,064-Speed 3341.40 samples/sec   Loss 1.0623   LearningRate 0.0012   Epoch: 17   Global Step: 220880   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:53:49,284-Speed 3181.32 samples/sec   Loss 1.0535   LearningRate 0.0012   Epoch: 17   Global Step: 220890   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:53:52,431-Speed 3254.54 samples/sec   Loss 1.1170   LearningRate 0.0012   Epoch: 17   Global Step: 220900   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:53:55,616-Speed 3216.43 samples/sec   Loss 1.0508   LearningRate 0.0012   Epoch: 17   Global Step: 220910   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:53:58,709-Speed 3311.17 samples/sec   Loss 0.9729   LearningRate 0.0012   Epoch: 17   Global Step: 220920   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:54:01,831-Speed 3280.90 samples/sec   Loss 1.0474   LearningRate 0.0012   Epoch: 17   Global Step: 220930   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:54:04,971-Speed 3263.08 samples/sec   Loss 1.0488   LearningRate 0.0012   Epoch: 17   Global Step: 220940   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:54:08,104-Speed 3269.54 samples/sec   Loss 1.1049   LearningRate 0.0012   Epoch: 17   Global Step: 220950   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:54:11,227-Speed 3280.00 samples/sec   Loss 1.0989   LearningRate 0.0012   Epoch: 17   Global Step: 220960   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:54:14,390-Speed 3238.23 samples/sec   Loss 1.0104   LearningRate 0.0012   Epoch: 17   Global Step: 220970   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:54:17,530-Speed 3262.51 samples/sec   Loss 1.0560   LearningRate 0.0012   Epoch: 17   Global Step: 220980   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:54:20,616-Speed 3318.89 samples/sec   Loss 1.0475   LearningRate 0.0012   Epoch: 17   Global Step: 220990   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:54:23,724-Speed 3296.00 samples/sec   Loss 1.0316   LearningRate 0.0012   Epoch: 17   Global Step: 221000   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:54:26,848-Speed 3278.51 samples/sec   Loss 1.0717   LearningRate 0.0012   Epoch: 17   Global Step: 221010   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:54:29,999-Speed 3251.36 samples/sec   Loss 1.0854   LearningRate 0.0012   Epoch: 17   Global Step: 221020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:54:33,102-Speed 3300.62 samples/sec   Loss 1.1071   LearningRate 0.0012   Epoch: 17   Global Step: 221030   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:54:36,207-Speed 3299.78 samples/sec   Loss 1.0104   LearningRate 0.0012   Epoch: 17   Global Step: 221040   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:54:39,351-Speed 3258.04 samples/sec   Loss 1.0430   LearningRate 0.0012   Epoch: 17   Global Step: 221050   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:54:42,507-Speed 3244.80 samples/sec   Loss 1.0427   LearningRate 0.0012   Epoch: 17   Global Step: 221060   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:54:45,609-Speed 3303.09 samples/sec   Loss 1.0370   LearningRate 0.0012   Epoch: 17   Global Step: 221070   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:54:48,725-Speed 3287.23 samples/sec   Loss 1.0659   LearningRate 0.0012   Epoch: 17   Global Step: 221080   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:54:51,867-Speed 3260.14 samples/sec   Loss 1.0644   LearningRate 0.0012   Epoch: 17   Global Step: 221090   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:54:55,018-Speed 3251.00 samples/sec   Loss 1.0305   LearningRate 0.0012   Epoch: 17   Global Step: 221100   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:54:58,073-Speed 3352.28 samples/sec   Loss 1.0277   LearningRate 0.0012   Epoch: 17   Global Step: 221110   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:55:01,212-Speed 3263.27 samples/sec   Loss 1.0578   LearningRate 0.0012   Epoch: 17   Global Step: 221120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 20:55:04,289-Speed 3328.89 samples/sec   Loss 1.0323   LearningRate 0.0012   Epoch: 17   Global Step: 221130   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:55:07,472-Speed 3217.89 samples/sec   Loss 1.0522   LearningRate 0.0012   Epoch: 17   Global Step: 221140   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:55:10,613-Speed 3261.88 samples/sec   Loss 1.0211   LearningRate 0.0012   Epoch: 17   Global Step: 221150   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:55:13,760-Speed 3254.42 samples/sec   Loss 1.0529   LearningRate 0.0012   Epoch: 17   Global Step: 221160   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:55:16,905-Speed 3257.63 samples/sec   Loss 1.0408   LearningRate 0.0012   Epoch: 17   Global Step: 221170   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:55:20,052-Speed 3254.43 samples/sec   Loss 1.0609   LearningRate 0.0012   Epoch: 17   Global Step: 221180   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:55:23,184-Speed 3270.96 samples/sec   Loss 1.0086   LearningRate 0.0012   Epoch: 17   Global Step: 221190   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:55:26,294-Speed 3294.03 samples/sec   Loss 1.0917   LearningRate 0.0012   Epoch: 17   Global Step: 221200   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:55:29,395-Speed 3303.21 samples/sec   Loss 1.0952   LearningRate 0.0012   Epoch: 17   Global Step: 221210   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:55:32,478-Speed 3322.62 samples/sec   Loss 1.0348   LearningRate 0.0012   Epoch: 17   Global Step: 221220   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:55:35,578-Speed 3303.68 samples/sec   Loss 1.0801   LearningRate 0.0012   Epoch: 17   Global Step: 221230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:55:38,682-Speed 3299.77 samples/sec   Loss 1.0446   LearningRate 0.0012   Epoch: 17   Global Step: 221240   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:55:41,870-Speed 3212.93 samples/sec   Loss 1.0312   LearningRate 0.0012   Epoch: 17   Global Step: 221250   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:55:44,949-Speed 3327.50 samples/sec   Loss 1.0236   LearningRate 0.0012   Epoch: 17   Global Step: 221260   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:55:48,062-Speed 3290.87 samples/sec   Loss 1.0160   LearningRate 0.0012   Epoch: 17   Global Step: 221270   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:55:51,164-Speed 3301.56 samples/sec   Loss 1.0629   LearningRate 0.0012   Epoch: 17   Global Step: 221280   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:55:54,297-Speed 3270.14 samples/sec   Loss 1.0309   LearningRate 0.0012   Epoch: 17   Global Step: 221290   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:55:57,348-Speed 3357.53 samples/sec   Loss 1.1011   LearningRate 0.0012   Epoch: 17   Global Step: 221300   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:56:00,450-Speed 3301.26 samples/sec   Loss 1.0794   LearningRate 0.0012   Epoch: 17   Global Step: 221310   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:56:03,663-Speed 3188.01 samples/sec   Loss 1.0366   LearningRate 0.0012   Epoch: 17   Global Step: 221320   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:56:06,814-Speed 3250.59 samples/sec   Loss 1.0634   LearningRate 0.0012   Epoch: 17   Global Step: 221330   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:56:09,886-Speed 3335.29 samples/sec   Loss 1.0585   LearningRate 0.0012   Epoch: 17   Global Step: 221340   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:56:13,142-Speed 3145.66 samples/sec   Loss 1.0148   LearningRate 0.0012   Epoch: 17   Global Step: 221350   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:56:16,323-Speed 3219.92 samples/sec   Loss 1.0700   LearningRate 0.0012   Epoch: 17   Global Step: 221360   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:56:19,500-Speed 3224.65 samples/sec   Loss 1.0993   LearningRate 0.0012   Epoch: 17   Global Step: 221370   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:56:22,568-Speed 3338.40 samples/sec   Loss 1.0552   LearningRate 0.0012   Epoch: 17   Global Step: 221380   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:56:25,749-Speed 3219.65 samples/sec   Loss 1.0502   LearningRate 0.0012   Epoch: 17   Global Step: 221390   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:56:28,883-Speed 3269.43 samples/sec   Loss 1.1052   LearningRate 0.0012   Epoch: 17   Global Step: 221400   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:56:31,969-Speed 3319.14 samples/sec   Loss 1.0678   LearningRate 0.0012   Epoch: 17   Global Step: 221410   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:56:35,103-Speed 3267.92 samples/sec   Loss 1.0303   LearningRate 0.0012   Epoch: 17   Global Step: 221420   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:56:38,270-Speed 3234.74 samples/sec   Loss 1.0205   LearningRate 0.0012   Epoch: 17   Global Step: 221430   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:56:41,386-Speed 3286.87 samples/sec   Loss 1.0142   LearningRate 0.0012   Epoch: 17   Global Step: 221440   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:56:44,519-Speed 3269.95 samples/sec   Loss 1.0634   LearningRate 0.0012   Epoch: 17   Global Step: 221450   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:56:47,626-Speed 3297.33 samples/sec   Loss 1.1028   LearningRate 0.0012   Epoch: 17   Global Step: 221460   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:56:50,795-Speed 3232.09 samples/sec   Loss 1.0915   LearningRate 0.0012   Epoch: 17   Global Step: 221470   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:56:53,945-Speed 3251.92 samples/sec   Loss 1.0478   LearningRate 0.0012   Epoch: 17   Global Step: 221480   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:56:57,022-Speed 3329.00 samples/sec   Loss 1.0383   LearningRate 0.0012   Epoch: 17   Global Step: 221490   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:57:00,078-Speed 3351.64 samples/sec   Loss 1.0478   LearningRate 0.0012   Epoch: 17   Global Step: 221500   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:57:03,162-Speed 3322.10 samples/sec   Loss 1.0238   LearningRate 0.0012   Epoch: 17   Global Step: 221510   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:57:06,293-Speed 3271.40 samples/sec   Loss 1.0498   LearningRate 0.0012   Epoch: 17   Global Step: 221520   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:57:09,391-Speed 3307.06 samples/sec   Loss 1.0546   LearningRate 0.0012   Epoch: 17   Global Step: 221530   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:57:12,566-Speed 3225.71 samples/sec   Loss 1.0519   LearningRate 0.0012   Epoch: 17   Global Step: 221540   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:57:15,684-Speed 3285.11 samples/sec   Loss 1.0838   LearningRate 0.0012   Epoch: 17   Global Step: 221550   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:57:18,781-Speed 3307.48 samples/sec   Loss 1.0581   LearningRate 0.0012   Epoch: 17   Global Step: 221560   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:57:21,888-Speed 3297.34 samples/sec   Loss 1.0327   LearningRate 0.0012   Epoch: 17   Global Step: 221570   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:57:25,008-Speed 3282.34 samples/sec   Loss 1.0402   LearningRate 0.0012   Epoch: 17   Global Step: 221580   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:57:28,098-Speed 3315.15 samples/sec   Loss 1.0416   LearningRate 0.0012   Epoch: 17   Global Step: 221590   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:57:31,212-Speed 3289.61 samples/sec   Loss 1.0420   LearningRate 0.0012   Epoch: 17   Global Step: 221600   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:57:34,279-Speed 3339.07 samples/sec   Loss 1.0608   LearningRate 0.0012   Epoch: 17   Global Step: 221610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:57:37,364-Speed 3320.51 samples/sec   Loss 0.9914   LearningRate 0.0012   Epoch: 17   Global Step: 221620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:57:40,538-Speed 3227.41 samples/sec   Loss 1.0300   LearningRate 0.0012   Epoch: 17   Global Step: 221630   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:57:43,632-Speed 3311.18 samples/sec   Loss 1.0047   LearningRate 0.0012   Epoch: 17   Global Step: 221640   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:57:46,729-Speed 3307.25 samples/sec   Loss 1.0181   LearningRate 0.0012   Epoch: 17   Global Step: 221650   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:57:49,872-Speed 3259.08 samples/sec   Loss 1.0089   LearningRate 0.0012   Epoch: 17   Global Step: 221660   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:57:53,035-Speed 3238.23 samples/sec   Loss 1.0544   LearningRate 0.0012   Epoch: 17   Global Step: 221670   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:57:56,136-Speed 3303.59 samples/sec   Loss 1.0505   LearningRate 0.0012   Epoch: 17   Global Step: 221680   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:57:59,250-Speed 3289.32 samples/sec   Loss 1.0577   LearningRate 0.0012   Epoch: 17   Global Step: 221690   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:58:02,417-Speed 3234.81 samples/sec   Loss 1.0927   LearningRate 0.0012   Epoch: 17   Global Step: 221700   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:58:05,537-Speed 3282.61 samples/sec   Loss 1.0489   LearningRate 0.0012   Epoch: 17   Global Step: 221710   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:58:08,642-Speed 3298.69 samples/sec   Loss 1.0436   LearningRate 0.0012   Epoch: 17   Global Step: 221720   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:58:11,747-Speed 3298.95 samples/sec   Loss 1.0296   LearningRate 0.0012   Epoch: 17   Global Step: 221730   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:58:14,878-Speed 3272.09 samples/sec   Loss 1.0689   LearningRate 0.0012   Epoch: 17   Global Step: 221740   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:58:17,959-Speed 3324.13 samples/sec   Loss 1.0392   LearningRate 0.0012   Epoch: 17   Global Step: 221750   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:58:21,033-Speed 3332.97 samples/sec   Loss 1.0198   LearningRate 0.0012   Epoch: 17   Global Step: 221760   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:58:24,098-Speed 3341.42 samples/sec   Loss 1.0751   LearningRate 0.0012   Epoch: 17   Global Step: 221770   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:58:27,166-Speed 3338.84 samples/sec   Loss 1.0545   LearningRate 0.0011   Epoch: 17   Global Step: 221780   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:58:30,248-Speed 3323.65 samples/sec   Loss 1.0893   LearningRate 0.0011   Epoch: 17   Global Step: 221790   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:58:33,312-Speed 3343.21 samples/sec   Loss 0.9900   LearningRate 0.0011   Epoch: 17   Global Step: 221800   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:58:36,379-Speed 3339.84 samples/sec   Loss 1.0518   LearningRate 0.0011   Epoch: 17   Global Step: 221810   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:58:39,467-Speed 3316.61 samples/sec   Loss 1.0238   LearningRate 0.0011   Epoch: 17   Global Step: 221820   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 20:58:42,542-Speed 3331.50 samples/sec   Loss 1.0916   LearningRate 0.0011   Epoch: 17   Global Step: 221830   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 20:58:45,606-Speed 3343.49 samples/sec   Loss 1.0652   LearningRate 0.0011   Epoch: 17   Global Step: 221840   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 20:58:48,659-Speed 3354.75 samples/sec   Loss 1.0115   LearningRate 0.0011   Epoch: 17   Global Step: 221850   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 20:58:51,731-Speed 3334.47 samples/sec   Loss 1.0124   LearningRate 0.0011   Epoch: 17   Global Step: 221860   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 20:58:54,815-Speed 3321.02 samples/sec   Loss 1.0178   LearningRate 0.0011   Epoch: 17   Global Step: 221870   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 20:58:57,880-Speed 3357.90 samples/sec   Loss 1.0170   LearningRate 0.0011   Epoch: 17   Global Step: 221880   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 20:59:00,984-Speed 3300.15 samples/sec   Loss 1.0307   LearningRate 0.0011   Epoch: 17   Global Step: 221890   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 20:59:04,154-Speed 3230.82 samples/sec   Loss 1.0324   LearningRate 0.0011   Epoch: 17   Global Step: 221900   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 20:59:07,270-Speed 3286.86 samples/sec   Loss 1.0608   LearningRate 0.0011   Epoch: 17   Global Step: 221910   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 20:59:10,351-Speed 3325.07 samples/sec   Loss 1.0305   LearningRate 0.0011   Epoch: 17   Global Step: 221920   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:59:13,486-Speed 3267.26 samples/sec   Loss 1.0173   LearningRate 0.0011   Epoch: 17   Global Step: 221930   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:59:16,641-Speed 3246.53 samples/sec   Loss 1.0133   LearningRate 0.0011   Epoch: 17   Global Step: 221940   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:59:19,766-Speed 3277.88 samples/sec   Loss 1.0304   LearningRate 0.0011   Epoch: 17   Global Step: 221950   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:59:22,868-Speed 3301.61 samples/sec   Loss 1.0796   LearningRate 0.0011   Epoch: 17   Global Step: 221960   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:59:26,043-Speed 3227.10 samples/sec   Loss 1.0697   LearningRate 0.0011   Epoch: 17   Global Step: 221970   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:59:29,170-Speed 3275.72 samples/sec   Loss 1.0534   LearningRate 0.0011   Epoch: 17   Global Step: 221980   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:59:32,251-Speed 3324.59 samples/sec   Loss 1.0466   LearningRate 0.0011   Epoch: 17   Global Step: 221990   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:59:35,318-Speed 3339.98 samples/sec   Loss 1.0829   LearningRate 0.0011   Epoch: 17   Global Step: 222000   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:59:38,456-Speed 3263.81 samples/sec   Loss 1.0488   LearningRate 0.0011   Epoch: 17   Global Step: 222010   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 20:59:41,544-Speed 3316.77 samples/sec   Loss 1.0499   LearningRate 0.0011   Epoch: 17   Global Step: 222020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:59:44,626-Speed 3323.48 samples/sec   Loss 1.0756   LearningRate 0.0011   Epoch: 17   Global Step: 222030   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:59:47,704-Speed 3329.02 samples/sec   Loss 1.0641   LearningRate 0.0011   Epoch: 17   Global Step: 222040   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:59:50,782-Speed 3327.55 samples/sec   Loss 1.0426   LearningRate 0.0011   Epoch: 17   Global Step: 222050   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:59:53,999-Speed 3183.81 samples/sec   Loss 1.0530   LearningRate 0.0011   Epoch: 17   Global Step: 222060   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 20:59:57,091-Speed 3313.07 samples/sec   Loss 1.0238   LearningRate 0.0011   Epoch: 17   Global Step: 222070   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:00:00,178-Speed 3318.73 samples/sec   Loss 1.0334   LearningRate 0.0011   Epoch: 17   Global Step: 222080   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:00:03,355-Speed 3224.02 samples/sec   Loss 1.0682   LearningRate 0.0011   Epoch: 17   Global Step: 222090   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:00:06,433-Speed 3327.52 samples/sec   Loss 1.0276   LearningRate 0.0011   Epoch: 17   Global Step: 222100   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:00:09,503-Speed 3336.94 samples/sec   Loss 1.0405   LearningRate 0.0011   Epoch: 17   Global Step: 222110   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:00:12,600-Speed 3307.08 samples/sec   Loss 1.0318   LearningRate 0.0011   Epoch: 17   Global Step: 222120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:00:15,726-Speed 3277.54 samples/sec   Loss 1.0556   LearningRate 0.0011   Epoch: 17   Global Step: 222130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:00:18,849-Speed 3279.78 samples/sec   Loss 1.0970   LearningRate 0.0011   Epoch: 17   Global Step: 222140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:00:21,911-Speed 3345.29 samples/sec   Loss 1.0442   LearningRate 0.0011   Epoch: 17   Global Step: 222150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:00:25,061-Speed 3251.15 samples/sec   Loss 1.0697   LearningRate 0.0011   Epoch: 17   Global Step: 222160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:00:28,160-Speed 3305.78 samples/sec   Loss 1.0510   LearningRate 0.0011   Epoch: 17   Global Step: 222170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:00:31,300-Speed 3261.62 samples/sec   Loss 1.0321   LearningRate 0.0011   Epoch: 17   Global Step: 222180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:00:34,321-Speed 3391.39 samples/sec   Loss 1.0337   LearningRate 0.0011   Epoch: 17   Global Step: 222190   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:00:37,394-Speed 3333.68 samples/sec   Loss 1.0216   LearningRate 0.0011   Epoch: 17   Global Step: 222200   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:00:40,473-Speed 3326.52 samples/sec   Loss 1.0267   LearningRate 0.0011   Epoch: 17   Global Step: 222210   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:00:43,564-Speed 3313.43 samples/sec   Loss 1.0353   LearningRate 0.0011   Epoch: 17   Global Step: 222220   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:00:46,657-Speed 3312.51 samples/sec   Loss 1.0343   LearningRate 0.0011   Epoch: 17   Global Step: 222230   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:00:49,736-Speed 3326.12 samples/sec   Loss 1.0741   LearningRate 0.0011   Epoch: 17   Global Step: 222240   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:00:52,796-Speed 3347.61 samples/sec   Loss 1.0071   LearningRate 0.0011   Epoch: 17   Global Step: 222250   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:00:55,874-Speed 3328.29 samples/sec   Loss 1.0236   LearningRate 0.0011   Epoch: 17   Global Step: 222260   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:00:58,938-Speed 3342.97 samples/sec   Loss 1.0637   LearningRate 0.0011   Epoch: 17   Global Step: 222270   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:01:02,008-Speed 3336.75 samples/sec   Loss 1.0360   LearningRate 0.0011   Epoch: 17   Global Step: 222280   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:01:05,070-Speed 3344.68 samples/sec   Loss 1.0394   LearningRate 0.0011   Epoch: 17   Global Step: 222290   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:01:08,132-Speed 3345.95 samples/sec   Loss 1.0347   LearningRate 0.0011   Epoch: 17   Global Step: 222300   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:01:11,220-Speed 3316.77 samples/sec   Loss 1.0461   LearningRate 0.0011   Epoch: 17   Global Step: 222310   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:01:14,339-Speed 3284.07 samples/sec   Loss 1.0644   LearningRate 0.0011   Epoch: 17   Global Step: 222320   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:01:17,449-Speed 3293.88 samples/sec   Loss 1.0647   LearningRate 0.0011   Epoch: 17   Global Step: 222330   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:01:20,558-Speed 3295.23 samples/sec   Loss 1.0661   LearningRate 0.0011   Epoch: 17   Global Step: 222340   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:01:23,654-Speed 3307.87 samples/sec   Loss 1.0097   LearningRate 0.0011   Epoch: 17   Global Step: 222350   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:01:26,789-Speed 3267.54 samples/sec   Loss 1.0585   LearningRate 0.0011   Epoch: 17   Global Step: 222360   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:01:29,905-Speed 3287.75 samples/sec   Loss 1.0331   LearningRate 0.0011   Epoch: 17   Global Step: 222370   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:01:32,988-Speed 3321.84 samples/sec   Loss 1.0519   LearningRate 0.0011   Epoch: 17   Global Step: 222380   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:01:36,097-Speed 3294.98 samples/sec   Loss 1.0372   LearningRate 0.0011   Epoch: 17   Global Step: 222390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:01:39,220-Speed 3279.62 samples/sec   Loss 1.0520   LearningRate 0.0011   Epoch: 17   Global Step: 222400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:01:42,445-Speed 3177.15 samples/sec   Loss 1.0499   LearningRate 0.0011   Epoch: 17   Global Step: 222410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:01:45,505-Speed 3347.35 samples/sec   Loss 1.0424   LearningRate 0.0011   Epoch: 17   Global Step: 222420   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:01:48,612-Speed 3296.67 samples/sec   Loss 1.0589   LearningRate 0.0011   Epoch: 17   Global Step: 222430   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:01:51,740-Speed 3275.56 samples/sec   Loss 1.0284   LearningRate 0.0011   Epoch: 17   Global Step: 222440   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:01:54,890-Speed 3250.93 samples/sec   Loss 1.0441   LearningRate 0.0011   Epoch: 17   Global Step: 222450   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:01:57,984-Speed 3310.93 samples/sec   Loss 1.0284   LearningRate 0.0011   Epoch: 17   Global Step: 222460   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:02:01,090-Speed 3297.81 samples/sec   Loss 1.0577   LearningRate 0.0011   Epoch: 17   Global Step: 222470   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:02:04,225-Speed 3267.41 samples/sec   Loss 1.0456   LearningRate 0.0011   Epoch: 17   Global Step: 222480   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:02:07,329-Speed 3300.76 samples/sec   Loss 1.0451   LearningRate 0.0011   Epoch: 17   Global Step: 222490   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:02:10,429-Speed 3303.52 samples/sec   Loss 1.0339   LearningRate 0.0011   Epoch: 17   Global Step: 222500   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:02:13,556-Speed 3275.75 samples/sec   Loss 1.0726   LearningRate 0.0011   Epoch: 17   Global Step: 222510   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:02:16,619-Speed 3344.73 samples/sec   Loss 1.0834   LearningRate 0.0011   Epoch: 17   Global Step: 222520   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:02:19,733-Speed 3289.48 samples/sec   Loss 1.0406   LearningRate 0.0011   Epoch: 17   Global Step: 222530   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:02:22,821-Speed 3316.76 samples/sec   Loss 1.0451   LearningRate 0.0011   Epoch: 17   Global Step: 222540   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:02:25,922-Speed 3303.13 samples/sec   Loss 1.0300   LearningRate 0.0011   Epoch: 17   Global Step: 222550   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:02:29,074-Speed 3249.50 samples/sec   Loss 1.0261   LearningRate 0.0011   Epoch: 17   Global Step: 222560   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:02:32,147-Speed 3333.22 samples/sec   Loss 1.0623   LearningRate 0.0011   Epoch: 17   Global Step: 222570   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:02:35,229-Speed 3324.34 samples/sec   Loss 1.0891   LearningRate 0.0011   Epoch: 17   Global Step: 222580   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:02:38,356-Speed 3276.26 samples/sec   Loss 1.0522   LearningRate 0.0011   Epoch: 17   Global Step: 222590   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:02:41,415-Speed 3348.19 samples/sec   Loss 1.0360   LearningRate 0.0011   Epoch: 17   Global Step: 222600   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:02:44,533-Speed 3286.04 samples/sec   Loss 1.0494   LearningRate 0.0011   Epoch: 17   Global Step: 222610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:02:47,624-Speed 3313.31 samples/sec   Loss 1.0413   LearningRate 0.0011   Epoch: 17   Global Step: 222620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:02:50,813-Speed 3212.70 samples/sec   Loss 0.9841   LearningRate 0.0011   Epoch: 17   Global Step: 222630   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:02:53,979-Speed 3234.72 samples/sec   Loss 1.0313   LearningRate 0.0011   Epoch: 17   Global Step: 222640   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:02:57,031-Speed 3356.36 samples/sec   Loss 1.0812   LearningRate 0.0011   Epoch: 17   Global Step: 222650   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:03:00,177-Speed 3256.05 samples/sec   Loss 1.0482   LearningRate 0.0011   Epoch: 17   Global Step: 222660   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:03:03,275-Speed 3307.21 samples/sec   Loss 1.0504   LearningRate 0.0011   Epoch: 17   Global Step: 222670   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:03:06,388-Speed 3289.64 samples/sec   Loss 0.9976   LearningRate 0.0011   Epoch: 17   Global Step: 222680   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:03:09,480-Speed 3313.28 samples/sec   Loss 1.0711   LearningRate 0.0011   Epoch: 17   Global Step: 222690   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:03:12,583-Speed 3300.58 samples/sec   Loss 1.0007   LearningRate 0.0011   Epoch: 17   Global Step: 222700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:03:15,696-Speed 3290.95 samples/sec   Loss 1.0449   LearningRate 0.0011   Epoch: 17   Global Step: 222710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:03:18,756-Speed 3347.72 samples/sec   Loss 1.0456   LearningRate 0.0011   Epoch: 17   Global Step: 222720   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:03:21,857-Speed 3303.04 samples/sec   Loss 1.0826   LearningRate 0.0011   Epoch: 17   Global Step: 222730   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:03:24,919-Speed 3345.73 samples/sec   Loss 1.0023   LearningRate 0.0011   Epoch: 17   Global Step: 222740   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:03:28,087-Speed 3232.49 samples/sec   Loss 1.0227   LearningRate 0.0011   Epoch: 17   Global Step: 222750   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:03:31,239-Speed 3250.06 samples/sec   Loss 1.0643   LearningRate 0.0011   Epoch: 17   Global Step: 222760   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:03:34,336-Speed 3307.18 samples/sec   Loss 1.0769   LearningRate 0.0011   Epoch: 17   Global Step: 222770   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:03:37,512-Speed 3225.47 samples/sec   Loss 1.0676   LearningRate 0.0011   Epoch: 17   Global Step: 222780   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:03:40,617-Speed 3299.50 samples/sec   Loss 1.0510   LearningRate 0.0011   Epoch: 17   Global Step: 222790   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:03:43,713-Speed 3308.80 samples/sec   Loss 1.0283   LearningRate 0.0011   Epoch: 17   Global Step: 222800   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:03:46,786-Speed 3333.35 samples/sec   Loss 1.0741   LearningRate 0.0011   Epoch: 17   Global Step: 222810   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:03:49,855-Speed 3336.91 samples/sec   Loss 1.0240   LearningRate 0.0011   Epoch: 17   Global Step: 222820   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:03:52,957-Speed 3303.24 samples/sec   Loss 1.0422   LearningRate 0.0011   Epoch: 17   Global Step: 222830   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:03:56,067-Speed 3293.02 samples/sec   Loss 1.0259   LearningRate 0.0011   Epoch: 17   Global Step: 222840   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:03:59,163-Speed 3308.51 samples/sec   Loss 0.9811   LearningRate 0.0011   Epoch: 17   Global Step: 222850   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:04:02,308-Speed 3257.26 samples/sec   Loss 1.0773   LearningRate 0.0011   Epoch: 17   Global Step: 222860   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:04:05,471-Speed 3238.23 samples/sec   Loss 1.0375   LearningRate 0.0011   Epoch: 17   Global Step: 222870   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:04:08,553-Speed 3323.04 samples/sec   Loss 1.0385   LearningRate 0.0011   Epoch: 17   Global Step: 222880   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:04:11,641-Speed 3317.62 samples/sec   Loss 1.0347   LearningRate 0.0011   Epoch: 17   Global Step: 222890   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:04:14,716-Speed 3331.55 samples/sec   Loss 1.0724   LearningRate 0.0011   Epoch: 17   Global Step: 222900   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:04:17,778-Speed 3344.75 samples/sec   Loss 1.0193   LearningRate 0.0011   Epoch: 17   Global Step: 222910   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:04:20,858-Speed 3325.94 samples/sec   Loss 1.0497   LearningRate 0.0011   Epoch: 17   Global Step: 222920   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:04:23,975-Speed 3286.78 samples/sec   Loss 1.0586   LearningRate 0.0011   Epoch: 17   Global Step: 222930   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:04:27,164-Speed 3211.57 samples/sec   Loss 1.0450   LearningRate 0.0011   Epoch: 17   Global Step: 222940   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:04:30,301-Speed 3264.98 samples/sec   Loss 1.0289   LearningRate 0.0011   Epoch: 17   Global Step: 222950   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:04:33,365-Speed 3343.29 samples/sec   Loss 1.0498   LearningRate 0.0011   Epoch: 17   Global Step: 222960   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:04:36,507-Speed 3260.15 samples/sec   Loss 1.0233   LearningRate 0.0010   Epoch: 17   Global Step: 222970   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:04:39,592-Speed 3320.35 samples/sec   Loss 1.0154   LearningRate 0.0010   Epoch: 17   Global Step: 222980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:04:42,816-Speed 3177.19 samples/sec   Loss 1.0179   LearningRate 0.0010   Epoch: 17   Global Step: 222990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:04:45,888-Speed 3334.54 samples/sec   Loss 1.0477   LearningRate 0.0010   Epoch: 17   Global Step: 223000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:04:48,932-Speed 3365.31 samples/sec   Loss 0.9990   LearningRate 0.0010   Epoch: 17   Global Step: 223010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:04:52,023-Speed 3314.40 samples/sec   Loss 1.0154   LearningRate 0.0010   Epoch: 17   Global Step: 223020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:04:55,146-Speed 3279.32 samples/sec   Loss 1.0252   LearningRate 0.0010   Epoch: 17   Global Step: 223030   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:04:58,232-Speed 3319.62 samples/sec   Loss 1.0574   LearningRate 0.0010   Epoch: 17   Global Step: 223040   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:05:01,288-Speed 3351.64 samples/sec   Loss 1.0217   LearningRate 0.0010   Epoch: 17   Global Step: 223050   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:05:04,381-Speed 3312.17 samples/sec   Loss 1.0699   LearningRate 0.0010   Epoch: 17   Global Step: 223060   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:05:07,472-Speed 3313.25 samples/sec   Loss 1.0129   LearningRate 0.0010   Epoch: 17   Global Step: 223070   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:05:10,543-Speed 3335.60 samples/sec   Loss 1.0165   LearningRate 0.0010   Epoch: 17   Global Step: 223080   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:05:13,689-Speed 3256.18 samples/sec   Loss 1.0667   LearningRate 0.0010   Epoch: 17   Global Step: 223090   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:05:16,771-Speed 3323.74 samples/sec   Loss 1.0307   LearningRate 0.0010   Epoch: 17   Global Step: 223100   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:05:19,854-Speed 3322.47 samples/sec   Loss 1.0306   LearningRate 0.0010   Epoch: 17   Global Step: 223110   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:05:22,966-Speed 3291.74 samples/sec   Loss 0.9923   LearningRate 0.0010   Epoch: 17   Global Step: 223120   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:05:26,141-Speed 3225.61 samples/sec   Loss 1.0256   LearningRate 0.0010   Epoch: 17   Global Step: 223130   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:05:29,202-Speed 3346.63 samples/sec   Loss 1.0310   LearningRate 0.0010   Epoch: 17   Global Step: 223140   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:05:32,344-Speed 3261.75 samples/sec   Loss 1.0340   LearningRate 0.0010   Epoch: 17   Global Step: 223150   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:05:35,451-Speed 3296.48 samples/sec   Loss 1.0490   LearningRate 0.0010   Epoch: 17   Global Step: 223160   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:05:38,594-Speed 3259.45 samples/sec   Loss 1.0224   LearningRate 0.0010   Epoch: 17   Global Step: 223170   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:05:41,695-Speed 3302.30 samples/sec   Loss 1.0217   LearningRate 0.0010   Epoch: 17   Global Step: 223180   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:05:44,808-Speed 3290.49 samples/sec   Loss 1.0580   LearningRate 0.0010   Epoch: 17   Global Step: 223190   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:05:47,906-Speed 3306.54 samples/sec   Loss 1.0563   LearningRate 0.0010   Epoch: 17   Global Step: 223200   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:05:51,023-Speed 3286.87 samples/sec   Loss 1.0879   LearningRate 0.0010   Epoch: 17   Global Step: 223210   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:05:54,187-Speed 3237.37 samples/sec   Loss 1.0101   LearningRate 0.0010   Epoch: 17   Global Step: 223220   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:05:57,280-Speed 3311.81 samples/sec   Loss 1.0727   LearningRate 0.0010   Epoch: 17   Global Step: 223230   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:06:00,355-Speed 3330.34 samples/sec   Loss 1.0180   LearningRate 0.0010   Epoch: 17   Global Step: 223240   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:06:03,443-Speed 3317.72 samples/sec   Loss 1.0509   LearningRate 0.0010   Epoch: 17   Global Step: 223250   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:06:06,539-Speed 3308.72 samples/sec   Loss 1.0022   LearningRate 0.0010   Epoch: 17   Global Step: 223260   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:06:09,645-Speed 3297.52 samples/sec   Loss 1.0384   LearningRate 0.0010   Epoch: 17   Global Step: 223270   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:06:12,806-Speed 3240.42 samples/sec   Loss 1.0321   LearningRate 0.0010   Epoch: 17   Global Step: 223280   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:06:15,916-Speed 3294.94 samples/sec   Loss 1.0345   LearningRate 0.0010   Epoch: 17   Global Step: 223290   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:06:19,067-Speed 3250.33 samples/sec   Loss 1.0160   LearningRate 0.0010   Epoch: 17   Global Step: 223300   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:06:22,137-Speed 3336.30 samples/sec   Loss 1.1052   LearningRate 0.0010   Epoch: 17   Global Step: 223310   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:06:25,232-Speed 3309.58 samples/sec   Loss 1.0084   LearningRate 0.0010   Epoch: 17   Global Step: 223320   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:06:28,365-Speed 3270.30 samples/sec   Loss 1.0071   LearningRate 0.0010   Epoch: 17   Global Step: 223330   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:06:31,438-Speed 3333.22 samples/sec   Loss 1.0566   LearningRate 0.0010   Epoch: 17   Global Step: 223340   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:06:34,577-Speed 3263.21 samples/sec   Loss 1.0328   LearningRate 0.0010   Epoch: 17   Global Step: 223350   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:06:37,679-Speed 3302.47 samples/sec   Loss 1.0489   LearningRate 0.0010   Epoch: 17   Global Step: 223360   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:06:40,734-Speed 3352.88 samples/sec   Loss 1.0290   LearningRate 0.0010   Epoch: 17   Global Step: 223370   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:06:43,832-Speed 3306.13 samples/sec   Loss 1.0469   LearningRate 0.0010   Epoch: 17   Global Step: 223380   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:06:46,917-Speed 3320.66 samples/sec   Loss 1.0542   LearningRate 0.0010   Epoch: 17   Global Step: 223390   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:06:49,987-Speed 3336.46 samples/sec   Loss 1.0040   LearningRate 0.0010   Epoch: 17   Global Step: 223400   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:06:53,126-Speed 3263.33 samples/sec   Loss 1.0270   LearningRate 0.0010   Epoch: 17   Global Step: 223410   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:06:56,223-Speed 3307.63 samples/sec   Loss 1.0373   LearningRate 0.0010   Epoch: 17   Global Step: 223420   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:06:59,289-Speed 3341.26 samples/sec   Loss 1.0613   LearningRate 0.0010   Epoch: 17   Global Step: 223430   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:07:02,379-Speed 3314.70 samples/sec   Loss 1.0988   LearningRate 0.0010   Epoch: 17   Global Step: 223440   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:07:05,446-Speed 3340.45 samples/sec   Loss 1.0681   LearningRate 0.0010   Epoch: 17   Global Step: 223450   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:07:08,552-Speed 3297.83 samples/sec   Loss 1.0248   LearningRate 0.0010   Epoch: 17   Global Step: 223460   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:07:11,642-Speed 3314.57 samples/sec   Loss 1.0262   LearningRate 0.0010   Epoch: 17   Global Step: 223470   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:07:14,762-Speed 3282.65 samples/sec   Loss 1.0132   LearningRate 0.0010   Epoch: 17   Global Step: 223480   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:07:17,929-Speed 3235.08 samples/sec   Loss 0.9918   LearningRate 0.0010   Epoch: 17   Global Step: 223490   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:07:21,023-Speed 3310.26 samples/sec   Loss 1.0265   LearningRate 0.0010   Epoch: 17   Global Step: 223500   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:07:24,133-Speed 3293.95 samples/sec   Loss 1.0201   LearningRate 0.0010   Epoch: 17   Global Step: 223510   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:07:27,211-Speed 3327.95 samples/sec   Loss 1.0034   LearningRate 0.0010   Epoch: 17   Global Step: 223520   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:07:30,365-Speed 3248.16 samples/sec   Loss 1.0365   LearningRate 0.0010   Epoch: 17   Global Step: 223530   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:07:33,506-Speed 3260.62 samples/sec   Loss 1.0433   LearningRate 0.0010   Epoch: 17   Global Step: 223540   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:07:36,574-Speed 3338.78 samples/sec   Loss 1.0351   LearningRate 0.0010   Epoch: 17   Global Step: 223550   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:07:39,710-Speed 3266.17 samples/sec   Loss 1.0136   LearningRate 0.0010   Epoch: 17   Global Step: 223560   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:07:43,008-Speed 3105.88 samples/sec   Loss 1.0477   LearningRate 0.0010   Epoch: 17   Global Step: 223570   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:08:15,413-Speed 316.02 samples/sec   Loss 1.0138   LearningRate 0.0010   Epoch: 18   Global Step: 223580   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:08:18,526-Speed 3290.69 samples/sec   Loss 0.8162   LearningRate 0.0010   Epoch: 18   Global Step: 223590   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:08:21,820-Speed 3109.56 samples/sec   Loss 0.8567   LearningRate 0.0010   Epoch: 18   Global Step: 223600   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:08:24,924-Speed 3299.95 samples/sec   Loss 0.8893   LearningRate 0.0010   Epoch: 18   Global Step: 223610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:08:28,057-Speed 3269.52 samples/sec   Loss 0.8859   LearningRate 0.0010   Epoch: 18   Global Step: 223620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:08:31,150-Speed 3311.89 samples/sec   Loss 0.8467   LearningRate 0.0010   Epoch: 18   Global Step: 223630   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:08:34,251-Speed 3303.47 samples/sec   Loss 0.8408   LearningRate 0.0010   Epoch: 18   Global Step: 223640   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:08:37,376-Speed 3278.53 samples/sec   Loss 0.8468   LearningRate 0.0010   Epoch: 18   Global Step: 223650   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:08:40,510-Speed 3268.28 samples/sec   Loss 0.8317   LearningRate 0.0010   Epoch: 18   Global Step: 223660   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:08:43,643-Speed 3268.80 samples/sec   Loss 0.8397   LearningRate 0.0010   Epoch: 18   Global Step: 223670   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:08:46,830-Speed 3214.91 samples/sec   Loss 0.8295   LearningRate 0.0010   Epoch: 18   Global Step: 223680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:08:49,914-Speed 3321.37 samples/sec   Loss 0.8296   LearningRate 0.0010   Epoch: 18   Global Step: 223690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:08:53,023-Speed 3294.50 samples/sec   Loss 0.8791   LearningRate 0.0010   Epoch: 18   Global Step: 223700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:08:56,114-Speed 3313.78 samples/sec   Loss 0.8525   LearningRate 0.0010   Epoch: 18   Global Step: 223710   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:08:59,218-Speed 3299.54 samples/sec   Loss 0.8850   LearningRate 0.0010   Epoch: 18   Global Step: 223720   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:09:02,380-Speed 3240.48 samples/sec   Loss 0.8934   LearningRate 0.0010   Epoch: 18   Global Step: 223730   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:09:05,467-Speed 3317.54 samples/sec   Loss 0.8024   LearningRate 0.0010   Epoch: 18   Global Step: 223740   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:09:08,535-Speed 3338.69 samples/sec   Loss 0.8434   LearningRate 0.0010   Epoch: 18   Global Step: 223750   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:09:11,610-Speed 3331.59 samples/sec   Loss 0.8840   LearningRate 0.0010   Epoch: 18   Global Step: 223760   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:09:14,682-Speed 3334.93 samples/sec   Loss 0.8895   LearningRate 0.0010   Epoch: 18   Global Step: 223770   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:09:17,813-Speed 3270.33 samples/sec   Loss 0.8685   LearningRate 0.0010   Epoch: 18   Global Step: 223780   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:09:20,925-Speed 3291.49 samples/sec   Loss 0.8430   LearningRate 0.0010   Epoch: 18   Global Step: 223790   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:09:24,043-Speed 3285.28 samples/sec   Loss 0.8320   LearningRate 0.0010   Epoch: 18   Global Step: 223800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:09:27,121-Speed 3328.76 samples/sec   Loss 0.8447   LearningRate 0.0010   Epoch: 18   Global Step: 223810   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:09:30,271-Speed 3250.96 samples/sec   Loss 0.8803   LearningRate 0.0010   Epoch: 18   Global Step: 223820   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:09:33,360-Speed 3317.21 samples/sec   Loss 0.8038   LearningRate 0.0010   Epoch: 18   Global Step: 223830   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:09:36,468-Speed 3295.19 samples/sec   Loss 0.8420   LearningRate 0.0010   Epoch: 18   Global Step: 223840   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:09:39,582-Speed 3290.01 samples/sec   Loss 0.8385   LearningRate 0.0010   Epoch: 18   Global Step: 223850   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:09:42,699-Speed 3285.36 samples/sec   Loss 0.8393   LearningRate 0.0010   Epoch: 18   Global Step: 223860   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:09:45,870-Speed 3230.58 samples/sec   Loss 0.8477   LearningRate 0.0010   Epoch: 18   Global Step: 223870   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:09:49,204-Speed 3072.40 samples/sec   Loss 0.8376   LearningRate 0.0010   Epoch: 18   Global Step: 223880   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:09:52,341-Speed 3265.19 samples/sec   Loss 0.8372   LearningRate 0.0010   Epoch: 18   Global Step: 223890   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:09:55,478-Speed 3265.80 samples/sec   Loss 0.8495   LearningRate 0.0010   Epoch: 18   Global Step: 223900   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:09:58,535-Speed 3350.10 samples/sec   Loss 0.8418   LearningRate 0.0010   Epoch: 18   Global Step: 223910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:10:01,624-Speed 3316.67 samples/sec   Loss 0.8502   LearningRate 0.0010   Epoch: 18   Global Step: 223920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:10:04,708-Speed 3320.97 samples/sec   Loss 0.8375   LearningRate 0.0010   Epoch: 18   Global Step: 223930   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:10:07,805-Speed 3307.41 samples/sec   Loss 0.8270   LearningRate 0.0010   Epoch: 18   Global Step: 223940   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:10:10,908-Speed 3301.15 samples/sec   Loss 0.8325   LearningRate 0.0010   Epoch: 18   Global Step: 223950   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:10:14,043-Speed 3267.64 samples/sec   Loss 0.8313   LearningRate 0.0010   Epoch: 18   Global Step: 223960   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:10:17,124-Speed 3324.30 samples/sec   Loss 0.8589   LearningRate 0.0010   Epoch: 18   Global Step: 223970   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:10:20,182-Speed 3349.84 samples/sec   Loss 0.8183   LearningRate 0.0010   Epoch: 18   Global Step: 223980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:10:23,289-Speed 3296.42 samples/sec   Loss 0.8435   LearningRate 0.0010   Epoch: 18   Global Step: 223990   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:10:26,374-Speed 3321.17 samples/sec   Loss 0.8011   LearningRate 0.0010   Epoch: 18   Global Step: 224000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:10:29,464-Speed 3315.13 samples/sec   Loss 0.8643   LearningRate 0.0010   Epoch: 18   Global Step: 224010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:10:32,552-Speed 3317.02 samples/sec   Loss 0.8514   LearningRate 0.0010   Epoch: 18   Global Step: 224020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:10:35,638-Speed 3319.34 samples/sec   Loss 0.8040   LearningRate 0.0010   Epoch: 18   Global Step: 224030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:10:38,740-Speed 3301.68 samples/sec   Loss 0.8526   LearningRate 0.0010   Epoch: 18   Global Step: 224040   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:10:41,821-Speed 3324.95 samples/sec   Loss 0.8211   LearningRate 0.0010   Epoch: 18   Global Step: 224050   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:10:44,896-Speed 3331.31 samples/sec   Loss 0.8569   LearningRate 0.0010   Epoch: 18   Global Step: 224060   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:10:48,084-Speed 3213.05 samples/sec   Loss 0.8448   LearningRate 0.0010   Epoch: 18   Global Step: 224070   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:10:51,238-Speed 3247.47 samples/sec   Loss 0.8241   LearningRate 0.0010   Epoch: 18   Global Step: 224080   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:10:54,410-Speed 3229.56 samples/sec   Loss 0.8389   LearningRate 0.0010   Epoch: 18   Global Step: 224090   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:10:57,498-Speed 3316.59 samples/sec   Loss 0.8168   LearningRate 0.0010   Epoch: 18   Global Step: 224100   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:11:00,632-Speed 3268.93 samples/sec   Loss 0.8293   LearningRate 0.0010   Epoch: 18   Global Step: 224110   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:11:03,725-Speed 3311.57 samples/sec   Loss 0.8218   LearningRate 0.0010   Epoch: 18   Global Step: 224120   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:11:06,866-Speed 3260.79 samples/sec   Loss 0.8387   LearningRate 0.0010   Epoch: 18   Global Step: 224130   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:11:09,935-Speed 3338.09 samples/sec   Loss 0.8444   LearningRate 0.0010   Epoch: 18   Global Step: 224140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:11:13,131-Speed 3205.22 samples/sec   Loss 0.8667   LearningRate 0.0010   Epoch: 18   Global Step: 224150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:11:16,286-Speed 3246.82 samples/sec   Loss 0.8611   LearningRate 0.0010   Epoch: 18   Global Step: 224160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:11:19,346-Speed 3347.13 samples/sec   Loss 0.8078   LearningRate 0.0010   Epoch: 18   Global Step: 224170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:11:22,405-Speed 3347.92 samples/sec   Loss 0.8350   LearningRate 0.0010   Epoch: 18   Global Step: 224180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:11:25,510-Speed 3299.64 samples/sec   Loss 0.8192   LearningRate 0.0010   Epoch: 18   Global Step: 224190   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:11:28,653-Speed 3259.04 samples/sec   Loss 0.8789   LearningRate 0.0010   Epoch: 18   Global Step: 224200   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:11:31,765-Speed 3291.96 samples/sec   Loss 0.8469   LearningRate 0.0009   Epoch: 18   Global Step: 224210   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:11:34,873-Speed 3294.97 samples/sec   Loss 0.8258   LearningRate 0.0009   Epoch: 18   Global Step: 224220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:11:37,937-Speed 3343.48 samples/sec   Loss 0.8876   LearningRate 0.0009   Epoch: 18   Global Step: 224230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:11:41,076-Speed 3262.94 samples/sec   Loss 0.8530   LearningRate 0.0009   Epoch: 18   Global Step: 224240   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:11:44,244-Speed 3233.35 samples/sec   Loss 0.8744   LearningRate 0.0009   Epoch: 18   Global Step: 224250   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:11:47,307-Speed 3344.94 samples/sec   Loss 0.8580   LearningRate 0.0009   Epoch: 18   Global Step: 224260   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:11:50,387-Speed 3325.64 samples/sec   Loss 0.8507   LearningRate 0.0009   Epoch: 18   Global Step: 224270   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:11:53,562-Speed 3225.78 samples/sec   Loss 0.8365   LearningRate 0.0009   Epoch: 18   Global Step: 224280   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:11:56,650-Speed 3317.96 samples/sec   Loss 0.8426   LearningRate 0.0009   Epoch: 18   Global Step: 224290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:11:59,742-Speed 3312.62 samples/sec   Loss 0.8585   LearningRate 0.0009   Epoch: 18   Global Step: 224300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:12:02,857-Speed 3287.99 samples/sec   Loss 0.8529   LearningRate 0.0009   Epoch: 18   Global Step: 224310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:12:05,969-Speed 3292.15 samples/sec   Loss 0.8337   LearningRate 0.0009   Epoch: 18   Global Step: 224320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:12:09,023-Speed 3353.96 samples/sec   Loss 0.8784   LearningRate 0.0009   Epoch: 18   Global Step: 224330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:12:12,120-Speed 3306.50 samples/sec   Loss 0.8237   LearningRate 0.0009   Epoch: 18   Global Step: 224340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:12:15,211-Speed 3314.28 samples/sec   Loss 0.8311   LearningRate 0.0009   Epoch: 18   Global Step: 224350   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:12:18,310-Speed 3305.74 samples/sec   Loss 0.8466   LearningRate 0.0009   Epoch: 18   Global Step: 224360   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:12:21,380-Speed 3335.98 samples/sec   Loss 0.8270   LearningRate 0.0009   Epoch: 18   Global Step: 224370   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:12:24,485-Speed 3299.07 samples/sec   Loss 0.8157   LearningRate 0.0009   Epoch: 18   Global Step: 224380   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:12:27,595-Speed 3293.71 samples/sec   Loss 0.8455   LearningRate 0.0009   Epoch: 18   Global Step: 224390   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:12:30,728-Speed 3270.11 samples/sec   Loss 0.8393   LearningRate 0.0009   Epoch: 18   Global Step: 224400   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:12:33,814-Speed 3319.37 samples/sec   Loss 0.8635   LearningRate 0.0009   Epoch: 18   Global Step: 224410   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:12:36,941-Speed 3275.21 samples/sec   Loss 0.8331   LearningRate 0.0009   Epoch: 18   Global Step: 224420   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:12:40,047-Speed 3297.68 samples/sec   Loss 0.8513   LearningRate 0.0009   Epoch: 18   Global Step: 224430   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:12:43,164-Speed 3286.68 samples/sec   Loss 0.8364   LearningRate 0.0009   Epoch: 18   Global Step: 224440   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:12:46,257-Speed 3311.44 samples/sec   Loss 0.8321   LearningRate 0.0009   Epoch: 18   Global Step: 224450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:12:49,330-Speed 3333.84 samples/sec   Loss 0.8499   LearningRate 0.0009   Epoch: 18   Global Step: 224460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:12:52,422-Speed 3312.64 samples/sec   Loss 0.8754   LearningRate 0.0009   Epoch: 18   Global Step: 224470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:12:55,540-Speed 3285.41 samples/sec   Loss 0.8546   LearningRate 0.0009   Epoch: 18   Global Step: 224480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:12:58,618-Speed 3327.58 samples/sec   Loss 0.8103   LearningRate 0.0009   Epoch: 18   Global Step: 224490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:13:01,757-Speed 3262.65 samples/sec   Loss 0.8133   LearningRate 0.0009   Epoch: 18   Global Step: 224500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:13:04,888-Speed 3271.54 samples/sec   Loss 0.8465   LearningRate 0.0009   Epoch: 18   Global Step: 224510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:13:07,978-Speed 3315.59 samples/sec   Loss 0.8486   LearningRate 0.0009   Epoch: 18   Global Step: 224520   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:13:11,055-Speed 3328.99 samples/sec   Loss 0.8809   LearningRate 0.0009   Epoch: 18   Global Step: 224530   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:13:14,205-Speed 3251.71 samples/sec   Loss 0.8299   LearningRate 0.0009   Epoch: 18   Global Step: 224540   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:13:17,324-Speed 3283.60 samples/sec   Loss 0.8514   LearningRate 0.0009   Epoch: 18   Global Step: 224550   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:13:20,494-Speed 3231.17 samples/sec   Loss 0.8903   LearningRate 0.0009   Epoch: 18   Global Step: 224560   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:13:23,609-Speed 3289.15 samples/sec   Loss 0.8657   LearningRate 0.0009   Epoch: 18   Global Step: 224570   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:13:26,773-Speed 3237.19 samples/sec   Loss 0.8502   LearningRate 0.0009   Epoch: 18   Global Step: 224580   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:13:29,912-Speed 3263.30 samples/sec   Loss 0.8636   LearningRate 0.0009   Epoch: 18   Global Step: 224590   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:13:32,973-Speed 3345.50 samples/sec   Loss 0.8503   LearningRate 0.0009   Epoch: 18   Global Step: 224600   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:13:36,068-Speed 3310.14 samples/sec   Loss 0.8346   LearningRate 0.0009   Epoch: 18   Global Step: 224610   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:13:39,248-Speed 3221.01 samples/sec   Loss 0.8153   LearningRate 0.0009   Epoch: 18   Global Step: 224620   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:13:42,372-Speed 3278.97 samples/sec   Loss 0.8553   LearningRate 0.0009   Epoch: 18   Global Step: 224630   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:13:45,470-Speed 3307.32 samples/sec   Loss 0.8536   LearningRate 0.0009   Epoch: 18   Global Step: 224640   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:13:48,632-Speed 3238.73 samples/sec   Loss 0.8678   LearningRate 0.0009   Epoch: 18   Global Step: 224650   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:13:51,777-Speed 3257.29 samples/sec   Loss 0.8325   LearningRate 0.0009   Epoch: 18   Global Step: 224660   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:13:54,935-Speed 3246.82 samples/sec   Loss 0.8516   LearningRate 0.0009   Epoch: 18   Global Step: 224670   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:13:57,997-Speed 3345.55 samples/sec   Loss 0.8895   LearningRate 0.0009   Epoch: 18   Global Step: 224680   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:14:01,095-Speed 3305.47 samples/sec   Loss 0.8427   LearningRate 0.0009   Epoch: 18   Global Step: 224690   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:14:04,231-Speed 3267.32 samples/sec   Loss 0.8454   LearningRate 0.0009   Epoch: 18   Global Step: 224700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:14:07,394-Speed 3238.40 samples/sec   Loss 0.8725   LearningRate 0.0009   Epoch: 18   Global Step: 224710   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:14:10,509-Speed 3287.71 samples/sec   Loss 0.8568   LearningRate 0.0009   Epoch: 18   Global Step: 224720   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:14:13,633-Speed 3279.49 samples/sec   Loss 0.8498   LearningRate 0.0009   Epoch: 18   Global Step: 224730   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:14:16,810-Speed 3224.06 samples/sec   Loss 0.8210   LearningRate 0.0009   Epoch: 18   Global Step: 224740   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:14:19,916-Speed 3298.61 samples/sec   Loss 0.8078   LearningRate 0.0009   Epoch: 18   Global Step: 224750   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:14:22,975-Speed 3349.05 samples/sec   Loss 0.8476   LearningRate 0.0009   Epoch: 18   Global Step: 224760   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:14:26,195-Speed 3181.57 samples/sec   Loss 0.8564   LearningRate 0.0009   Epoch: 18   Global Step: 224770   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:14:29,330-Speed 3266.85 samples/sec   Loss 0.8315   LearningRate 0.0009   Epoch: 18   Global Step: 224780   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:14:32,449-Speed 3283.61 samples/sec   Loss 0.8832   LearningRate 0.0009   Epoch: 18   Global Step: 224790   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:14:35,535-Speed 3320.27 samples/sec   Loss 0.8343   LearningRate 0.0009   Epoch: 18   Global Step: 224800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:14:38,681-Speed 3255.51 samples/sec   Loss 0.8657   LearningRate 0.0009   Epoch: 18   Global Step: 224810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:14:41,818-Speed 3265.34 samples/sec   Loss 0.8608   LearningRate 0.0009   Epoch: 18   Global Step: 224820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:14:44,858-Speed 3369.12 samples/sec   Loss 0.8813   LearningRate 0.0009   Epoch: 18   Global Step: 224830   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:14:47,978-Speed 3283.35 samples/sec   Loss 0.8803   LearningRate 0.0009   Epoch: 18   Global Step: 224840   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:14:51,131-Speed 3248.47 samples/sec   Loss 0.8498   LearningRate 0.0009   Epoch: 18   Global Step: 224850   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:14:54,222-Speed 3313.83 samples/sec   Loss 0.8445   LearningRate 0.0009   Epoch: 18   Global Step: 224860   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:14:57,290-Speed 3338.95 samples/sec   Loss 0.8875   LearningRate 0.0009   Epoch: 18   Global Step: 224870   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:15:00,386-Speed 3308.39 samples/sec   Loss 0.8476   LearningRate 0.0009   Epoch: 18   Global Step: 224880   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:15:03,459-Speed 3334.11 samples/sec   Loss 0.8553   LearningRate 0.0009   Epoch: 18   Global Step: 224890   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:15:06,720-Speed 3140.66 samples/sec   Loss 0.8479   LearningRate 0.0009   Epoch: 18   Global Step: 224900   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:15:09,799-Speed 3326.68 samples/sec   Loss 0.8325   LearningRate 0.0009   Epoch: 18   Global Step: 224910   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:15:12,993-Speed 3207.28 samples/sec   Loss 0.7873   LearningRate 0.0009   Epoch: 18   Global Step: 224920   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:15:16,176-Speed 3218.19 samples/sec   Loss 0.8528   LearningRate 0.0009   Epoch: 18   Global Step: 224930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:15:19,359-Speed 3217.85 samples/sec   Loss 0.8399   LearningRate 0.0009   Epoch: 18   Global Step: 224940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:15:22,417-Speed 3349.25 samples/sec   Loss 0.8799   LearningRate 0.0009   Epoch: 18   Global Step: 224950   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:15:25,541-Speed 3279.41 samples/sec   Loss 0.8704   LearningRate 0.0009   Epoch: 18   Global Step: 224960   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:15:28,709-Speed 3232.55 samples/sec   Loss 0.8513   LearningRate 0.0009   Epoch: 18   Global Step: 224970   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:15:31,877-Speed 3234.06 samples/sec   Loss 0.8159   LearningRate 0.0009   Epoch: 18   Global Step: 224980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:15:35,001-Speed 3277.98 samples/sec   Loss 0.8344   LearningRate 0.0009   Epoch: 18   Global Step: 224990   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:15:38,139-Speed 3264.62 samples/sec   Loss 0.8466   LearningRate 0.0009   Epoch: 18   Global Step: 225000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:15:41,279-Speed 3263.10 samples/sec   Loss 0.8565   LearningRate 0.0009   Epoch: 18   Global Step: 225010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:15:44,352-Speed 3332.82 samples/sec   Loss 0.8324   LearningRate 0.0009   Epoch: 18   Global Step: 225020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:15:47,460-Speed 3295.20 samples/sec   Loss 0.7852   LearningRate 0.0009   Epoch: 18   Global Step: 225030   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:15:50,617-Speed 3244.66 samples/sec   Loss 0.8675   LearningRate 0.0009   Epoch: 18   Global Step: 225040   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:15:53,758-Speed 3261.07 samples/sec   Loss 0.8291   LearningRate 0.0009   Epoch: 18   Global Step: 225050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:15:56,815-Speed 3351.99 samples/sec   Loss 0.8271   LearningRate 0.0009   Epoch: 18   Global Step: 225060   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:15:59,906-Speed 3313.23 samples/sec   Loss 0.8828   LearningRate 0.0009   Epoch: 18   Global Step: 225070   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:16:03,154-Speed 3153.83 samples/sec   Loss 0.8425   LearningRate 0.0009   Epoch: 18   Global Step: 225080   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:16:06,363-Speed 3191.82 samples/sec   Loss 0.7757   LearningRate 0.0009   Epoch: 18   Global Step: 225090   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:16:09,504-Speed 3260.92 samples/sec   Loss 0.8626   LearningRate 0.0009   Epoch: 18   Global Step: 225100   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:16:12,628-Speed 3279.58 samples/sec   Loss 0.8333   LearningRate 0.0009   Epoch: 18   Global Step: 225110   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:16:15,822-Speed 3207.20 samples/sec   Loss 0.8445   LearningRate 0.0009   Epoch: 18   Global Step: 225120   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:16:18,937-Speed 3288.15 samples/sec   Loss 0.8530   LearningRate 0.0009   Epoch: 18   Global Step: 225130   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:16:22,004-Speed 3340.02 samples/sec   Loss 0.8570   LearningRate 0.0009   Epoch: 18   Global Step: 225140   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:16:25,099-Speed 3310.05 samples/sec   Loss 0.8653   LearningRate 0.0009   Epoch: 18   Global Step: 225150   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:16:28,222-Speed 3279.86 samples/sec   Loss 0.8220   LearningRate 0.0009   Epoch: 18   Global Step: 225160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:16:31,320-Speed 3306.59 samples/sec   Loss 0.8777   LearningRate 0.0009   Epoch: 18   Global Step: 225170   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:16:34,395-Speed 3330.72 samples/sec   Loss 0.8225   LearningRate 0.0009   Epoch: 18   Global Step: 225180   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:16:37,596-Speed 3199.83 samples/sec   Loss 0.8140   LearningRate 0.0009   Epoch: 18   Global Step: 225190   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:16:40,753-Speed 3245.05 samples/sec   Loss 0.8139   LearningRate 0.0009   Epoch: 18   Global Step: 225200   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:16:43,868-Speed 3287.82 samples/sec   Loss 0.8256   LearningRate 0.0009   Epoch: 18   Global Step: 225210   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:16:47,068-Speed 3201.18 samples/sec   Loss 0.8249   LearningRate 0.0009   Epoch: 18   Global Step: 225220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:16:50,235-Speed 3234.02 samples/sec   Loss 0.8554   LearningRate 0.0009   Epoch: 18   Global Step: 225230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:16:53,437-Speed 3199.36 samples/sec   Loss 0.8244   LearningRate 0.0009   Epoch: 18   Global Step: 225240   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:16:56,540-Speed 3301.22 samples/sec   Loss 0.8794   LearningRate 0.0009   Epoch: 18   Global Step: 225250   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:16:59,683-Speed 3258.52 samples/sec   Loss 0.8775   LearningRate 0.0009   Epoch: 18   Global Step: 225260   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:17:02,834-Speed 3250.81 samples/sec   Loss 0.8430   LearningRate 0.0009   Epoch: 18   Global Step: 225270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:17:06,009-Speed 3226.46 samples/sec   Loss 0.8284   LearningRate 0.0009   Epoch: 18   Global Step: 225280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:17:09,117-Speed 3295.72 samples/sec   Loss 0.8677   LearningRate 0.0009   Epoch: 18   Global Step: 225290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:17:12,147-Speed 3380.18 samples/sec   Loss 0.8156   LearningRate 0.0009   Epoch: 18   Global Step: 225300   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:17:15,346-Speed 3201.81 samples/sec   Loss 0.8275   LearningRate 0.0009   Epoch: 18   Global Step: 225310   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:17:18,432-Speed 3320.03 samples/sec   Loss 0.7926   LearningRate 0.0009   Epoch: 18   Global Step: 225320   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:17:21,504-Speed 3335.07 samples/sec   Loss 0.8366   LearningRate 0.0009   Epoch: 18   Global Step: 225330   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:17:24,673-Speed 3232.06 samples/sec   Loss 0.8521   LearningRate 0.0009   Epoch: 18   Global Step: 225340   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:17:27,826-Speed 3248.15 samples/sec   Loss 0.8559   LearningRate 0.0009   Epoch: 18   Global Step: 225350   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:17:30,974-Speed 3254.54 samples/sec   Loss 0.8340   LearningRate 0.0009   Epoch: 18   Global Step: 225360   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:17:34,065-Speed 3313.19 samples/sec   Loss 0.8658   LearningRate 0.0009   Epoch: 18   Global Step: 225370   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:17:37,218-Speed 3249.41 samples/sec   Loss 0.8602   LearningRate 0.0009   Epoch: 18   Global Step: 225380   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:17:40,386-Speed 3232.31 samples/sec   Loss 0.8420   LearningRate 0.0009   Epoch: 18   Global Step: 225390   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:17:43,530-Speed 3258.58 samples/sec   Loss 0.8483   LearningRate 0.0009   Epoch: 18   Global Step: 225400   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:17:46,615-Speed 3319.87 samples/sec   Loss 0.8789   LearningRate 0.0009   Epoch: 18   Global Step: 225410   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:17:49,691-Speed 3330.61 samples/sec   Loss 0.8800   LearningRate 0.0009   Epoch: 18   Global Step: 225420   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:17:52,775-Speed 3321.35 samples/sec   Loss 0.8705   LearningRate 0.0009   Epoch: 18   Global Step: 225430   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:17:55,851-Speed 3329.90 samples/sec   Loss 0.8769   LearningRate 0.0009   Epoch: 18   Global Step: 225440   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:17:58,927-Speed 3330.40 samples/sec   Loss 0.8466   LearningRate 0.0009   Epoch: 18   Global Step: 225450   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:18:02,051-Speed 3279.39 samples/sec   Loss 0.8873   LearningRate 0.0009   Epoch: 18   Global Step: 225460   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:18:05,155-Speed 3299.80 samples/sec   Loss 0.8306   LearningRate 0.0009   Epoch: 18   Global Step: 225470   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:18:08,233-Speed 3328.64 samples/sec   Loss 0.8572   LearningRate 0.0009   Epoch: 18   Global Step: 225480   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:18:11,308-Speed 3330.99 samples/sec   Loss 0.8681   LearningRate 0.0009   Epoch: 18   Global Step: 225490   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:18:14,370-Speed 3344.45 samples/sec   Loss 0.8668   LearningRate 0.0009   Epoch: 18   Global Step: 225500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:18:17,543-Speed 3228.98 samples/sec   Loss 0.8401   LearningRate 0.0009   Epoch: 18   Global Step: 225510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:18:20,577-Speed 3375.92 samples/sec   Loss 0.8595   LearningRate 0.0008   Epoch: 18   Global Step: 225520   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:18:23,650-Speed 3333.60 samples/sec   Loss 0.8318   LearningRate 0.0008   Epoch: 18   Global Step: 225530   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:18:26,732-Speed 3324.29 samples/sec   Loss 0.7889   LearningRate 0.0008   Epoch: 18   Global Step: 225540   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:18:29,886-Speed 3247.05 samples/sec   Loss 0.8315   LearningRate 0.0008   Epoch: 18   Global Step: 225550   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:18:32,992-Speed 3297.42 samples/sec   Loss 0.8871   LearningRate 0.0008   Epoch: 18   Global Step: 225560   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:18:36,124-Speed 3271.25 samples/sec   Loss 0.8451   LearningRate 0.0008   Epoch: 18   Global Step: 225570   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:18:39,287-Speed 3238.13 samples/sec   Loss 0.8517   LearningRate 0.0008   Epoch: 18   Global Step: 225580   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:18:42,451-Speed 3237.72 samples/sec   Loss 0.8660   LearningRate 0.0008   Epoch: 18   Global Step: 225590   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:18:45,542-Speed 3313.87 samples/sec   Loss 0.8821   LearningRate 0.0008   Epoch: 18   Global Step: 225600   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:18:48,699-Speed 3243.85 samples/sec   Loss 0.8839   LearningRate 0.0008   Epoch: 18   Global Step: 225610   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:18:51,826-Speed 3276.51 samples/sec   Loss 0.8304   LearningRate 0.0008   Epoch: 18   Global Step: 225620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:18:54,995-Speed 3232.62 samples/sec   Loss 0.8916   LearningRate 0.0008   Epoch: 18   Global Step: 225630   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:18:58,109-Speed 3289.28 samples/sec   Loss 0.8417   LearningRate 0.0008   Epoch: 18   Global Step: 225640   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:19:01,301-Speed 3209.10 samples/sec   Loss 0.8357   LearningRate 0.0008   Epoch: 18   Global Step: 225650   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:19:04,494-Speed 3207.26 samples/sec   Loss 0.8620   LearningRate 0.0008   Epoch: 18   Global Step: 225660   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:19:07,627-Speed 3269.88 samples/sec   Loss 0.8209   LearningRate 0.0008   Epoch: 18   Global Step: 225670   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:19:10,691-Speed 3342.60 samples/sec   Loss 0.8615   LearningRate 0.0008   Epoch: 18   Global Step: 225680   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:19:13,915-Speed 3178.32 samples/sec   Loss 0.8577   LearningRate 0.0008   Epoch: 18   Global Step: 225690   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:19:17,028-Speed 3289.70 samples/sec   Loss 0.8300   LearningRate 0.0008   Epoch: 18   Global Step: 225700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:19:20,130-Speed 3302.56 samples/sec   Loss 0.8552   LearningRate 0.0008   Epoch: 18   Global Step: 225710   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:19:23,267-Speed 3265.34 samples/sec   Loss 0.8622   LearningRate 0.0008   Epoch: 18   Global Step: 225720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:19:26,367-Speed 3304.76 samples/sec   Loss 0.8226   LearningRate 0.0008   Epoch: 18   Global Step: 225730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:19:29,472-Speed 3298.34 samples/sec   Loss 0.8398   LearningRate 0.0008   Epoch: 18   Global Step: 225740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:19:32,565-Speed 3312.24 samples/sec   Loss 0.8676   LearningRate 0.0008   Epoch: 18   Global Step: 225750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:19:35,681-Speed 3287.41 samples/sec   Loss 0.8139   LearningRate 0.0008   Epoch: 18   Global Step: 225760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:19:38,834-Speed 3247.89 samples/sec   Loss 0.8372   LearningRate 0.0008   Epoch: 18   Global Step: 225770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:19:42,580-Speed 2734.94 samples/sec   Loss 0.8276   LearningRate 0.0008   Epoch: 18   Global Step: 225780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:19:45,696-Speed 3287.34 samples/sec   Loss 0.8542   LearningRate 0.0008   Epoch: 18   Global Step: 225790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:19:48,775-Speed 3326.38 samples/sec   Loss 0.8463   LearningRate 0.0008   Epoch: 18   Global Step: 225800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:19:51,873-Speed 3306.92 samples/sec   Loss 0.8248   LearningRate 0.0008   Epoch: 18   Global Step: 225810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:19:55,019-Speed 3256.30 samples/sec   Loss 0.8387   LearningRate 0.0008   Epoch: 18   Global Step: 225820   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 21:19:58,114-Speed 3309.15 samples/sec   Loss 0.8350   LearningRate 0.0008   Epoch: 18   Global Step: 225830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:20:01,215-Speed 3303.17 samples/sec   Loss 0.8622   LearningRate 0.0008   Epoch: 18   Global Step: 225840   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:20:04,305-Speed 3315.80 samples/sec   Loss 0.8400   LearningRate 0.0008   Epoch: 18   Global Step: 225850   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:20:07,417-Speed 3290.82 samples/sec   Loss 0.8135   LearningRate 0.0008   Epoch: 18   Global Step: 225860   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:20:10,478-Speed 3346.36 samples/sec   Loss 0.8404   LearningRate 0.0008   Epoch: 18   Global Step: 225870   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:20:13,557-Speed 3327.16 samples/sec   Loss 0.8501   LearningRate 0.0008   Epoch: 18   Global Step: 225880   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:20:16,812-Speed 3146.99 samples/sec   Loss 0.8424   LearningRate 0.0008   Epoch: 18   Global Step: 225890   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:20:19,922-Speed 3292.82 samples/sec   Loss 0.8136   LearningRate 0.0008   Epoch: 18   Global Step: 225900   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:20:23,014-Speed 3313.06 samples/sec   Loss 0.7935   LearningRate 0.0008   Epoch: 18   Global Step: 225910   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:20:26,166-Speed 3250.48 samples/sec   Loss 0.8459   LearningRate 0.0008   Epoch: 18   Global Step: 225920   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:20:29,285-Speed 3284.19 samples/sec   Loss 0.8718   LearningRate 0.0008   Epoch: 18   Global Step: 225930   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:20:32,391-Speed 3297.98 samples/sec   Loss 0.8200   LearningRate 0.0008   Epoch: 18   Global Step: 225940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:20:35,462-Speed 3334.62 samples/sec   Loss 0.8307   LearningRate 0.0008   Epoch: 18   Global Step: 225950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:20:38,619-Speed 3244.61 samples/sec   Loss 0.8452   LearningRate 0.0008   Epoch: 18   Global Step: 225960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:20:41,808-Speed 3212.81 samples/sec   Loss 0.8207   LearningRate 0.0008   Epoch: 18   Global Step: 225970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:20:44,896-Speed 3316.79 samples/sec   Loss 0.8628   LearningRate 0.0008   Epoch: 18   Global Step: 225980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:20:47,992-Speed 3309.11 samples/sec   Loss 0.8685   LearningRate 0.0008   Epoch: 18   Global Step: 225990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:20:51,110-Speed 3284.55 samples/sec   Loss 0.8293   LearningRate 0.0008   Epoch: 18   Global Step: 226000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:20:54,227-Speed 3287.15 samples/sec   Loss 0.8385   LearningRate 0.0008   Epoch: 18   Global Step: 226010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:20:57,286-Speed 3348.12 samples/sec   Loss 0.8344   LearningRate 0.0008   Epoch: 18   Global Step: 226020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:21:00,425-Speed 3263.41 samples/sec   Loss 0.8434   LearningRate 0.0008   Epoch: 18   Global Step: 226030   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:21:03,518-Speed 3311.55 samples/sec   Loss 0.8466   LearningRate 0.0008   Epoch: 18   Global Step: 226040   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:21:06,625-Speed 3296.35 samples/sec   Loss 0.8232   LearningRate 0.0008   Epoch: 18   Global Step: 226050   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:21:09,685-Speed 3348.33 samples/sec   Loss 0.8783   LearningRate 0.0008   Epoch: 18   Global Step: 226060   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:21:12,853-Speed 3232.95 samples/sec   Loss 0.8340   LearningRate 0.0008   Epoch: 18   Global Step: 226070   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:21:16,647-Speed 2699.74 samples/sec   Loss 0.8474   LearningRate 0.0008   Epoch: 18   Global Step: 226080   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:21:19,761-Speed 3289.09 samples/sec   Loss 0.8670   LearningRate 0.0008   Epoch: 18   Global Step: 226090   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:21:22,861-Speed 3304.28 samples/sec   Loss 0.8671   LearningRate 0.0008   Epoch: 18   Global Step: 226100   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:21:26,692-Speed 2673.45 samples/sec   Loss 0.8450   LearningRate 0.0008   Epoch: 18   Global Step: 226110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:21:30,351-Speed 2799.38 samples/sec   Loss 0.8232   LearningRate 0.0008   Epoch: 18   Global Step: 226120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:21:33,414-Speed 3345.17 samples/sec   Loss 0.8865   LearningRate 0.0008   Epoch: 18   Global Step: 226130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:21:36,516-Speed 3301.82 samples/sec   Loss 0.8263   LearningRate 0.0008   Epoch: 18   Global Step: 226140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:21:39,631-Speed 3288.54 samples/sec   Loss 0.8382   LearningRate 0.0008   Epoch: 18   Global Step: 226150   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:21:42,788-Speed 3244.36 samples/sec   Loss 0.8344   LearningRate 0.0008   Epoch: 18   Global Step: 226160   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:21:45,862-Speed 3332.51 samples/sec   Loss 0.8660   LearningRate 0.0008   Epoch: 18   Global Step: 226170   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:21:48,964-Speed 3301.99 samples/sec   Loss 0.8438   LearningRate 0.0008   Epoch: 18   Global Step: 226180   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:21:52,036-Speed 3334.65 samples/sec   Loss 0.8410   LearningRate 0.0008   Epoch: 18   Global Step: 226190   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:21:55,223-Speed 3214.00 samples/sec   Loss 0.8669   LearningRate 0.0008   Epoch: 18   Global Step: 226200   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:21:58,299-Speed 3330.82 samples/sec   Loss 0.8245   LearningRate 0.0008   Epoch: 18   Global Step: 226210   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:22:01,494-Speed 3205.66 samples/sec   Loss 0.8399   LearningRate 0.0008   Epoch: 18   Global Step: 226220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:22:04,669-Speed 3226.77 samples/sec   Loss 0.8704   LearningRate 0.0008   Epoch: 18   Global Step: 226230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:22:07,761-Speed 3312.20 samples/sec   Loss 0.8569   LearningRate 0.0008   Epoch: 18   Global Step: 226240   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:22:10,822-Speed 3346.67 samples/sec   Loss 0.8068   LearningRate 0.0008   Epoch: 18   Global Step: 226250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:22:13,892-Speed 3336.11 samples/sec   Loss 0.8463   LearningRate 0.0008   Epoch: 18   Global Step: 226260   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:22:16,974-Speed 3324.58 samples/sec   Loss 0.8635   LearningRate 0.0008   Epoch: 18   Global Step: 226270   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:22:20,047-Speed 3333.23 samples/sec   Loss 0.8438   LearningRate 0.0008   Epoch: 18   Global Step: 226280   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:22:23,150-Speed 3300.41 samples/sec   Loss 0.8111   LearningRate 0.0008   Epoch: 18   Global Step: 226290   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:22:26,322-Speed 3229.94 samples/sec   Loss 0.8687   LearningRate 0.0008   Epoch: 18   Global Step: 226300   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:22:29,510-Speed 3212.28 samples/sec   Loss 0.8436   LearningRate 0.0008   Epoch: 18   Global Step: 226310   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:22:32,681-Speed 3231.38 samples/sec   Loss 0.8194   LearningRate 0.0008   Epoch: 18   Global Step: 226320   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:22:35,801-Speed 3283.19 samples/sec   Loss 0.8478   LearningRate 0.0008   Epoch: 18   Global Step: 226330   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:22:38,883-Speed 3323.34 samples/sec   Loss 0.8498   LearningRate 0.0008   Epoch: 18   Global Step: 226340   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:22:42,022-Speed 3262.84 samples/sec   Loss 0.8809   LearningRate 0.0008   Epoch: 18   Global Step: 226350   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:22:45,091-Speed 3337.72 samples/sec   Loss 0.8452   LearningRate 0.0008   Epoch: 18   Global Step: 226360   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:22:48,188-Speed 3307.75 samples/sec   Loss 0.8392   LearningRate 0.0008   Epoch: 18   Global Step: 226370   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:22:51,262-Speed 3331.72 samples/sec   Loss 0.8329   LearningRate 0.0008   Epoch: 18   Global Step: 226380   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:22:54,363-Speed 3303.18 samples/sec   Loss 0.8559   LearningRate 0.0008   Epoch: 18   Global Step: 226390   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:22:57,438-Speed 3331.18 samples/sec   Loss 0.8923   LearningRate 0.0008   Epoch: 18   Global Step: 226400   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:23:00,554-Speed 3287.79 samples/sec   Loss 0.8473   LearningRate 0.0008   Epoch: 18   Global Step: 226410   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:23:03,617-Speed 3344.80 samples/sec   Loss 0.8482   LearningRate 0.0008   Epoch: 18   Global Step: 226420   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:23:06,728-Speed 3292.49 samples/sec   Loss 0.8520   LearningRate 0.0008   Epoch: 18   Global Step: 226430   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:23:09,808-Speed 3324.74 samples/sec   Loss 0.8467   LearningRate 0.0008   Epoch: 18   Global Step: 226440   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:23:12,956-Speed 3254.74 samples/sec   Loss 0.8768   LearningRate 0.0008   Epoch: 18   Global Step: 226450   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:23:16,084-Speed 3274.50 samples/sec   Loss 0.8777   LearningRate 0.0008   Epoch: 18   Global Step: 226460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:23:19,169-Speed 3320.12 samples/sec   Loss 0.8471   LearningRate 0.0008   Epoch: 18   Global Step: 226470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:23:22,224-Speed 3352.84 samples/sec   Loss 0.8711   LearningRate 0.0008   Epoch: 18   Global Step: 226480   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:23:25,282-Speed 3349.25 samples/sec   Loss 0.8714   LearningRate 0.0008   Epoch: 18   Global Step: 226490   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:23:28,380-Speed 3307.02 samples/sec   Loss 0.8549   LearningRate 0.0008   Epoch: 18   Global Step: 226500   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:23:31,442-Speed 3345.09 samples/sec   Loss 0.8494   LearningRate 0.0008   Epoch: 18   Global Step: 226510   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:23:34,531-Speed 3315.53 samples/sec   Loss 0.8519   LearningRate 0.0008   Epoch: 18   Global Step: 226520   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:23:37,628-Speed 3308.67 samples/sec   Loss 0.8390   LearningRate 0.0008   Epoch: 18   Global Step: 226530   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:23:40,787-Speed 3242.29 samples/sec   Loss 0.8013   LearningRate 0.0008   Epoch: 18   Global Step: 226540   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:23:43,922-Speed 3266.95 samples/sec   Loss 0.8188   LearningRate 0.0008   Epoch: 18   Global Step: 226550   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:23:46,996-Speed 3332.69 samples/sec   Loss 0.8284   LearningRate 0.0008   Epoch: 18   Global Step: 226560   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:23:50,049-Speed 3354.61 samples/sec   Loss 0.8868   LearningRate 0.0008   Epoch: 18   Global Step: 226570   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:23:53,123-Speed 3331.97 samples/sec   Loss 0.8500   LearningRate 0.0008   Epoch: 18   Global Step: 226580   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:23:56,200-Speed 3330.05 samples/sec   Loss 0.8509   LearningRate 0.0008   Epoch: 18   Global Step: 226590   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:23:59,279-Speed 3326.52 samples/sec   Loss 0.8466   LearningRate 0.0008   Epoch: 18   Global Step: 226600   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:24:02,395-Speed 3287.26 samples/sec   Loss 0.8344   LearningRate 0.0008   Epoch: 18   Global Step: 226610   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:24:05,541-Speed 3256.01 samples/sec   Loss 0.8904   LearningRate 0.0008   Epoch: 18   Global Step: 226620   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:24:08,613-Speed 3333.94 samples/sec   Loss 0.8315   LearningRate 0.0008   Epoch: 18   Global Step: 226630   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:24:11,661-Speed 3360.72 samples/sec   Loss 0.8870   LearningRate 0.0008   Epoch: 18   Global Step: 226640   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:24:14,797-Speed 3266.24 samples/sec   Loss 0.8139   LearningRate 0.0008   Epoch: 18   Global Step: 226650   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:24:17,932-Speed 3267.79 samples/sec   Loss 0.8600   LearningRate 0.0008   Epoch: 18   Global Step: 226660   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:24:21,003-Speed 3335.53 samples/sec   Loss 0.8505   LearningRate 0.0008   Epoch: 18   Global Step: 226670   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:24:24,120-Speed 3285.59 samples/sec   Loss 0.8468   LearningRate 0.0008   Epoch: 18   Global Step: 226680   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:24:27,200-Speed 3325.71 samples/sec   Loss 0.8052   LearningRate 0.0008   Epoch: 18   Global Step: 226690   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:24:30,341-Speed 3261.32 samples/sec   Loss 0.8252   LearningRate 0.0008   Epoch: 18   Global Step: 226700   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:24:33,462-Speed 3282.71 samples/sec   Loss 0.8644   LearningRate 0.0008   Epoch: 18   Global Step: 226710   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:24:36,524-Speed 3344.77 samples/sec   Loss 0.8401   LearningRate 0.0008   Epoch: 18   Global Step: 226720   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:24:39,598-Speed 3332.16 samples/sec   Loss 0.8643   LearningRate 0.0008   Epoch: 18   Global Step: 226730   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:24:42,668-Speed 3337.06 samples/sec   Loss 0.8859   LearningRate 0.0008   Epoch: 18   Global Step: 226740   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:24:45,747-Speed 3326.12 samples/sec   Loss 0.8422   LearningRate 0.0008   Epoch: 18   Global Step: 226750   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:24:48,829-Speed 3323.79 samples/sec   Loss 0.8622   LearningRate 0.0008   Epoch: 18   Global Step: 226760   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:24:51,975-Speed 3255.95 samples/sec   Loss 0.8126   LearningRate 0.0008   Epoch: 18   Global Step: 226770   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:24:55,120-Speed 3256.61 samples/sec   Loss 0.8926   LearningRate 0.0008   Epoch: 18   Global Step: 226780   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:24:58,236-Speed 3288.07 samples/sec   Loss 0.8591   LearningRate 0.0008   Epoch: 18   Global Step: 226790   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:25:01,327-Speed 3313.49 samples/sec   Loss 0.8386   LearningRate 0.0008   Epoch: 18   Global Step: 226800   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:25:04,434-Speed 3297.10 samples/sec   Loss 0.8182   LearningRate 0.0008   Epoch: 18   Global Step: 226810   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:25:07,507-Speed 3333.04 samples/sec   Loss 0.8607   LearningRate 0.0008   Epoch: 18   Global Step: 226820   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:25:10,618-Speed 3292.32 samples/sec   Loss 0.8551   LearningRate 0.0008   Epoch: 18   Global Step: 226830   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:25:13,761-Speed 3259.09 samples/sec   Loss 0.8300   LearningRate 0.0008   Epoch: 18   Global Step: 226840   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:25:16,911-Speed 3251.74 samples/sec   Loss 0.8190   LearningRate 0.0008   Epoch: 18   Global Step: 226850   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:25:19,983-Speed 3334.66 samples/sec   Loss 0.8540   LearningRate 0.0008   Epoch: 18   Global Step: 226860   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:25:23,044-Speed 3346.30 samples/sec   Loss 0.8159   LearningRate 0.0008   Epoch: 18   Global Step: 226870   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:25:26,188-Speed 3257.95 samples/sec   Loss 0.8588   LearningRate 0.0008   Epoch: 18   Global Step: 226880   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:25:29,415-Speed 3173.90 samples/sec   Loss 0.8474   LearningRate 0.0008   Epoch: 18   Global Step: 226890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:25:32,516-Speed 3303.44 samples/sec   Loss 0.8421   LearningRate 0.0008   Epoch: 18   Global Step: 226900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:25:35,602-Speed 3319.38 samples/sec   Loss 0.8365   LearningRate 0.0007   Epoch: 18   Global Step: 226910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:25:38,646-Speed 3364.57 samples/sec   Loss 0.8278   LearningRate 0.0007   Epoch: 18   Global Step: 226920   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:25:41,779-Speed 3269.93 samples/sec   Loss 0.8447   LearningRate 0.0007   Epoch: 18   Global Step: 226930   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:25:44,867-Speed 3316.91 samples/sec   Loss 0.8882   LearningRate 0.0007   Epoch: 18   Global Step: 226940   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:25:48,084-Speed 3184.27 samples/sec   Loss 0.8782   LearningRate 0.0007   Epoch: 18   Global Step: 226950   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:25:51,216-Speed 3270.52 samples/sec   Loss 0.8370   LearningRate 0.0007   Epoch: 18   Global Step: 226960   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:25:54,394-Speed 3223.68 samples/sec   Loss 0.8582   LearningRate 0.0007   Epoch: 18   Global Step: 226970   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:25:57,477-Speed 3322.10 samples/sec   Loss 0.8437   LearningRate 0.0007   Epoch: 18   Global Step: 226980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:26:00,570-Speed 3311.94 samples/sec   Loss 0.8419   LearningRate 0.0007   Epoch: 18   Global Step: 226990   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:26:03,685-Speed 3287.96 samples/sec   Loss 0.8561   LearningRate 0.0007   Epoch: 18   Global Step: 227000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:26:06,763-Speed 3327.96 samples/sec   Loss 0.7786   LearningRate 0.0007   Epoch: 18   Global Step: 227010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:26:09,858-Speed 3309.41 samples/sec   Loss 0.8182   LearningRate 0.0007   Epoch: 18   Global Step: 227020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:26:13,036-Speed 3223.22 samples/sec   Loss 0.8692   LearningRate 0.0007   Epoch: 18   Global Step: 227030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:26:16,111-Speed 3331.52 samples/sec   Loss 0.8679   LearningRate 0.0007   Epoch: 18   Global Step: 227040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:26:19,232-Speed 3281.47 samples/sec   Loss 0.8614   LearningRate 0.0007   Epoch: 18   Global Step: 227050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:26:22,280-Speed 3361.12 samples/sec   Loss 0.8270   LearningRate 0.0007   Epoch: 18   Global Step: 227060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:26:25,485-Speed 3195.95 samples/sec   Loss 0.8405   LearningRate 0.0007   Epoch: 18   Global Step: 227070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:26:28,626-Speed 3260.59 samples/sec   Loss 0.8531   LearningRate 0.0007   Epoch: 18   Global Step: 227080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:26:31,707-Speed 3325.34 samples/sec   Loss 0.8281   LearningRate 0.0007   Epoch: 18   Global Step: 227090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:26:34,871-Speed 3236.28 samples/sec   Loss 0.8490   LearningRate 0.0007   Epoch: 18   Global Step: 227100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:26:38,013-Speed 3260.71 samples/sec   Loss 0.8909   LearningRate 0.0007   Epoch: 18   Global Step: 227110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:26:41,077-Speed 3343.72 samples/sec   Loss 0.8112   LearningRate 0.0007   Epoch: 18   Global Step: 227120   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-04-27 21:26:44,168-Speed 3314.12 samples/sec   Loss 0.8618   LearningRate 0.0007   Epoch: 18   Global Step: 227130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:26:47,233-Speed 3341.53 samples/sec   Loss 0.8566   LearningRate 0.0007   Epoch: 18   Global Step: 227140   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:26:50,290-Speed 3350.80 samples/sec   Loss 0.8690   LearningRate 0.0007   Epoch: 18   Global Step: 227150   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:26:53,423-Speed 3269.55 samples/sec   Loss 0.7998   LearningRate 0.0007   Epoch: 18   Global Step: 227160   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:26:56,576-Speed 3248.59 samples/sec   Loss 0.8544   LearningRate 0.0007   Epoch: 18   Global Step: 227170   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:26:59,657-Speed 3324.70 samples/sec   Loss 0.8549   LearningRate 0.0007   Epoch: 18   Global Step: 227180   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:27:02,754-Speed 3307.43 samples/sec   Loss 0.8526   LearningRate 0.0007   Epoch: 18   Global Step: 227190   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:27:05,866-Speed 3291.87 samples/sec   Loss 0.8748   LearningRate 0.0007   Epoch: 18   Global Step: 227200   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:27:08,927-Speed 3347.04 samples/sec   Loss 0.8293   LearningRate 0.0007   Epoch: 18   Global Step: 227210   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:27:12,015-Speed 3316.49 samples/sec   Loss 0.8613   LearningRate 0.0007   Epoch: 18   Global Step: 227220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:27:15,109-Speed 3310.92 samples/sec   Loss 0.8745   LearningRate 0.0007   Epoch: 18   Global Step: 227230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:27:18,224-Speed 3288.56 samples/sec   Loss 0.8790   LearningRate 0.0007   Epoch: 18   Global Step: 227240   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:27:21,275-Speed 3357.09 samples/sec   Loss 0.8471   LearningRate 0.0007   Epoch: 18   Global Step: 227250   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:27:24,351-Speed 3330.84 samples/sec   Loss 0.8497   LearningRate 0.0007   Epoch: 18   Global Step: 227260   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:27:27,506-Speed 3245.86 samples/sec   Loss 0.8468   LearningRate 0.0007   Epoch: 18   Global Step: 227270   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:27:30,677-Speed 3230.09 samples/sec   Loss 0.8473   LearningRate 0.0007   Epoch: 18   Global Step: 227280   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:27:33,741-Speed 3343.05 samples/sec   Loss 0.8584   LearningRate 0.0007   Epoch: 18   Global Step: 227290   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:27:36,799-Speed 3350.07 samples/sec   Loss 0.8709   LearningRate 0.0007   Epoch: 18   Global Step: 227300   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:27:39,906-Speed 3296.49 samples/sec   Loss 0.8781   LearningRate 0.0007   Epoch: 18   Global Step: 227310   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:27:43,026-Speed 3283.35 samples/sec   Loss 0.8462   LearningRate 0.0007   Epoch: 18   Global Step: 227320   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:27:46,097-Speed 3336.04 samples/sec   Loss 0.8407   LearningRate 0.0007   Epoch: 18   Global Step: 227330   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:27:49,174-Speed 3329.19 samples/sec   Loss 0.8687   LearningRate 0.0007   Epoch: 18   Global Step: 227340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:27:52,221-Speed 3360.82 samples/sec   Loss 0.8518   LearningRate 0.0007   Epoch: 18   Global Step: 227350   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:27:55,296-Speed 3331.33 samples/sec   Loss 0.8423   LearningRate 0.0007   Epoch: 18   Global Step: 227360   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:27:58,390-Speed 3310.70 samples/sec   Loss 0.8501   LearningRate 0.0007   Epoch: 18   Global Step: 227370   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:28:01,510-Speed 3283.33 samples/sec   Loss 0.8530   LearningRate 0.0007   Epoch: 18   Global Step: 227380   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:28:04,588-Speed 3327.46 samples/sec   Loss 0.8313   LearningRate 0.0007   Epoch: 18   Global Step: 227390   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:28:07,666-Speed 3328.31 samples/sec   Loss 0.8711   LearningRate 0.0007   Epoch: 18   Global Step: 227400   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:28:10,728-Speed 3344.95 samples/sec   Loss 0.8346   LearningRate 0.0007   Epoch: 18   Global Step: 227410   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:28:13,802-Speed 3333.00 samples/sec   Loss 0.8682   LearningRate 0.0007   Epoch: 18   Global Step: 227420   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:28:16,905-Speed 3301.21 samples/sec   Loss 0.8349   LearningRate 0.0007   Epoch: 18   Global Step: 227430   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:28:19,968-Speed 3344.19 samples/sec   Loss 0.8314   LearningRate 0.0007   Epoch: 18   Global Step: 227440   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:28:23,078-Speed 3293.72 samples/sec   Loss 0.8651   LearningRate 0.0007   Epoch: 18   Global Step: 227450   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:28:26,199-Speed 3281.97 samples/sec   Loss 0.8953   LearningRate 0.0007   Epoch: 18   Global Step: 227460   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:28:29,373-Speed 3226.67 samples/sec   Loss 0.8484   LearningRate 0.0007   Epoch: 18   Global Step: 227470   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:28:32,432-Speed 3348.78 samples/sec   Loss 0.8180   LearningRate 0.0007   Epoch: 18   Global Step: 227480   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:28:35,538-Speed 3298.03 samples/sec   Loss 0.8351   LearningRate 0.0007   Epoch: 18   Global Step: 227490   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:28:38,650-Speed 3291.92 samples/sec   Loss 0.8401   LearningRate 0.0007   Epoch: 18   Global Step: 227500   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:28:41,708-Speed 3349.68 samples/sec   Loss 0.8693   LearningRate 0.0007   Epoch: 18   Global Step: 227510   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:28:44,770-Speed 3345.63 samples/sec   Loss 0.8686   LearningRate 0.0007   Epoch: 18   Global Step: 227520   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:28:47,871-Speed 3303.30 samples/sec   Loss 0.8175   LearningRate 0.0007   Epoch: 18   Global Step: 227530   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:28:51,020-Speed 3252.43 samples/sec   Loss 0.8362   LearningRate 0.0007   Epoch: 18   Global Step: 227540   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:28:54,128-Speed 3295.96 samples/sec   Loss 0.8459   LearningRate 0.0007   Epoch: 18   Global Step: 227550   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:28:57,234-Speed 3297.55 samples/sec   Loss 0.8357   LearningRate 0.0007   Epoch: 18   Global Step: 227560   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:29:00,346-Speed 3292.25 samples/sec   Loss 0.8727   LearningRate 0.0007   Epoch: 18   Global Step: 227570   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:29:03,403-Speed 3350.09 samples/sec   Loss 0.7976   LearningRate 0.0007   Epoch: 18   Global Step: 227580   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:29:06,541-Speed 3264.23 samples/sec   Loss 0.8457   LearningRate 0.0007   Epoch: 18   Global Step: 227590   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:29:09,603-Speed 3346.10 samples/sec   Loss 0.8131   LearningRate 0.0007   Epoch: 18   Global Step: 227600   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:29:12,724-Speed 3282.19 samples/sec   Loss 0.8635   LearningRate 0.0007   Epoch: 18   Global Step: 227610   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:29:15,831-Speed 3296.96 samples/sec   Loss 0.8169   LearningRate 0.0007   Epoch: 18   Global Step: 227620   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:29:18,892-Speed 3346.45 samples/sec   Loss 0.8206   LearningRate 0.0007   Epoch: 18   Global Step: 227630   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:29:21,949-Speed 3349.91 samples/sec   Loss 0.8465   LearningRate 0.0007   Epoch: 18   Global Step: 227640   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:29:25,746-Speed 2697.41 samples/sec   Loss 0.8808   LearningRate 0.0007   Epoch: 18   Global Step: 227650   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:29:28,859-Speed 3290.76 samples/sec   Loss 0.8490   LearningRate 0.0007   Epoch: 18   Global Step: 227660   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:29:31,947-Speed 3317.04 samples/sec   Loss 0.8235   LearningRate 0.0007   Epoch: 18   Global Step: 227670   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:29:35,110-Speed 3239.04 samples/sec   Loss 0.8239   LearningRate 0.0007   Epoch: 18   Global Step: 227680   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:29:38,225-Speed 3288.07 samples/sec   Loss 0.8357   LearningRate 0.0007   Epoch: 18   Global Step: 227690   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:29:41,328-Speed 3300.85 samples/sec   Loss 0.8250   LearningRate 0.0007   Epoch: 18   Global Step: 227700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:29:44,487-Speed 3242.42 samples/sec   Loss 0.8407   LearningRate 0.0007   Epoch: 18   Global Step: 227710   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:29:47,571-Speed 3321.70 samples/sec   Loss 0.8603   LearningRate 0.0007   Epoch: 18   Global Step: 227720   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:29:50,700-Speed 3273.54 samples/sec   Loss 0.7889   LearningRate 0.0007   Epoch: 18   Global Step: 227730   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:29:53,951-Speed 3150.70 samples/sec   Loss 0.8419   LearningRate 0.0007   Epoch: 18   Global Step: 227740   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:29:57,033-Speed 3324.45 samples/sec   Loss 0.8396   LearningRate 0.0007   Epoch: 18   Global Step: 227750   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:30:00,113-Speed 3325.27 samples/sec   Loss 0.8077   LearningRate 0.0007   Epoch: 18   Global Step: 227760   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:30:03,169-Speed 3351.86 samples/sec   Loss 0.8649   LearningRate 0.0007   Epoch: 18   Global Step: 227770   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:30:06,308-Speed 3263.02 samples/sec   Loss 0.8437   LearningRate 0.0007   Epoch: 18   Global Step: 227780   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:30:09,368-Speed 3347.19 samples/sec   Loss 0.8419   LearningRate 0.0007   Epoch: 18   Global Step: 227790   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:30:12,458-Speed 3315.37 samples/sec   Loss 0.8376   LearningRate 0.0007   Epoch: 18   Global Step: 227800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:30:15,607-Speed 3253.37 samples/sec   Loss 0.8133   LearningRate 0.0007   Epoch: 18   Global Step: 227810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:30:18,711-Speed 3299.43 samples/sec   Loss 0.7974   LearningRate 0.0007   Epoch: 18   Global Step: 227820   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:30:21,770-Speed 3348.82 samples/sec   Loss 0.8577   LearningRate 0.0007   Epoch: 18   Global Step: 227830   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:30:24,894-Speed 3278.58 samples/sec   Loss 0.8634   LearningRate 0.0007   Epoch: 18   Global Step: 227840   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:30:27,951-Speed 3350.50 samples/sec   Loss 0.8523   LearningRate 0.0007   Epoch: 18   Global Step: 227850   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:30:31,043-Speed 3313.56 samples/sec   Loss 0.8344   LearningRate 0.0007   Epoch: 18   Global Step: 227860   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:30:34,101-Speed 3349.55 samples/sec   Loss 0.8426   LearningRate 0.0007   Epoch: 18   Global Step: 227870   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:30:37,238-Speed 3264.95 samples/sec   Loss 0.8356   LearningRate 0.0007   Epoch: 18   Global Step: 227880   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:30:40,311-Speed 3332.86 samples/sec   Loss 0.9002   LearningRate 0.0007   Epoch: 18   Global Step: 227890   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:30:43,462-Speed 3250.47 samples/sec   Loss 0.8365   LearningRate 0.0007   Epoch: 18   Global Step: 227900   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:30:46,521-Speed 3348.93 samples/sec   Loss 0.8500   LearningRate 0.0007   Epoch: 18   Global Step: 227910   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:30:49,607-Speed 3319.44 samples/sec   Loss 0.8203   LearningRate 0.0007   Epoch: 18   Global Step: 227920   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:30:52,663-Speed 3351.86 samples/sec   Loss 0.8168   LearningRate 0.0007   Epoch: 18   Global Step: 227930   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:30:55,734-Speed 3335.00 samples/sec   Loss 0.8446   LearningRate 0.0007   Epoch: 18   Global Step: 227940   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:30:58,793-Speed 3348.69 samples/sec   Loss 0.8569   LearningRate 0.0007   Epoch: 18   Global Step: 227950   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:31:01,870-Speed 3329.21 samples/sec   Loss 0.8411   LearningRate 0.0007   Epoch: 18   Global Step: 227960   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:31:04,959-Speed 3316.30 samples/sec   Loss 0.8572   LearningRate 0.0007   Epoch: 18   Global Step: 227970   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:31:08,065-Speed 3297.09 samples/sec   Loss 0.8329   LearningRate 0.0007   Epoch: 18   Global Step: 227980   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:31:11,131-Speed 3341.02 samples/sec   Loss 0.8761   LearningRate 0.0007   Epoch: 18   Global Step: 227990   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:31:14,248-Speed 3286.61 samples/sec   Loss 0.8209   LearningRate 0.0007   Epoch: 18   Global Step: 228000   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:31:17,303-Speed 3353.29 samples/sec   Loss 0.8725   LearningRate 0.0007   Epoch: 18   Global Step: 228010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:31:20,364-Speed 3355.21 samples/sec   Loss 0.8290   LearningRate 0.0007   Epoch: 18   Global Step: 228020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:31:23,465-Speed 3302.14 samples/sec   Loss 0.8383   LearningRate 0.0007   Epoch: 18   Global Step: 228030   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:31:26,499-Speed 3376.32 samples/sec   Loss 0.8223   LearningRate 0.0007   Epoch: 18   Global Step: 228040   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:31:29,670-Speed 3230.31 samples/sec   Loss 0.8508   LearningRate 0.0007   Epoch: 18   Global Step: 228050   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:31:32,755-Speed 3321.29 samples/sec   Loss 0.8449   LearningRate 0.0007   Epoch: 18   Global Step: 228060   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:31:35,897-Speed 3260.01 samples/sec   Loss 0.8615   LearningRate 0.0007   Epoch: 18   Global Step: 228070   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:31:38,955-Speed 3349.50 samples/sec   Loss 0.8402   LearningRate 0.0007   Epoch: 18   Global Step: 228080   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:31:42,009-Speed 3353.43 samples/sec   Loss 0.8507   LearningRate 0.0007   Epoch: 18   Global Step: 228090   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:31:45,100-Speed 3314.62 samples/sec   Loss 0.8327   LearningRate 0.0007   Epoch: 18   Global Step: 228100   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:31:48,238-Speed 3264.24 samples/sec   Loss 0.8424   LearningRate 0.0007   Epoch: 18   Global Step: 228110   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:31:51,296-Speed 3349.85 samples/sec   Loss 0.8364   LearningRate 0.0007   Epoch: 18   Global Step: 228120   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:31:54,402-Speed 3297.51 samples/sec   Loss 0.8839   LearningRate 0.0007   Epoch: 18   Global Step: 228130   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:31:57,455-Speed 3354.94 samples/sec   Loss 0.8270   LearningRate 0.0007   Epoch: 18   Global Step: 228140   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:32:00,526-Speed 3336.16 samples/sec   Loss 0.8508   LearningRate 0.0007   Epoch: 18   Global Step: 228150   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:32:03,722-Speed 3204.38 samples/sec   Loss 0.7981   LearningRate 0.0007   Epoch: 18   Global Step: 228160   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:32:06,850-Speed 3275.09 samples/sec   Loss 0.8220   LearningRate 0.0007   Epoch: 18   Global Step: 228170   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:32:09,941-Speed 3313.60 samples/sec   Loss 0.8326   LearningRate 0.0007   Epoch: 18   Global Step: 228180   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:32:13,057-Speed 3287.39 samples/sec   Loss 0.8481   LearningRate 0.0007   Epoch: 18   Global Step: 228190   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:32:16,146-Speed 3316.32 samples/sec   Loss 0.8882   LearningRate 0.0007   Epoch: 18   Global Step: 228200   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:32:19,310-Speed 3237.26 samples/sec   Loss 0.8707   LearningRate 0.0007   Epoch: 18   Global Step: 228210   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:32:22,403-Speed 3311.33 samples/sec   Loss 0.8315   LearningRate 0.0007   Epoch: 18   Global Step: 228220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:32:25,521-Speed 3285.99 samples/sec   Loss 0.8641   LearningRate 0.0007   Epoch: 18   Global Step: 228230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:32:28,687-Speed 3235.04 samples/sec   Loss 0.8281   LearningRate 0.0007   Epoch: 18   Global Step: 228240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:32:31,806-Speed 3284.20 samples/sec   Loss 0.8765   LearningRate 0.0007   Epoch: 18   Global Step: 228250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:32:34,910-Speed 3299.71 samples/sec   Loss 0.8256   LearningRate 0.0007   Epoch: 18   Global Step: 228260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:32:38,033-Speed 3280.10 samples/sec   Loss 0.8457   LearningRate 0.0007   Epoch: 18   Global Step: 228270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:32:41,133-Speed 3303.79 samples/sec   Loss 0.8677   LearningRate 0.0007   Epoch: 18   Global Step: 228280   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:32:44,255-Speed 3281.40 samples/sec   Loss 0.8191   LearningRate 0.0007   Epoch: 18   Global Step: 228290   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:32:47,373-Speed 3284.59 samples/sec   Loss 0.8245   LearningRate 0.0007   Epoch: 18   Global Step: 228300   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:32:50,556-Speed 3218.25 samples/sec   Loss 0.8186   LearningRate 0.0007   Epoch: 18   Global Step: 228310   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:32:53,657-Speed 3303.30 samples/sec   Loss 0.8183   LearningRate 0.0007   Epoch: 18   Global Step: 228320   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:32:56,716-Speed 3348.36 samples/sec   Loss 0.8590   LearningRate 0.0007   Epoch: 18   Global Step: 228330   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:32:59,795-Speed 3327.33 samples/sec   Loss 0.8540   LearningRate 0.0007   Epoch: 18   Global Step: 228340   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:33:02,920-Speed 3277.24 samples/sec   Loss 0.8304   LearningRate 0.0007   Epoch: 18   Global Step: 228350   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:33:06,073-Speed 3249.34 samples/sec   Loss 0.8322   LearningRate 0.0007   Epoch: 18   Global Step: 228360   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:33:09,206-Speed 3269.55 samples/sec   Loss 0.8610   LearningRate 0.0007   Epoch: 18   Global Step: 228370   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:33:12,277-Speed 3334.62 samples/sec   Loss 0.8273   LearningRate 0.0007   Epoch: 18   Global Step: 228380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:33:15,382-Speed 3299.38 samples/sec   Loss 0.8380   LearningRate 0.0007   Epoch: 18   Global Step: 228390   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:33:18,551-Speed 3232.70 samples/sec   Loss 0.8545   LearningRate 0.0006   Epoch: 18   Global Step: 228400   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:33:21,633-Speed 3322.78 samples/sec   Loss 0.8463   LearningRate 0.0006   Epoch: 18   Global Step: 228410   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:33:24,725-Speed 3313.36 samples/sec   Loss 0.8210   LearningRate 0.0006   Epoch: 18   Global Step: 228420   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:33:27,878-Speed 3248.96 samples/sec   Loss 0.8380   LearningRate 0.0006   Epoch: 18   Global Step: 228430   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:33:30,940-Speed 3344.22 samples/sec   Loss 0.8274   LearningRate 0.0006   Epoch: 18   Global Step: 228440   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:33:34,026-Speed 3319.92 samples/sec   Loss 0.8251   LearningRate 0.0006   Epoch: 18   Global Step: 228450   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:33:37,122-Speed 3307.88 samples/sec   Loss 0.8418   LearningRate 0.0006   Epoch: 18   Global Step: 228460   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:33:40,177-Speed 3353.38 samples/sec   Loss 0.8216   LearningRate 0.0006   Epoch: 18   Global Step: 228470   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:33:43,275-Speed 3305.69 samples/sec   Loss 0.8732   LearningRate 0.0006   Epoch: 18   Global Step: 228480   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:33:46,361-Speed 3319.81 samples/sec   Loss 0.8944   LearningRate 0.0006   Epoch: 18   Global Step: 228490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:33:49,466-Speed 3298.61 samples/sec   Loss 0.8312   LearningRate 0.0006   Epoch: 18   Global Step: 228500   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:33:52,556-Speed 3314.87 samples/sec   Loss 0.8732   LearningRate 0.0006   Epoch: 18   Global Step: 228510   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:33:55,714-Speed 3244.42 samples/sec   Loss 0.8258   LearningRate 0.0006   Epoch: 18   Global Step: 228520   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:33:58,791-Speed 3328.54 samples/sec   Loss 0.7978   LearningRate 0.0006   Epoch: 18   Global Step: 228530   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:34:01,938-Speed 3255.33 samples/sec   Loss 0.8529   LearningRate 0.0006   Epoch: 18   Global Step: 228540   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:34:05,041-Speed 3300.95 samples/sec   Loss 0.8488   LearningRate 0.0006   Epoch: 18   Global Step: 228550   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:34:08,095-Speed 3354.01 samples/sec   Loss 0.8618   LearningRate 0.0006   Epoch: 18   Global Step: 228560   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:34:11,152-Speed 3350.35 samples/sec   Loss 0.8443   LearningRate 0.0006   Epoch: 18   Global Step: 228570   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:34:14,261-Speed 3295.06 samples/sec   Loss 0.8374   LearningRate 0.0006   Epoch: 18   Global Step: 228580   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:34:17,408-Speed 3254.65 samples/sec   Loss 0.8831   LearningRate 0.0006   Epoch: 18   Global Step: 228590   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:34:20,490-Speed 3323.62 samples/sec   Loss 0.8304   LearningRate 0.0006   Epoch: 18   Global Step: 228600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:34:23,595-Speed 3299.62 samples/sec   Loss 0.8164   LearningRate 0.0006   Epoch: 18   Global Step: 228610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:34:26,736-Speed 3261.06 samples/sec   Loss 0.8643   LearningRate 0.0006   Epoch: 18   Global Step: 228620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:34:29,892-Speed 3245.75 samples/sec   Loss 0.8522   LearningRate 0.0006   Epoch: 18   Global Step: 228630   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:34:32,957-Speed 3340.84 samples/sec   Loss 0.8657   LearningRate 0.0006   Epoch: 18   Global Step: 228640   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:34:36,062-Speed 3299.80 samples/sec   Loss 0.8595   LearningRate 0.0006   Epoch: 18   Global Step: 228650   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:34:39,197-Speed 3267.53 samples/sec   Loss 0.8553   LearningRate 0.0006   Epoch: 18   Global Step: 228660   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:34:42,433-Speed 3165.11 samples/sec   Loss 0.8775   LearningRate 0.0006   Epoch: 18   Global Step: 228670   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:34:45,498-Speed 3342.06 samples/sec   Loss 0.8469   LearningRate 0.0006   Epoch: 18   Global Step: 228680   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:34:48,617-Speed 3283.63 samples/sec   Loss 0.8505   LearningRate 0.0006   Epoch: 18   Global Step: 228690   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:34:51,736-Speed 3284.25 samples/sec   Loss 0.8962   LearningRate 0.0006   Epoch: 18   Global Step: 228700   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:34:54,823-Speed 3318.28 samples/sec   Loss 0.8105   LearningRate 0.0006   Epoch: 18   Global Step: 228710   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:34:57,868-Speed 3364.09 samples/sec   Loss 0.8450   LearningRate 0.0006   Epoch: 18   Global Step: 228720   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:35:01,083-Speed 3186.38 samples/sec   Loss 0.8391   LearningRate 0.0006   Epoch: 18   Global Step: 228730   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:35:04,193-Speed 3293.82 samples/sec   Loss 0.8356   LearningRate 0.0006   Epoch: 18   Global Step: 228740   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:35:07,297-Speed 3299.83 samples/sec   Loss 0.8455   LearningRate 0.0006   Epoch: 18   Global Step: 228750   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:35:10,393-Speed 3308.14 samples/sec   Loss 0.8593   LearningRate 0.0006   Epoch: 18   Global Step: 228760   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:35:13,600-Speed 3194.54 samples/sec   Loss 0.8129   LearningRate 0.0006   Epoch: 18   Global Step: 228770   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:35:16,689-Speed 3315.83 samples/sec   Loss 0.8472   LearningRate 0.0006   Epoch: 18   Global Step: 228780   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:35:19,767-Speed 3328.01 samples/sec   Loss 0.8174   LearningRate 0.0006   Epoch: 18   Global Step: 228790   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:35:22,918-Speed 3250.21 samples/sec   Loss 0.8380   LearningRate 0.0006   Epoch: 18   Global Step: 228800   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:35:26,052-Speed 3268.43 samples/sec   Loss 0.8682   LearningRate 0.0006   Epoch: 18   Global Step: 228810   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:35:29,158-Speed 3298.38 samples/sec   Loss 0.8898   LearningRate 0.0006   Epoch: 18   Global Step: 228820   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:35:32,275-Speed 3286.62 samples/sec   Loss 0.8663   LearningRate 0.0006   Epoch: 18   Global Step: 228830   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:35:35,372-Speed 3307.61 samples/sec   Loss 0.8793   LearningRate 0.0006   Epoch: 18   Global Step: 228840   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:35:38,527-Speed 3246.66 samples/sec   Loss 0.8257   LearningRate 0.0006   Epoch: 18   Global Step: 228850   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:35:41,632-Speed 3298.51 samples/sec   Loss 0.8656   LearningRate 0.0006   Epoch: 18   Global Step: 228860   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:35:44,718-Speed 3318.63 samples/sec   Loss 0.8414   LearningRate 0.0006   Epoch: 18   Global Step: 228870   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:35:47,836-Speed 3286.07 samples/sec   Loss 0.8407   LearningRate 0.0006   Epoch: 18   Global Step: 228880   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:35:51,023-Speed 3213.91 samples/sec   Loss 0.8570   LearningRate 0.0006   Epoch: 18   Global Step: 228890   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:35:54,138-Speed 3288.34 samples/sec   Loss 0.8109   LearningRate 0.0006   Epoch: 18   Global Step: 228900   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:35:57,206-Speed 3338.80 samples/sec   Loss 0.8877   LearningRate 0.0006   Epoch: 18   Global Step: 228910   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:36:00,272-Speed 3341.00 samples/sec   Loss 0.8083   LearningRate 0.0006   Epoch: 18   Global Step: 228920   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:36:03,429-Speed 3244.08 samples/sec   Loss 0.8503   LearningRate 0.0006   Epoch: 18   Global Step: 228930   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:36:06,579-Speed 3252.35 samples/sec   Loss 0.8509   LearningRate 0.0006   Epoch: 18   Global Step: 228940   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:36:09,652-Speed 3333.13 samples/sec   Loss 0.8561   LearningRate 0.0006   Epoch: 18   Global Step: 228950   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:36:12,744-Speed 3313.02 samples/sec   Loss 0.8902   LearningRate 0.0006   Epoch: 18   Global Step: 228960   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:36:15,905-Speed 3240.36 samples/sec   Loss 0.8453   LearningRate 0.0006   Epoch: 18   Global Step: 228970   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:36:19,053-Speed 3253.94 samples/sec   Loss 0.8302   LearningRate 0.0006   Epoch: 18   Global Step: 228980   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:36:22,132-Speed 3327.08 samples/sec   Loss 0.8347   LearningRate 0.0006   Epoch: 18   Global Step: 228990   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:36:25,209-Speed 3327.79 samples/sec   Loss 0.8505   LearningRate 0.0006   Epoch: 18   Global Step: 229000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:36:28,399-Speed 3211.38 samples/sec   Loss 0.8287   LearningRate 0.0006   Epoch: 18   Global Step: 229010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:36:31,552-Speed 3249.13 samples/sec   Loss 0.8827   LearningRate 0.0006   Epoch: 18   Global Step: 229020   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:36:34,662-Speed 3293.00 samples/sec   Loss 0.8240   LearningRate 0.0006   Epoch: 18   Global Step: 229030   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:36:37,808-Speed 3256.55 samples/sec   Loss 0.8553   LearningRate 0.0006   Epoch: 18   Global Step: 229040   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:36:40,919-Speed 3292.26 samples/sec   Loss 0.8814   LearningRate 0.0006   Epoch: 18   Global Step: 229050   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:36:44,040-Speed 3282.64 samples/sec   Loss 0.8431   LearningRate 0.0006   Epoch: 18   Global Step: 229060   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:36:47,096-Speed 3351.05 samples/sec   Loss 0.8484   LearningRate 0.0006   Epoch: 18   Global Step: 229070   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:36:50,168-Speed 3334.80 samples/sec   Loss 0.8487   LearningRate 0.0006   Epoch: 18   Global Step: 229080   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:36:53,307-Speed 3262.60 samples/sec   Loss 0.8725   LearningRate 0.0006   Epoch: 18   Global Step: 229090   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:36:56,368-Speed 3346.46 samples/sec   Loss 0.8679   LearningRate 0.0006   Epoch: 18   Global Step: 229100   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:36:59,464-Speed 3309.44 samples/sec   Loss 0.8450   LearningRate 0.0006   Epoch: 18   Global Step: 229110   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:37:02,601-Speed 3265.31 samples/sec   Loss 0.8054   LearningRate 0.0006   Epoch: 18   Global Step: 229120   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:37:05,743-Speed 3259.59 samples/sec   Loss 0.8511   LearningRate 0.0006   Epoch: 18   Global Step: 229130   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:37:08,815-Speed 3335.13 samples/sec   Loss 0.8394   LearningRate 0.0006   Epoch: 18   Global Step: 229140   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:37:11,962-Speed 3254.32 samples/sec   Loss 0.8382   LearningRate 0.0006   Epoch: 18   Global Step: 229150   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:37:15,111-Speed 3252.78 samples/sec   Loss 0.8533   LearningRate 0.0006   Epoch: 18   Global Step: 229160   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:37:18,240-Speed 3273.99 samples/sec   Loss 0.8366   LearningRate 0.0006   Epoch: 18   Global Step: 229170   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:37:21,326-Speed 3319.26 samples/sec   Loss 0.8109   LearningRate 0.0006   Epoch: 18   Global Step: 229180   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:37:24,504-Speed 3223.35 samples/sec   Loss 0.8356   LearningRate 0.0006   Epoch: 18   Global Step: 229190   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:37:27,661-Speed 3243.95 samples/sec   Loss 0.8147   LearningRate 0.0006   Epoch: 18   Global Step: 229200   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:37:30,799-Speed 3264.84 samples/sec   Loss 0.8351   LearningRate 0.0006   Epoch: 18   Global Step: 229210   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:37:33,862-Speed 3343.94 samples/sec   Loss 0.8285   LearningRate 0.0006   Epoch: 18   Global Step: 229220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:37:36,958-Speed 3308.70 samples/sec   Loss 0.8413   LearningRate 0.0006   Epoch: 18   Global Step: 229230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:37:40,046-Speed 3317.65 samples/sec   Loss 0.8170   LearningRate 0.0006   Epoch: 18   Global Step: 229240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:37:43,183-Speed 3264.83 samples/sec   Loss 0.8429   LearningRate 0.0006   Epoch: 18   Global Step: 229250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:37:46,303-Speed 3283.57 samples/sec   Loss 0.8363   LearningRate 0.0006   Epoch: 18   Global Step: 229260   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:37:49,396-Speed 3310.99 samples/sec   Loss 0.8296   LearningRate 0.0006   Epoch: 18   Global Step: 229270   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:37:52,505-Speed 3295.11 samples/sec   Loss 0.8576   LearningRate 0.0006   Epoch: 18   Global Step: 229280   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:37:55,613-Speed 3295.65 samples/sec   Loss 0.8230   LearningRate 0.0006   Epoch: 18   Global Step: 229290   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:37:58,708-Speed 3309.71 samples/sec   Loss 0.8541   LearningRate 0.0006   Epoch: 18   Global Step: 229300   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:38:01,790-Speed 3324.25 samples/sec   Loss 0.8494   LearningRate 0.0006   Epoch: 18   Global Step: 229310   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:38:04,881-Speed 3314.06 samples/sec   Loss 0.8351   LearningRate 0.0006   Epoch: 18   Global Step: 229320   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:38:07,978-Speed 3307.73 samples/sec   Loss 0.8466   LearningRate 0.0006   Epoch: 18   Global Step: 229330   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:38:11,094-Speed 3286.75 samples/sec   Loss 0.8184   LearningRate 0.0006   Epoch: 18   Global Step: 229340   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:38:14,175-Speed 3325.17 samples/sec   Loss 0.8244   LearningRate 0.0006   Epoch: 18   Global Step: 229350   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:38:17,296-Speed 3281.54 samples/sec   Loss 0.8475   LearningRate 0.0006   Epoch: 18   Global Step: 229360   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:38:20,423-Speed 3276.40 samples/sec   Loss 0.8474   LearningRate 0.0006   Epoch: 18   Global Step: 229370   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:38:23,601-Speed 3222.64 samples/sec   Loss 0.8814   LearningRate 0.0006   Epoch: 18   Global Step: 229380   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:38:26,714-Speed 3291.03 samples/sec   Loss 0.8286   LearningRate 0.0006   Epoch: 18   Global Step: 229390   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:38:29,850-Speed 3266.22 samples/sec   Loss 0.8831   LearningRate 0.0006   Epoch: 18   Global Step: 229400   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:38:32,928-Speed 3327.39 samples/sec   Loss 0.8362   LearningRate 0.0006   Epoch: 18   Global Step: 229410   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:38:36,042-Speed 3289.66 samples/sec   Loss 0.8383   LearningRate 0.0006   Epoch: 18   Global Step: 229420   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:38:39,117-Speed 3330.86 samples/sec   Loss 0.8582   LearningRate 0.0006   Epoch: 18   Global Step: 229430   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:38:42,256-Speed 3262.93 samples/sec   Loss 0.8623   LearningRate 0.0006   Epoch: 18   Global Step: 229440   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:38:45,352-Speed 3308.58 samples/sec   Loss 0.8256   LearningRate 0.0006   Epoch: 18   Global Step: 229450   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:38:48,490-Speed 3264.96 samples/sec   Loss 0.8100   LearningRate 0.0006   Epoch: 18   Global Step: 229460   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:38:51,572-Speed 3323.81 samples/sec   Loss 0.8672   LearningRate 0.0006   Epoch: 18   Global Step: 229470   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:38:54,660-Speed 3316.64 samples/sec   Loss 0.8549   LearningRate 0.0006   Epoch: 18   Global Step: 229480   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:38:57,722-Speed 3344.68 samples/sec   Loss 0.8348   LearningRate 0.0006   Epoch: 18   Global Step: 229490   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:39:00,872-Speed 3251.84 samples/sec   Loss 0.8199   LearningRate 0.0006   Epoch: 18   Global Step: 229500   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:39:03,992-Speed 3283.79 samples/sec   Loss 0.8577   LearningRate 0.0006   Epoch: 18   Global Step: 229510   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:39:07,134-Speed 3260.00 samples/sec   Loss 0.8440   LearningRate 0.0006   Epoch: 18   Global Step: 229520   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:39:10,183-Speed 3358.79 samples/sec   Loss 0.8569   LearningRate 0.0006   Epoch: 18   Global Step: 229530   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:39:13,262-Speed 3326.95 samples/sec   Loss 0.8598   LearningRate 0.0006   Epoch: 18   Global Step: 229540   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:39:16,328-Speed 3341.34 samples/sec   Loss 0.8400   LearningRate 0.0006   Epoch: 18   Global Step: 229550   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:39:19,458-Speed 3272.38 samples/sec   Loss 0.8074   LearningRate 0.0006   Epoch: 18   Global Step: 229560   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:39:22,535-Speed 3328.58 samples/sec   Loss 0.8663   LearningRate 0.0006   Epoch: 18   Global Step: 229570   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:39:25,605-Speed 3336.97 samples/sec   Loss 0.8492   LearningRate 0.0006   Epoch: 18   Global Step: 229580   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:39:28,743-Speed 3263.68 samples/sec   Loss 0.8333   LearningRate 0.0006   Epoch: 18   Global Step: 229590   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:39:31,865-Speed 3281.43 samples/sec   Loss 0.8105   LearningRate 0.0006   Epoch: 18   Global Step: 229600   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:39:34,928-Speed 3343.64 samples/sec   Loss 0.8190   LearningRate 0.0006   Epoch: 18   Global Step: 229610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:39:38,044-Speed 3287.66 samples/sec   Loss 0.8125   LearningRate 0.0006   Epoch: 18   Global Step: 229620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:39:41,153-Speed 3294.80 samples/sec   Loss 0.8125   LearningRate 0.0006   Epoch: 18   Global Step: 229630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:39:44,246-Speed 3312.34 samples/sec   Loss 0.8192   LearningRate 0.0006   Epoch: 18   Global Step: 229640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:39:47,367-Speed 3281.52 samples/sec   Loss 0.8593   LearningRate 0.0006   Epoch: 18   Global Step: 229650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:39:50,481-Speed 3290.01 samples/sec   Loss 0.8315   LearningRate 0.0006   Epoch: 18   Global Step: 229660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:39:53,557-Speed 3329.37 samples/sec   Loss 0.8456   LearningRate 0.0006   Epoch: 18   Global Step: 229670   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:39:56,608-Speed 3358.63 samples/sec   Loss 0.7931   LearningRate 0.0006   Epoch: 18   Global Step: 229680   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:39:59,686-Speed 3327.26 samples/sec   Loss 0.8240   LearningRate 0.0006   Epoch: 18   Global Step: 229690   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:40:02,758-Speed 3334.49 samples/sec   Loss 0.8051   LearningRate 0.0006   Epoch: 18   Global Step: 229700   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:40:05,914-Speed 3245.25 samples/sec   Loss 0.8263   LearningRate 0.0006   Epoch: 18   Global Step: 229710   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:40:08,995-Speed 3325.57 samples/sec   Loss 0.8379   LearningRate 0.0006   Epoch: 18   Global Step: 229720   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:40:12,058-Speed 3344.18 samples/sec   Loss 0.8585   LearningRate 0.0006   Epoch: 18   Global Step: 229730   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:40:15,201-Speed 3259.22 samples/sec   Loss 0.8306   LearningRate 0.0006   Epoch: 18   Global Step: 229740   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:40:18,313-Speed 3291.73 samples/sec   Loss 0.8565   LearningRate 0.0006   Epoch: 18   Global Step: 229750   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:40:21,417-Speed 3299.26 samples/sec   Loss 0.8329   LearningRate 0.0006   Epoch: 18   Global Step: 229760   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:40:24,498-Speed 3324.27 samples/sec   Loss 0.8398   LearningRate 0.0006   Epoch: 18   Global Step: 229770   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:40:27,645-Speed 3255.49 samples/sec   Loss 0.8550   LearningRate 0.0006   Epoch: 18   Global Step: 229780   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:40:30,809-Speed 3237.78 samples/sec   Loss 0.8732   LearningRate 0.0006   Epoch: 18   Global Step: 229790   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:40:33,899-Speed 3314.74 samples/sec   Loss 0.7926   LearningRate 0.0006   Epoch: 18   Global Step: 229800   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:40:36,988-Speed 3315.98 samples/sec   Loss 0.8621   LearningRate 0.0006   Epoch: 18   Global Step: 229810   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:40:40,096-Speed 3295.25 samples/sec   Loss 0.8241   LearningRate 0.0006   Epoch: 18   Global Step: 229820   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:40:43,193-Speed 3307.63 samples/sec   Loss 0.8424   LearningRate 0.0006   Epoch: 18   Global Step: 229830   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:40:46,248-Speed 3353.36 samples/sec   Loss 0.8583   LearningRate 0.0006   Epoch: 18   Global Step: 229840   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:40:49,344-Speed 3308.40 samples/sec   Loss 0.8586   LearningRate 0.0006   Epoch: 18   Global Step: 229850   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:40:52,453-Speed 3294.91 samples/sec   Loss 0.8342   LearningRate 0.0006   Epoch: 18   Global Step: 229860   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:40:55,570-Speed 3286.15 samples/sec   Loss 0.8538   LearningRate 0.0006   Epoch: 18   Global Step: 229870   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:40:58,647-Speed 3329.50 samples/sec   Loss 0.8798   LearningRate 0.0006   Epoch: 18   Global Step: 229880   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:41:01,743-Speed 3307.54 samples/sec   Loss 0.8689   LearningRate 0.0006   Epoch: 18   Global Step: 229890   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:41:04,889-Speed 3256.43 samples/sec   Loss 0.8531   LearningRate 0.0006   Epoch: 18   Global Step: 229900   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:41:08,048-Speed 3242.59 samples/sec   Loss 0.8658   LearningRate 0.0006   Epoch: 18   Global Step: 229910   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:41:11,139-Speed 3313.75 samples/sec   Loss 0.8274   LearningRate 0.0006   Epoch: 18   Global Step: 229920   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:41:14,251-Speed 3291.94 samples/sec   Loss 0.7980   LearningRate 0.0006   Epoch: 18   Global Step: 229930   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:41:17,325-Speed 3332.15 samples/sec   Loss 0.8106   LearningRate 0.0006   Epoch: 18   Global Step: 229940   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:41:20,409-Speed 3321.82 samples/sec   Loss 0.8455   LearningRate 0.0006   Epoch: 18   Global Step: 229950   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:41:23,530-Speed 3282.09 samples/sec   Loss 0.8292   LearningRate 0.0006   Epoch: 18   Global Step: 229960   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:41:26,653-Speed 3279.92 samples/sec   Loss 0.8131   LearningRate 0.0006   Epoch: 18   Global Step: 229970   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:41:29,744-Speed 3313.28 samples/sec   Loss 0.8439   LearningRate 0.0006   Epoch: 18   Global Step: 229980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:41:32,815-Speed 3336.34 samples/sec   Loss 0.8306   LearningRate 0.0006   Epoch: 18   Global Step: 229990   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:41:35,898-Speed 3322.15 samples/sec   Loss 0.8144   LearningRate 0.0005   Epoch: 18   Global Step: 230000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:41:39,050-Speed 3251.26 samples/sec   Loss 0.8401   LearningRate 0.0005   Epoch: 18   Global Step: 230010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:41:42,138-Speed 3317.43 samples/sec   Loss 0.8621   LearningRate 0.0005   Epoch: 18   Global Step: 230020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:41:45,253-Speed 3288.51 samples/sec   Loss 0.8531   LearningRate 0.0005   Epoch: 18   Global Step: 230030   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:41:48,397-Speed 3258.83 samples/sec   Loss 0.8136   LearningRate 0.0005   Epoch: 18   Global Step: 230040   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:41:51,482-Speed 3319.81 samples/sec   Loss 0.8452   LearningRate 0.0005   Epoch: 18   Global Step: 230050   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:41:54,653-Speed 3230.51 samples/sec   Loss 0.8671   LearningRate 0.0005   Epoch: 18   Global Step: 230060   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:41:57,753-Speed 3304.55 samples/sec   Loss 0.8146   LearningRate 0.0005   Epoch: 18   Global Step: 230070   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:42:00,852-Speed 3305.07 samples/sec   Loss 0.8636   LearningRate 0.0005   Epoch: 18   Global Step: 230080   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:42:04,042-Speed 3210.48 samples/sec   Loss 0.8146   LearningRate 0.0005   Epoch: 18   Global Step: 230090   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:42:07,213-Speed 3230.53 samples/sec   Loss 0.8383   LearningRate 0.0005   Epoch: 18   Global Step: 230100   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:42:10,300-Speed 3318.08 samples/sec   Loss 0.8907   LearningRate 0.0005   Epoch: 18   Global Step: 230110   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:42:13,393-Speed 3312.44 samples/sec   Loss 0.8137   LearningRate 0.0005   Epoch: 18   Global Step: 230120   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:42:16,512-Speed 3283.95 samples/sec   Loss 0.8634   LearningRate 0.0005   Epoch: 18   Global Step: 230130   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:42:19,630-Speed 3284.76 samples/sec   Loss 0.8564   LearningRate 0.0005   Epoch: 18   Global Step: 230140   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:42:22,685-Speed 3352.58 samples/sec   Loss 0.8357   LearningRate 0.0005   Epoch: 18   Global Step: 230150   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:42:25,794-Speed 3295.12 samples/sec   Loss 0.8306   LearningRate 0.0005   Epoch: 18   Global Step: 230160   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:42:28,903-Speed 3294.99 samples/sec   Loss 0.8301   LearningRate 0.0005   Epoch: 18   Global Step: 230170   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:42:31,994-Speed 3313.56 samples/sec   Loss 0.8448   LearningRate 0.0005   Epoch: 18   Global Step: 230180   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:42:35,109-Speed 3287.56 samples/sec   Loss 0.7795   LearningRate 0.0005   Epoch: 18   Global Step: 230190   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:42:38,146-Speed 3374.10 samples/sec   Loss 0.8529   LearningRate 0.0005   Epoch: 18   Global Step: 230200   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:42:41,257-Speed 3292.38 samples/sec   Loss 0.7920   LearningRate 0.0005   Epoch: 18   Global Step: 230210   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:42:44,373-Speed 3286.78 samples/sec   Loss 0.8385   LearningRate 0.0005   Epoch: 18   Global Step: 230220   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:42:47,528-Speed 3246.90 samples/sec   Loss 0.8400   LearningRate 0.0005   Epoch: 18   Global Step: 230230   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:42:50,690-Speed 3239.73 samples/sec   Loss 0.8264   LearningRate 0.0005   Epoch: 18   Global Step: 230240   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:42:53,832-Speed 3259.38 samples/sec   Loss 0.8501   LearningRate 0.0005   Epoch: 18   Global Step: 230250   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:42:56,888-Speed 3351.74 samples/sec   Loss 0.8412   LearningRate 0.0005   Epoch: 18   Global Step: 230260   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:43:00,024-Speed 3267.09 samples/sec   Loss 0.8325   LearningRate 0.0005   Epoch: 18   Global Step: 230270   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:43:03,109-Speed 3320.45 samples/sec   Loss 0.8530   LearningRate 0.0005   Epoch: 18   Global Step: 230280   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:43:06,186-Speed 3328.30 samples/sec   Loss 0.8207   LearningRate 0.0005   Epoch: 18   Global Step: 230290   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:43:09,236-Speed 3358.23 samples/sec   Loss 0.8309   LearningRate 0.0005   Epoch: 18   Global Step: 230300   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:43:12,306-Speed 3337.42 samples/sec   Loss 0.8269   LearningRate 0.0005   Epoch: 18   Global Step: 230310   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:43:15,380-Speed 3331.74 samples/sec   Loss 0.8505   LearningRate 0.0005   Epoch: 18   Global Step: 230320   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:43:18,438-Speed 3350.05 samples/sec   Loss 0.8072   LearningRate 0.0005   Epoch: 18   Global Step: 230330   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:43:21,523-Speed 3320.28 samples/sec   Loss 0.8462   LearningRate 0.0005   Epoch: 18   Global Step: 230340   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:43:24,654-Speed 3271.63 samples/sec   Loss 0.8244   LearningRate 0.0005   Epoch: 18   Global Step: 230350   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:43:27,727-Speed 3333.55 samples/sec   Loss 0.7989   LearningRate 0.0005   Epoch: 18   Global Step: 230360   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:43:30,804-Speed 3327.95 samples/sec   Loss 0.8277   LearningRate 0.0005   Epoch: 18   Global Step: 230370   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:43:33,864-Speed 3347.51 samples/sec   Loss 0.8675   LearningRate 0.0005   Epoch: 18   Global Step: 230380   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:43:36,905-Speed 3369.47 samples/sec   Loss 0.8399   LearningRate 0.0005   Epoch: 18   Global Step: 230390   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:43:40,039-Speed 3268.56 samples/sec   Loss 0.8416   LearningRate 0.0005   Epoch: 18   Global Step: 230400   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:43:43,133-Speed 3311.39 samples/sec   Loss 0.8345   LearningRate 0.0005   Epoch: 18   Global Step: 230410   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:43:46,194-Speed 3346.01 samples/sec   Loss 0.8380   LearningRate 0.0005   Epoch: 18   Global Step: 230420   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:43:49,304-Speed 3294.08 samples/sec   Loss 0.8300   LearningRate 0.0005   Epoch: 18   Global Step: 230430   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:43:52,350-Speed 3363.46 samples/sec   Loss 0.8346   LearningRate 0.0005   Epoch: 18   Global Step: 230440   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:43:55,420-Speed 3336.17 samples/sec   Loss 0.8214   LearningRate 0.0005   Epoch: 18   Global Step: 230450   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:43:58,473-Speed 3355.37 samples/sec   Loss 0.8646   LearningRate 0.0005   Epoch: 18   Global Step: 230460   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:44:01,596-Speed 3279.59 samples/sec   Loss 0.8495   LearningRate 0.0005   Epoch: 18   Global Step: 230470   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:44:04,777-Speed 3220.97 samples/sec   Loss 0.8118   LearningRate 0.0005   Epoch: 18   Global Step: 230480   Fp16 Grad Scale: 4096   Required: 2 hours
Training: 2022-04-27 21:44:07,848-Speed 3335.07 samples/sec   Loss 0.8413   LearningRate 0.0005   Epoch: 18   Global Step: 230490   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:44:10,924-Speed 3329.89 samples/sec   Loss 0.8257   LearningRate 0.0005   Epoch: 18   Global Step: 230500   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:44:14,001-Speed 3329.32 samples/sec   Loss 0.8783   LearningRate 0.0005   Epoch: 18   Global Step: 230510   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:44:17,095-Speed 3310.24 samples/sec   Loss 0.8337   LearningRate 0.0005   Epoch: 18   Global Step: 230520   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:44:20,150-Speed 3353.09 samples/sec   Loss 0.8437   LearningRate 0.0005   Epoch: 18   Global Step: 230530   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:44:23,223-Speed 3333.39 samples/sec   Loss 0.8446   LearningRate 0.0005   Epoch: 18   Global Step: 230540   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:44:26,317-Speed 3310.98 samples/sec   Loss 0.8172   LearningRate 0.0005   Epoch: 18   Global Step: 230550   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:44:29,394-Speed 3328.32 samples/sec   Loss 0.8077   LearningRate 0.0005   Epoch: 18   Global Step: 230560   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:44:32,492-Speed 3306.35 samples/sec   Loss 0.7915   LearningRate 0.0005   Epoch: 18   Global Step: 230570   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:44:35,580-Speed 3317.24 samples/sec   Loss 0.8192   LearningRate 0.0005   Epoch: 18   Global Step: 230580   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:44:38,663-Speed 3323.22 samples/sec   Loss 0.8529   LearningRate 0.0005   Epoch: 18   Global Step: 230590   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:44:41,795-Speed 3269.51 samples/sec   Loss 0.8575   LearningRate 0.0005   Epoch: 18   Global Step: 230600   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:44:44,843-Speed 3361.52 samples/sec   Loss 0.8288   LearningRate 0.0005   Epoch: 18   Global Step: 230610   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:44:48,004-Speed 3240.41 samples/sec   Loss 0.8713   LearningRate 0.0005   Epoch: 18   Global Step: 230620   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:44:51,107-Speed 3301.11 samples/sec   Loss 0.8089   LearningRate 0.0005   Epoch: 18   Global Step: 230630   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:44:54,247-Speed 3262.40 samples/sec   Loss 0.8583   LearningRate 0.0005   Epoch: 18   Global Step: 230640   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:44:57,301-Speed 3353.75 samples/sec   Loss 0.7999   LearningRate 0.0005   Epoch: 18   Global Step: 230650   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:45:00,372-Speed 3335.90 samples/sec   Loss 0.8124   LearningRate 0.0005   Epoch: 18   Global Step: 230660   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:45:03,475-Speed 3300.51 samples/sec   Loss 0.7913   LearningRate 0.0005   Epoch: 18   Global Step: 230670   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:45:06,612-Speed 3265.46 samples/sec   Loss 0.8435   LearningRate 0.0005   Epoch: 18   Global Step: 230680   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-04-27 21:45:09,704-Speed 3313.33 samples/sec   Loss 0.8048   LearningRate 0.0005   Epoch: 18   Global Step: 230690   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:45:12,843-Speed 3262.98 samples/sec   Loss 0.8392   LearningRate 0.0005   Epoch: 18   Global Step: 230700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:45:15,966-Speed 3280.04 samples/sec   Loss 0.8334   LearningRate 0.0005   Epoch: 18   Global Step: 230710   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:45:19,074-Speed 3295.01 samples/sec   Loss 0.8587   LearningRate 0.0005   Epoch: 18   Global Step: 230720   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:45:22,197-Speed 3280.89 samples/sec   Loss 0.8272   LearningRate 0.0005   Epoch: 18   Global Step: 230730   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:45:25,292-Speed 3309.61 samples/sec   Loss 0.8458   LearningRate 0.0005   Epoch: 18   Global Step: 230740   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:45:28,490-Speed 3202.76 samples/sec   Loss 0.8744   LearningRate 0.0005   Epoch: 18   Global Step: 230750   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:45:31,574-Speed 3320.44 samples/sec   Loss 0.8318   LearningRate 0.0005   Epoch: 18   Global Step: 230760   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:45:34,639-Speed 3342.73 samples/sec   Loss 0.8314   LearningRate 0.0005   Epoch: 18   Global Step: 230770   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:45:37,701-Speed 3345.72 samples/sec   Loss 0.8358   LearningRate 0.0005   Epoch: 18   Global Step: 230780   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:45:40,786-Speed 3319.63 samples/sec   Loss 0.8271   LearningRate 0.0005   Epoch: 18   Global Step: 230790   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:45:43,908-Speed 3280.75 samples/sec   Loss 0.8135   LearningRate 0.0005   Epoch: 18   Global Step: 230800   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:45:46,965-Speed 3351.64 samples/sec   Loss 0.8301   LearningRate 0.0005   Epoch: 18   Global Step: 230810   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:45:50,040-Speed 3330.34 samples/sec   Loss 0.8087   LearningRate 0.0005   Epoch: 18   Global Step: 230820   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:45:53,188-Speed 3254.00 samples/sec   Loss 0.8189   LearningRate 0.0005   Epoch: 18   Global Step: 230830   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:45:56,298-Speed 3294.08 samples/sec   Loss 0.8346   LearningRate 0.0005   Epoch: 18   Global Step: 230840   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:45:59,419-Speed 3281.88 samples/sec   Loss 0.8471   LearningRate 0.0005   Epoch: 18   Global Step: 230850   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:46:02,536-Speed 3285.76 samples/sec   Loss 0.8695   LearningRate 0.0005   Epoch: 18   Global Step: 230860   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:46:05,631-Speed 3310.14 samples/sec   Loss 0.8755   LearningRate 0.0005   Epoch: 18   Global Step: 230870   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:46:08,717-Speed 3318.46 samples/sec   Loss 0.8259   LearningRate 0.0005   Epoch: 18   Global Step: 230880   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:46:11,828-Speed 3292.90 samples/sec   Loss 0.8310   LearningRate 0.0005   Epoch: 18   Global Step: 230890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:46:14,903-Speed 3330.93 samples/sec   Loss 0.8249   LearningRate 0.0005   Epoch: 18   Global Step: 230900   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:46:18,011-Speed 3295.61 samples/sec   Loss 0.7944   LearningRate 0.0005   Epoch: 18   Global Step: 230910   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:46:21,065-Speed 3354.53 samples/sec   Loss 0.8210   LearningRate 0.0005   Epoch: 18   Global Step: 230920   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:46:24,192-Speed 3275.67 samples/sec   Loss 0.8366   LearningRate 0.0005   Epoch: 18   Global Step: 230930   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:46:27,270-Speed 3327.76 samples/sec   Loss 0.8527   LearningRate 0.0005   Epoch: 18   Global Step: 230940   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:46:30,358-Speed 3317.02 samples/sec   Loss 0.8304   LearningRate 0.0005   Epoch: 18   Global Step: 230950   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:46:33,489-Speed 3272.03 samples/sec   Loss 0.8318   LearningRate 0.0005   Epoch: 18   Global Step: 230960   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:46:36,640-Speed 3250.16 samples/sec   Loss 0.7959   LearningRate 0.0005   Epoch: 18   Global Step: 230970   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:46:39,806-Speed 3235.74 samples/sec   Loss 0.8594   LearningRate 0.0005   Epoch: 18   Global Step: 230980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:46:42,955-Speed 3253.14 samples/sec   Loss 0.8410   LearningRate 0.0005   Epoch: 18   Global Step: 230990   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:46:46,047-Speed 3313.07 samples/sec   Loss 0.8176   LearningRate 0.0005   Epoch: 18   Global Step: 231000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:46:49,149-Speed 3301.93 samples/sec   Loss 0.8253   LearningRate 0.0005   Epoch: 18   Global Step: 231010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-04-27 21:46:52,264-Speed 3287.80 samples/sec   Loss 0.8773   LearningRate 0.0005   Epoch: 18   Global Step: 231020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:46:55,348-Speed 3321.43 samples/sec   Loss 0.8020   LearningRate 0.0005   Epoch: 18   Global Step: 231030   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:46:58,470-Speed 3280.76 samples/sec   Loss 0.8190   LearningRate 0.0005   Epoch: 18   Global Step: 231040   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:47:01,692-Speed 3179.29 samples/sec   Loss 0.8569   LearningRate 0.0005   Epoch: 18   Global Step: 231050   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:47:04,836-Speed 3258.50 samples/sec   Loss 0.8492   LearningRate 0.0005   Epoch: 18   Global Step: 231060   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:47:07,945-Speed 3294.21 samples/sec   Loss 0.8153   LearningRate 0.0005   Epoch: 18   Global Step: 231070   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:47:11,073-Speed 3274.99 samples/sec   Loss 0.7942   LearningRate 0.0005   Epoch: 18   Global Step: 231080   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:47:14,165-Speed 3312.48 samples/sec   Loss 0.8446   LearningRate 0.0005   Epoch: 18   Global Step: 231090   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-04-27 21:47:17,277-Speed 3292.03 samples/sec   Loss 0.8402   LearningRate 0.0005   Epoch: 18   Global Step: 231100   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:47:20,433-Speed 3245.40 samples/sec   Loss 0.8169   LearningRate 0.0005   Epoch: 18   Global Step: 231110   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:47:23,573-Speed 3261.79 samples/sec   Loss 0.8467   LearningRate 0.0005   Epoch: 18   Global Step: 231120   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:47:26,677-Speed 3300.07 samples/sec   Loss 0.8389   LearningRate 0.0005   Epoch: 18   Global Step: 231130   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:47:29,829-Speed 3249.53 samples/sec   Loss 0.8382   LearningRate 0.0005   Epoch: 18   Global Step: 231140   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:47:32,919-Speed 3314.72 samples/sec   Loss 0.8323   LearningRate 0.0005   Epoch: 18   Global Step: 231150   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:47:36,004-Speed 3320.66 samples/sec   Loss 0.8405   LearningRate 0.0005   Epoch: 18   Global Step: 231160   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:47:39,167-Speed 3239.20 samples/sec   Loss 0.8107   LearningRate 0.0005   Epoch: 18   Global Step: 231170   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:47:42,301-Speed 3267.54 samples/sec   Loss 0.8461   LearningRate 0.0005   Epoch: 18   Global Step: 231180   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:47:45,379-Speed 3327.84 samples/sec   Loss 0.8422   LearningRate 0.0005   Epoch: 18   Global Step: 231190   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:47:48,577-Speed 3203.60 samples/sec   Loss 0.8512   LearningRate 0.0005   Epoch: 18   Global Step: 231200   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:47:51,768-Speed 3209.92 samples/sec   Loss 0.8627   LearningRate 0.0005   Epoch: 18   Global Step: 231210   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:47:54,878-Speed 3293.30 samples/sec   Loss 0.8583   LearningRate 0.0005   Epoch: 18   Global Step: 231220   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:47:57,955-Speed 3329.22 samples/sec   Loss 0.8410   LearningRate 0.0005   Epoch: 18   Global Step: 231230   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:48:01,004-Speed 3359.41 samples/sec   Loss 0.8126   LearningRate 0.0005   Epoch: 18   Global Step: 231240   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:48:04,114-Speed 3294.06 samples/sec   Loss 0.8119   LearningRate 0.0005   Epoch: 18   Global Step: 231250   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:48:07,244-Speed 3272.36 samples/sec   Loss 0.8230   LearningRate 0.0005   Epoch: 18   Global Step: 231260   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:48:10,298-Speed 3353.87 samples/sec   Loss 0.8789   LearningRate 0.0005   Epoch: 18   Global Step: 231270   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:48:13,370-Speed 3334.73 samples/sec   Loss 0.8170   LearningRate 0.0005   Epoch: 18   Global Step: 231280   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:48:16,447-Speed 3329.20 samples/sec   Loss 0.8156   LearningRate 0.0005   Epoch: 18   Global Step: 231290   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:48:19,546-Speed 3305.37 samples/sec   Loss 0.8393   LearningRate 0.0005   Epoch: 18   Global Step: 231300   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:48:22,613-Speed 3339.94 samples/sec   Loss 0.8281   LearningRate 0.0005   Epoch: 18   Global Step: 231310   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:48:25,746-Speed 3268.88 samples/sec   Loss 0.8175   LearningRate 0.0005   Epoch: 18   Global Step: 231320   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:48:28,869-Speed 3280.44 samples/sec   Loss 0.8193   LearningRate 0.0005   Epoch: 18   Global Step: 231330   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:48:31,941-Speed 3333.78 samples/sec   Loss 0.8481   LearningRate 0.0005   Epoch: 18   Global Step: 231340   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:48:35,021-Speed 3326.08 samples/sec   Loss 0.7958   LearningRate 0.0005   Epoch: 18   Global Step: 231350   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:48:38,125-Speed 3299.69 samples/sec   Loss 0.8553   LearningRate 0.0005   Epoch: 18   Global Step: 231360   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:48:41,239-Speed 3290.25 samples/sec   Loss 0.8068   LearningRate 0.0005   Epoch: 18   Global Step: 231370   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:48:44,402-Speed 3237.91 samples/sec   Loss 0.8460   LearningRate 0.0005   Epoch: 18   Global Step: 231380   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:48:47,478-Speed 3330.62 samples/sec   Loss 0.8221   LearningRate 0.0005   Epoch: 18   Global Step: 231390   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:48:50,542-Speed 3342.95 samples/sec   Loss 0.8223   LearningRate 0.0005   Epoch: 18   Global Step: 231400   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:48:53,626-Speed 3321.22 samples/sec   Loss 0.8427   LearningRate 0.0005   Epoch: 18   Global Step: 231410   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:48:56,724-Speed 3306.92 samples/sec   Loss 0.8043   LearningRate 0.0005   Epoch: 18   Global Step: 231420   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:48:59,788-Speed 3342.61 samples/sec   Loss 0.7914   LearningRate 0.0005   Epoch: 18   Global Step: 231430   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:49:02,963-Speed 3226.33 samples/sec   Loss 0.8788   LearningRate 0.0005   Epoch: 18   Global Step: 231440   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:49:06,118-Speed 3247.03 samples/sec   Loss 0.8514   LearningRate 0.0005   Epoch: 18   Global Step: 231450   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:49:09,199-Speed 3324.67 samples/sec   Loss 0.8575   LearningRate 0.0005   Epoch: 18   Global Step: 231460   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:49:12,293-Speed 3310.89 samples/sec   Loss 0.8337   LearningRate 0.0005   Epoch: 18   Global Step: 231470   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:49:15,345-Speed 3356.03 samples/sec   Loss 0.8329   LearningRate 0.0005   Epoch: 18   Global Step: 231480   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:49:18,413-Speed 3338.77 samples/sec   Loss 0.8167   LearningRate 0.0005   Epoch: 18   Global Step: 231490   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:49:21,468-Speed 3353.09 samples/sec   Loss 0.8505   LearningRate 0.0005   Epoch: 18   Global Step: 231500   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:49:24,555-Speed 3318.50 samples/sec   Loss 0.8289   LearningRate 0.0005   Epoch: 18   Global Step: 231510   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:49:27,628-Speed 3333.12 samples/sec   Loss 0.8634   LearningRate 0.0005   Epoch: 18   Global Step: 231520   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:49:30,728-Speed 3303.75 samples/sec   Loss 0.8109   LearningRate 0.0005   Epoch: 18   Global Step: 231530   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:49:33,794-Speed 3341.28 samples/sec   Loss 0.8154   LearningRate 0.0005   Epoch: 18   Global Step: 231540   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:49:36,873-Speed 3326.69 samples/sec   Loss 0.8854   LearningRate 0.0005   Epoch: 18   Global Step: 231550   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:49:40,015-Speed 3260.99 samples/sec   Loss 0.8368   LearningRate 0.0005   Epoch: 18   Global Step: 231560   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:49:43,187-Speed 3228.76 samples/sec   Loss 0.7985   LearningRate 0.0005   Epoch: 18   Global Step: 231570   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:49:46,236-Speed 3359.91 samples/sec   Loss 0.8166   LearningRate 0.0005   Epoch: 18   Global Step: 231580   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:49:49,317-Speed 3323.74 samples/sec   Loss 0.8684   LearningRate 0.0005   Epoch: 18   Global Step: 231590   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:49:52,448-Speed 3272.65 samples/sec   Loss 0.8237   LearningRate 0.0005   Epoch: 18   Global Step: 231600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 21:49:55,553-Speed 3298.21 samples/sec   Loss 0.8151   LearningRate 0.0005   Epoch: 18   Global Step: 231610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 21:49:58,632-Speed 3327.58 samples/sec   Loss 0.8431   LearningRate 0.0005   Epoch: 18   Global Step: 231620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 21:50:01,707-Speed 3331.37 samples/sec   Loss 0.8313   LearningRate 0.0005   Epoch: 18   Global Step: 231630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 21:50:04,815-Speed 3295.34 samples/sec   Loss 0.8510   LearningRate 0.0005   Epoch: 18   Global Step: 231640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 21:50:07,869-Speed 3354.02 samples/sec   Loss 0.8032   LearningRate 0.0005   Epoch: 18   Global Step: 231650   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:50:10,943-Speed 3332.76 samples/sec   Loss 0.8411   LearningRate 0.0005   Epoch: 18   Global Step: 231660   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:50:14,167-Speed 3177.01 samples/sec   Loss 0.8182   LearningRate 0.0005   Epoch: 18   Global Step: 231670   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:50:17,362-Speed 3205.41 samples/sec   Loss 0.8474   LearningRate 0.0005   Epoch: 18   Global Step: 231680   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:50:20,462-Speed 3304.33 samples/sec   Loss 0.8540   LearningRate 0.0005   Epoch: 18   Global Step: 231690   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:50:23,593-Speed 3271.14 samples/sec   Loss 0.8463   LearningRate 0.0005   Epoch: 18   Global Step: 231700   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:50:26,751-Speed 3244.11 samples/sec   Loss 0.8512   LearningRate 0.0005   Epoch: 18   Global Step: 231710   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:50:29,895-Speed 3258.43 samples/sec   Loss 0.8050   LearningRate 0.0005   Epoch: 18   Global Step: 231720   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:50:32,984-Speed 3315.74 samples/sec   Loss 0.8312   LearningRate 0.0005   Epoch: 18   Global Step: 231730   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:50:36,135-Speed 3250.84 samples/sec   Loss 0.8400   LearningRate 0.0005   Epoch: 18   Global Step: 231740   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:50:39,232-Speed 3306.98 samples/sec   Loss 0.8211   LearningRate 0.0005   Epoch: 18   Global Step: 231750   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:50:42,372-Speed 3263.47 samples/sec   Loss 0.8290   LearningRate 0.0004   Epoch: 18   Global Step: 231760   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:50:45,447-Speed 3330.92 samples/sec   Loss 0.8486   LearningRate 0.0004   Epoch: 18   Global Step: 231770   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:50:48,553-Speed 3297.93 samples/sec   Loss 0.8826   LearningRate 0.0004   Epoch: 18   Global Step: 231780   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:50:51,759-Speed 3194.51 samples/sec   Loss 0.8147   LearningRate 0.0004   Epoch: 18   Global Step: 231790   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:50:54,868-Speed 3295.61 samples/sec   Loss 0.8252   LearningRate 0.0004   Epoch: 18   Global Step: 231800   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:50:57,926-Speed 3349.37 samples/sec   Loss 0.8674   LearningRate 0.0004   Epoch: 18   Global Step: 231810   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:51:01,066-Speed 3263.10 samples/sec   Loss 0.8125   LearningRate 0.0004   Epoch: 18   Global Step: 231820   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:51:04,170-Speed 3299.62 samples/sec   Loss 0.8336   LearningRate 0.0004   Epoch: 18   Global Step: 231830   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:51:07,276-Speed 3298.61 samples/sec   Loss 0.8357   LearningRate 0.0004   Epoch: 18   Global Step: 231840   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:51:10,351-Speed 3330.51 samples/sec   Loss 0.8074   LearningRate 0.0004   Epoch: 18   Global Step: 231850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 21:51:13,415-Speed 3343.95 samples/sec   Loss 0.8190   LearningRate 0.0004   Epoch: 18   Global Step: 231860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 21:51:16,536-Speed 3281.31 samples/sec   Loss 0.8172   LearningRate 0.0004   Epoch: 18   Global Step: 231870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 21:51:19,678-Speed 3260.25 samples/sec   Loss 0.8369   LearningRate 0.0004   Epoch: 18   Global Step: 231880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 21:51:22,777-Speed 3306.21 samples/sec   Loss 0.8432   LearningRate 0.0004   Epoch: 18   Global Step: 231890   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:51:25,887-Speed 3292.82 samples/sec   Loss 0.8149   LearningRate 0.0004   Epoch: 18   Global Step: 231900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:51:29,050-Speed 3239.39 samples/sec   Loss 0.7931   LearningRate 0.0004   Epoch: 18   Global Step: 231910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:51:32,206-Speed 3246.12 samples/sec   Loss 0.8433   LearningRate 0.0004   Epoch: 18   Global Step: 231920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:51:35,285-Speed 3326.84 samples/sec   Loss 0.8302   LearningRate 0.0004   Epoch: 18   Global Step: 231930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:51:38,380-Speed 3309.70 samples/sec   Loss 0.8181   LearningRate 0.0004   Epoch: 18   Global Step: 231940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:51:41,510-Speed 3272.87 samples/sec   Loss 0.8374   LearningRate 0.0004   Epoch: 18   Global Step: 231950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:51:44,663-Speed 3248.42 samples/sec   Loss 0.8319   LearningRate 0.0004   Epoch: 18   Global Step: 231960   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:51:47,817-Speed 3248.23 samples/sec   Loss 0.8557   LearningRate 0.0004   Epoch: 18   Global Step: 231970   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:51:50,912-Speed 3308.78 samples/sec   Loss 0.8447   LearningRate 0.0004   Epoch: 18   Global Step: 231980   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:51:54,072-Speed 3241.11 samples/sec   Loss 0.8129   LearningRate 0.0004   Epoch: 18   Global Step: 231990   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:51:57,143-Speed 3336.33 samples/sec   Loss 0.8084   LearningRate 0.0004   Epoch: 18   Global Step: 232000   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:52:00,285-Speed 3260.20 samples/sec   Loss 0.7952   LearningRate 0.0004   Epoch: 18   Global Step: 232010   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:52:03,430-Speed 3257.01 samples/sec   Loss 0.8472   LearningRate 0.0004   Epoch: 18   Global Step: 232020   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:52:06,500-Speed 3336.14 samples/sec   Loss 0.8310   LearningRate 0.0004   Epoch: 18   Global Step: 232030   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:52:09,568-Speed 3338.72 samples/sec   Loss 0.8411   LearningRate 0.0004   Epoch: 18   Global Step: 232040   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:52:12,699-Speed 3271.74 samples/sec   Loss 0.8283   LearningRate 0.0004   Epoch: 18   Global Step: 232050   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:52:15,887-Speed 3212.50 samples/sec   Loss 0.8402   LearningRate 0.0004   Epoch: 18   Global Step: 232060   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:52:19,022-Speed 3267.93 samples/sec   Loss 0.8674   LearningRate 0.0004   Epoch: 18   Global Step: 232070   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:52:22,069-Speed 3361.80 samples/sec   Loss 0.8532   LearningRate 0.0004   Epoch: 18   Global Step: 232080   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 21:52:25,183-Speed 3289.72 samples/sec   Loss 0.8346   LearningRate 0.0004   Epoch: 18   Global Step: 232090   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 21:52:28,273-Speed 3314.08 samples/sec   Loss 0.8389   LearningRate 0.0004   Epoch: 18   Global Step: 232100   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 21:52:31,364-Speed 3313.83 samples/sec   Loss 0.8153   LearningRate 0.0004   Epoch: 18   Global Step: 232110   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 21:52:34,467-Speed 3301.87 samples/sec   Loss 0.8367   LearningRate 0.0004   Epoch: 18   Global Step: 232120   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 21:52:37,598-Speed 3271.28 samples/sec   Loss 0.8481   LearningRate 0.0004   Epoch: 18   Global Step: 232130   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 21:52:40,687-Speed 3315.38 samples/sec   Loss 0.8378   LearningRate 0.0004   Epoch: 18   Global Step: 232140   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 21:52:43,777-Speed 3315.75 samples/sec   Loss 0.8487   LearningRate 0.0004   Epoch: 18   Global Step: 232150   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 21:52:46,881-Speed 3300.26 samples/sec   Loss 0.8283   LearningRate 0.0004   Epoch: 18   Global Step: 232160   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 21:52:49,969-Speed 3316.90 samples/sec   Loss 0.7810   LearningRate 0.0004   Epoch: 18   Global Step: 232170   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 21:52:53,106-Speed 3264.72 samples/sec   Loss 0.8215   LearningRate 0.0004   Epoch: 18   Global Step: 232180   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:52:56,177-Speed 3335.42 samples/sec   Loss 0.8297   LearningRate 0.0004   Epoch: 18   Global Step: 232190   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:52:59,280-Speed 3301.61 samples/sec   Loss 0.8725   LearningRate 0.0004   Epoch: 18   Global Step: 232200   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:53:02,421-Speed 3260.40 samples/sec   Loss 0.8016   LearningRate 0.0004   Epoch: 18   Global Step: 232210   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:53:05,542-Speed 3282.17 samples/sec   Loss 0.8186   LearningRate 0.0004   Epoch: 18   Global Step: 232220   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:53:08,681-Speed 3263.56 samples/sec   Loss 0.8054   LearningRate 0.0004   Epoch: 18   Global Step: 232230   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:53:11,787-Speed 3297.52 samples/sec   Loss 0.8166   LearningRate 0.0004   Epoch: 18   Global Step: 232240   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:53:14,871-Speed 3321.51 samples/sec   Loss 0.8011   LearningRate 0.0004   Epoch: 18   Global Step: 232250   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:53:17,976-Speed 3300.06 samples/sec   Loss 0.8560   LearningRate 0.0004   Epoch: 18   Global Step: 232260   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:53:21,047-Speed 3334.66 samples/sec   Loss 0.7966   LearningRate 0.0004   Epoch: 18   Global Step: 232270   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:53:24,120-Speed 3333.64 samples/sec   Loss 0.8047   LearningRate 0.0004   Epoch: 18   Global Step: 232280   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:53:27,274-Speed 3247.75 samples/sec   Loss 0.8601   LearningRate 0.0004   Epoch: 18   Global Step: 232290   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:53:30,386-Speed 3291.24 samples/sec   Loss 0.8818   LearningRate 0.0004   Epoch: 18   Global Step: 232300   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:53:33,468-Speed 3323.82 samples/sec   Loss 0.8032   LearningRate 0.0004   Epoch: 18   Global Step: 232310   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:53:36,625-Speed 3243.78 samples/sec   Loss 0.8009   LearningRate 0.0004   Epoch: 18   Global Step: 232320   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:53:39,739-Speed 3289.42 samples/sec   Loss 0.8286   LearningRate 0.0004   Epoch: 18   Global Step: 232330   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:53:42,931-Speed 3208.97 samples/sec   Loss 0.8410   LearningRate 0.0004   Epoch: 18   Global Step: 232340   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:53:46,027-Speed 3308.88 samples/sec   Loss 0.8075   LearningRate 0.0004   Epoch: 18   Global Step: 232350   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:53:49,154-Speed 3275.97 samples/sec   Loss 0.8516   LearningRate 0.0004   Epoch: 18   Global Step: 232360   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:53:52,344-Speed 3211.36 samples/sec   Loss 0.8040   LearningRate 0.0004   Epoch: 18   Global Step: 232370   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:53:55,454-Speed 3292.74 samples/sec   Loss 0.8304   LearningRate 0.0004   Epoch: 18   Global Step: 232380   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:53:58,560-Speed 3298.10 samples/sec   Loss 0.7984   LearningRate 0.0004   Epoch: 18   Global Step: 232390   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:54:01,647-Speed 3318.26 samples/sec   Loss 0.8369   LearningRate 0.0004   Epoch: 18   Global Step: 232400   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:54:04,767-Speed 3283.19 samples/sec   Loss 0.8326   LearningRate 0.0004   Epoch: 18   Global Step: 232410   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:54:07,846-Speed 3326.74 samples/sec   Loss 0.8236   LearningRate 0.0004   Epoch: 18   Global Step: 232420   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:54:10,942-Speed 3309.02 samples/sec   Loss 0.8271   LearningRate 0.0004   Epoch: 18   Global Step: 232430   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:54:14,035-Speed 3312.05 samples/sec   Loss 0.8274   LearningRate 0.0004   Epoch: 18   Global Step: 232440   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:54:17,116-Speed 3324.40 samples/sec   Loss 0.8295   LearningRate 0.0004   Epoch: 18   Global Step: 232450   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:54:20,250-Speed 3268.77 samples/sec   Loss 0.8462   LearningRate 0.0004   Epoch: 18   Global Step: 232460   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:54:23,368-Speed 3284.98 samples/sec   Loss 0.8193   LearningRate 0.0004   Epoch: 18   Global Step: 232470   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:54:26,568-Speed 3201.10 samples/sec   Loss 0.8057   LearningRate 0.0004   Epoch: 18   Global Step: 232480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 21:54:29,688-Speed 3283.45 samples/sec   Loss 0.8013   LearningRate 0.0004   Epoch: 18   Global Step: 232490   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:54:32,784-Speed 3307.87 samples/sec   Loss 0.8636   LearningRate 0.0004   Epoch: 18   Global Step: 232500   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:54:35,917-Speed 3269.78 samples/sec   Loss 0.8055   LearningRate 0.0004   Epoch: 18   Global Step: 232510   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:54:39,001-Speed 3321.51 samples/sec   Loss 0.7996   LearningRate 0.0004   Epoch: 18   Global Step: 232520   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:54:42,069-Speed 3338.49 samples/sec   Loss 0.7915   LearningRate 0.0004   Epoch: 18   Global Step: 232530   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:54:45,139-Speed 3339.67 samples/sec   Loss 0.8367   LearningRate 0.0004   Epoch: 18   Global Step: 232540   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:54:48,239-Speed 3304.28 samples/sec   Loss 0.8331   LearningRate 0.0004   Epoch: 18   Global Step: 232550   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:54:51,310-Speed 3335.13 samples/sec   Loss 0.8426   LearningRate 0.0004   Epoch: 18   Global Step: 232560   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:54:54,471-Speed 3240.48 samples/sec   Loss 0.8353   LearningRate 0.0004   Epoch: 18   Global Step: 232570   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:54:57,558-Speed 3317.95 samples/sec   Loss 0.7999   LearningRate 0.0004   Epoch: 18   Global Step: 232580   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:55:00,654-Speed 3308.73 samples/sec   Loss 0.8334   LearningRate 0.0004   Epoch: 18   Global Step: 232590   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:55:03,761-Speed 3297.03 samples/sec   Loss 0.8314   LearningRate 0.0004   Epoch: 18   Global Step: 232600   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:55:06,852-Speed 3313.61 samples/sec   Loss 0.8360   LearningRate 0.0004   Epoch: 18   Global Step: 232610   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:55:09,977-Speed 3277.74 samples/sec   Loss 0.8412   LearningRate 0.0004   Epoch: 18   Global Step: 232620   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:55:13,098-Speed 3281.65 samples/sec   Loss 0.8094   LearningRate 0.0004   Epoch: 18   Global Step: 232630   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:55:16,230-Speed 3270.69 samples/sec   Loss 0.8442   LearningRate 0.0004   Epoch: 18   Global Step: 232640   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:55:19,359-Speed 3273.86 samples/sec   Loss 0.8572   LearningRate 0.0004   Epoch: 18   Global Step: 232650   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:55:22,464-Speed 3298.83 samples/sec   Loss 0.8332   LearningRate 0.0004   Epoch: 18   Global Step: 232660   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:55:25,567-Speed 3301.28 samples/sec   Loss 0.8195   LearningRate 0.0004   Epoch: 18   Global Step: 232670   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:55:28,695-Speed 3274.85 samples/sec   Loss 0.7765   LearningRate 0.0004   Epoch: 18   Global Step: 232680   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:55:31,796-Speed 3302.44 samples/sec   Loss 0.8278   LearningRate 0.0004   Epoch: 18   Global Step: 232690   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:55:34,903-Speed 3297.73 samples/sec   Loss 0.8134   LearningRate 0.0004   Epoch: 18   Global Step: 232700   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:55:38,014-Speed 3292.29 samples/sec   Loss 0.8154   LearningRate 0.0004   Epoch: 18   Global Step: 232710   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:55:41,089-Speed 3331.21 samples/sec   Loss 0.8275   LearningRate 0.0004   Epoch: 18   Global Step: 232720   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:55:44,168-Speed 3327.44 samples/sec   Loss 0.8129   LearningRate 0.0004   Epoch: 18   Global Step: 232730   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:55:47,308-Speed 3262.53 samples/sec   Loss 0.8059   LearningRate 0.0004   Epoch: 18   Global Step: 232740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 21:55:50,375-Speed 3339.70 samples/sec   Loss 0.8204   LearningRate 0.0004   Epoch: 18   Global Step: 232750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 21:55:53,479-Speed 3299.96 samples/sec   Loss 0.8172   LearningRate 0.0004   Epoch: 18   Global Step: 232760   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:55:56,560-Speed 3324.75 samples/sec   Loss 0.8042   LearningRate 0.0004   Epoch: 18   Global Step: 232770   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:55:59,666-Speed 3296.82 samples/sec   Loss 0.8625   LearningRate 0.0004   Epoch: 18   Global Step: 232780   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:56:02,863-Speed 3204.67 samples/sec   Loss 0.8550   LearningRate 0.0004   Epoch: 18   Global Step: 232790   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:56:05,974-Speed 3292.89 samples/sec   Loss 0.8507   LearningRate 0.0004   Epoch: 18   Global Step: 232800   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:56:09,112-Speed 3263.69 samples/sec   Loss 0.8404   LearningRate 0.0004   Epoch: 18   Global Step: 232810   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:56:12,192-Speed 3326.09 samples/sec   Loss 0.8442   LearningRate 0.0004   Epoch: 18   Global Step: 232820   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:56:15,290-Speed 3306.31 samples/sec   Loss 0.8317   LearningRate 0.0004   Epoch: 18   Global Step: 232830   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:56:18,494-Speed 3197.46 samples/sec   Loss 0.8559   LearningRate 0.0004   Epoch: 18   Global Step: 232840   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:56:21,573-Speed 3326.56 samples/sec   Loss 0.8248   LearningRate 0.0004   Epoch: 18   Global Step: 232850   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:56:24,742-Speed 3232.32 samples/sec   Loss 0.8142   LearningRate 0.0004   Epoch: 18   Global Step: 232860   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:56:27,916-Speed 3226.95 samples/sec   Loss 0.8112   LearningRate 0.0004   Epoch: 18   Global Step: 232870   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:56:31,045-Speed 3273.52 samples/sec   Loss 0.8282   LearningRate 0.0004   Epoch: 18   Global Step: 232880   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:56:34,139-Speed 3310.80 samples/sec   Loss 0.8736   LearningRate 0.0004   Epoch: 18   Global Step: 232890   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:56:37,245-Speed 3298.14 samples/sec   Loss 0.8615   LearningRate 0.0004   Epoch: 18   Global Step: 232900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:56:40,381-Speed 3265.92 samples/sec   Loss 0.8227   LearningRate 0.0004   Epoch: 18   Global Step: 232910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:56:43,532-Speed 3251.37 samples/sec   Loss 0.7905   LearningRate 0.0004   Epoch: 18   Global Step: 232920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:56:46,621-Speed 3315.27 samples/sec   Loss 0.8228   LearningRate 0.0004   Epoch: 18   Global Step: 232930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:56:49,766-Speed 3257.77 samples/sec   Loss 0.8511   LearningRate 0.0004   Epoch: 18   Global Step: 232940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:56:52,950-Speed 3216.15 samples/sec   Loss 0.8167   LearningRate 0.0004   Epoch: 18   Global Step: 232950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:56:56,103-Speed 3248.84 samples/sec   Loss 0.8179   LearningRate 0.0004   Epoch: 18   Global Step: 232960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:56:59,257-Speed 3248.34 samples/sec   Loss 0.8274   LearningRate 0.0004   Epoch: 18   Global Step: 232970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 21:57:02,378-Speed 3282.08 samples/sec   Loss 0.8279   LearningRate 0.0004   Epoch: 18   Global Step: 232980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:57:05,501-Speed 3279.11 samples/sec   Loss 0.7966   LearningRate 0.0004   Epoch: 18   Global Step: 232990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:57:08,586-Speed 3320.55 samples/sec   Loss 0.8422   LearningRate 0.0004   Epoch: 18   Global Step: 233000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:57:11,720-Speed 3268.42 samples/sec   Loss 0.8581   LearningRate 0.0004   Epoch: 18   Global Step: 233010   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:57:14,847-Speed 3275.73 samples/sec   Loss 0.8139   LearningRate 0.0004   Epoch: 18   Global Step: 233020   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:57:18,035-Speed 3213.10 samples/sec   Loss 0.8380   LearningRate 0.0004   Epoch: 18   Global Step: 233030   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:57:21,137-Speed 3302.14 samples/sec   Loss 0.8216   LearningRate 0.0004   Epoch: 18   Global Step: 233040   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:57:24,200-Speed 3344.48 samples/sec   Loss 0.8047   LearningRate 0.0004   Epoch: 18   Global Step: 233050   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:57:27,301-Speed 3303.72 samples/sec   Loss 0.8544   LearningRate 0.0004   Epoch: 18   Global Step: 233060   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:57:30,418-Speed 3286.19 samples/sec   Loss 0.8344   LearningRate 0.0004   Epoch: 18   Global Step: 233070   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:57:33,493-Speed 3331.05 samples/sec   Loss 0.8621   LearningRate 0.0004   Epoch: 18   Global Step: 233080   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:57:36,629-Speed 3266.37 samples/sec   Loss 0.8435   LearningRate 0.0004   Epoch: 18   Global Step: 233090   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:57:39,745-Speed 3287.22 samples/sec   Loss 0.8128   LearningRate 0.0004   Epoch: 18   Global Step: 233100   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:57:42,902-Speed 3244.36 samples/sec   Loss 0.8386   LearningRate 0.0004   Epoch: 18   Global Step: 233110   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:57:46,003-Speed 3303.27 samples/sec   Loss 0.8080   LearningRate 0.0004   Epoch: 18   Global Step: 233120   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:57:49,168-Speed 3235.70 samples/sec   Loss 0.8267   LearningRate 0.0004   Epoch: 18   Global Step: 233130   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:57:52,295-Speed 3276.90 samples/sec   Loss 0.8468   LearningRate 0.0004   Epoch: 18   Global Step: 233140   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:57:55,416-Speed 3282.34 samples/sec   Loss 0.8510   LearningRate 0.0004   Epoch: 18   Global Step: 233150   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:57:58,536-Speed 3282.52 samples/sec   Loss 0.8196   LearningRate 0.0004   Epoch: 18   Global Step: 233160   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:58:01,633-Speed 3307.78 samples/sec   Loss 0.8099   LearningRate 0.0004   Epoch: 18   Global Step: 233170   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:58:04,705-Speed 3334.23 samples/sec   Loss 0.8010   LearningRate 0.0004   Epoch: 18   Global Step: 233180   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:58:07,802-Speed 3307.86 samples/sec   Loss 0.8205   LearningRate 0.0004   Epoch: 18   Global Step: 233190   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:58:10,897-Speed 3309.70 samples/sec   Loss 0.8314   LearningRate 0.0004   Epoch: 18   Global Step: 233200   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:58:14,037-Speed 3261.93 samples/sec   Loss 0.8207   LearningRate 0.0004   Epoch: 18   Global Step: 233210   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:58:17,229-Speed 3209.29 samples/sec   Loss 0.8443   LearningRate 0.0004   Epoch: 18   Global Step: 233220   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:58:20,335-Speed 3297.52 samples/sec   Loss 0.7940   LearningRate 0.0004   Epoch: 18   Global Step: 233230   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:58:23,423-Speed 3316.27 samples/sec   Loss 0.8395   LearningRate 0.0004   Epoch: 18   Global Step: 233240   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:58:26,534-Speed 3293.27 samples/sec   Loss 0.8063   LearningRate 0.0004   Epoch: 18   Global Step: 233250   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:58:29,718-Speed 3216.79 samples/sec   Loss 0.8167   LearningRate 0.0004   Epoch: 18   Global Step: 233260   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:58:32,877-Speed 3243.11 samples/sec   Loss 0.8077   LearningRate 0.0004   Epoch: 18   Global Step: 233270   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:58:36,094-Speed 3183.38 samples/sec   Loss 0.8115   LearningRate 0.0004   Epoch: 18   Global Step: 233280   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:58:39,236-Speed 3260.52 samples/sec   Loss 0.8418   LearningRate 0.0004   Epoch: 18   Global Step: 233290   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:58:42,429-Speed 3207.97 samples/sec   Loss 0.8207   LearningRate 0.0004   Epoch: 18   Global Step: 233300   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:58:45,562-Speed 3269.33 samples/sec   Loss 0.8303   LearningRate 0.0004   Epoch: 18   Global Step: 233310   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:58:48,751-Speed 3213.08 samples/sec   Loss 0.8490   LearningRate 0.0004   Epoch: 18   Global Step: 233320   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:58:51,925-Speed 3227.01 samples/sec   Loss 0.8472   LearningRate 0.0004   Epoch: 18   Global Step: 233330   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:58:55,041-Speed 3287.01 samples/sec   Loss 0.8573   LearningRate 0.0004   Epoch: 18   Global Step: 233340   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:58:58,148-Speed 3297.10 samples/sec   Loss 0.8498   LearningRate 0.0004   Epoch: 18   Global Step: 233350   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:59:01,259-Speed 3292.86 samples/sec   Loss 0.8229   LearningRate 0.0004   Epoch: 18   Global Step: 233360   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:59:04,409-Speed 3250.80 samples/sec   Loss 0.8317   LearningRate 0.0004   Epoch: 18   Global Step: 233370   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:59:07,509-Speed 3305.07 samples/sec   Loss 0.8041   LearningRate 0.0004   Epoch: 18   Global Step: 233380   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:59:10,650-Speed 3261.06 samples/sec   Loss 0.7818   LearningRate 0.0004   Epoch: 18   Global Step: 233390   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:59:13,809-Speed 3242.48 samples/sec   Loss 0.8551   LearningRate 0.0004   Epoch: 18   Global Step: 233400   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 21:59:16,904-Speed 3309.01 samples/sec   Loss 0.8562   LearningRate 0.0004   Epoch: 18   Global Step: 233410   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:59:20,020-Speed 3287.12 samples/sec   Loss 0.8128   LearningRate 0.0004   Epoch: 18   Global Step: 233420   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:59:23,142-Speed 3281.63 samples/sec   Loss 0.8397   LearningRate 0.0004   Epoch: 18   Global Step: 233430   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:59:26,244-Speed 3302.06 samples/sec   Loss 0.8760   LearningRate 0.0004   Epoch: 18   Global Step: 233440   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:59:29,486-Speed 3158.71 samples/sec   Loss 0.8133   LearningRate 0.0004   Epoch: 18   Global Step: 233450   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:59:32,575-Speed 3316.19 samples/sec   Loss 0.8370   LearningRate 0.0004   Epoch: 18   Global Step: 233460   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:59:35,664-Speed 3315.82 samples/sec   Loss 0.7882   LearningRate 0.0004   Epoch: 18   Global Step: 233470   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:59:38,788-Speed 3278.84 samples/sec   Loss 0.8032   LearningRate 0.0004   Epoch: 18   Global Step: 233480   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:59:41,936-Speed 3254.44 samples/sec   Loss 0.8359   LearningRate 0.0004   Epoch: 18   Global Step: 233490   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:59:45,043-Speed 3296.25 samples/sec   Loss 0.8299   LearningRate 0.0004   Epoch: 18   Global Step: 233500   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 21:59:48,178-Speed 3268.02 samples/sec   Loss 0.7900   LearningRate 0.0004   Epoch: 18   Global Step: 233510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 21:59:51,286-Speed 3295.06 samples/sec   Loss 0.8392   LearningRate 0.0004   Epoch: 18   Global Step: 233520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 21:59:54,377-Speed 3314.25 samples/sec   Loss 0.8156   LearningRate 0.0004   Epoch: 18   Global Step: 233530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 21:59:57,450-Speed 3333.57 samples/sec   Loss 0.7785   LearningRate 0.0004   Epoch: 18   Global Step: 233540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:00:00,535-Speed 3319.93 samples/sec   Loss 0.8484   LearningRate 0.0004   Epoch: 18   Global Step: 233550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:00:03,617-Speed 3324.03 samples/sec   Loss 0.8147   LearningRate 0.0004   Epoch: 18   Global Step: 233560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:00:06,759-Speed 3260.06 samples/sec   Loss 0.7902   LearningRate 0.0004   Epoch: 18   Global Step: 233570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:00:09,823-Speed 3342.53 samples/sec   Loss 0.8671   LearningRate 0.0004   Epoch: 18   Global Step: 233580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:00:12,949-Speed 3277.12 samples/sec   Loss 0.8301   LearningRate 0.0004   Epoch: 18   Global Step: 233590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:00:16,033-Speed 3321.68 samples/sec   Loss 0.8275   LearningRate 0.0004   Epoch: 18   Global Step: 233600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:00:19,127-Speed 3310.53 samples/sec   Loss 0.8099   LearningRate 0.0004   Epoch: 18   Global Step: 233610   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:00:22,221-Speed 3310.68 samples/sec   Loss 0.8217   LearningRate 0.0004   Epoch: 18   Global Step: 233620   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:00:25,356-Speed 3267.84 samples/sec   Loss 0.8328   LearningRate 0.0004   Epoch: 18   Global Step: 233630   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:00:28,478-Speed 3280.82 samples/sec   Loss 0.7669   LearningRate 0.0004   Epoch: 18   Global Step: 233640   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:00:31,612-Speed 3268.57 samples/sec   Loss 0.8139   LearningRate 0.0004   Epoch: 18   Global Step: 233650   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:00:34,695-Speed 3322.18 samples/sec   Loss 0.8346   LearningRate 0.0004   Epoch: 18   Global Step: 233660   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:00:37,857-Speed 3239.58 samples/sec   Loss 0.8420   LearningRate 0.0004   Epoch: 18   Global Step: 233670   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:00:41,087-Speed 3170.84 samples/sec   Loss 0.8435   LearningRate 0.0004   Epoch: 18   Global Step: 233680   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:00:44,200-Speed 3291.19 samples/sec   Loss 0.8165   LearningRate 0.0004   Epoch: 18   Global Step: 233690   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:00:47,321-Speed 3281.43 samples/sec   Loss 0.8534   LearningRate 0.0004   Epoch: 18   Global Step: 233700   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:00:50,472-Speed 3250.64 samples/sec   Loss 0.8451   LearningRate 0.0004   Epoch: 18   Global Step: 233710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:00:53,629-Speed 3245.45 samples/sec   Loss 0.8323   LearningRate 0.0004   Epoch: 18   Global Step: 233720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:00:56,716-Speed 3317.93 samples/sec   Loss 0.8600   LearningRate 0.0003   Epoch: 18   Global Step: 233730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:00:59,806-Speed 3315.42 samples/sec   Loss 0.8407   LearningRate 0.0003   Epoch: 18   Global Step: 233740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:01:02,892-Speed 3318.84 samples/sec   Loss 0.8218   LearningRate 0.0003   Epoch: 18   Global Step: 233750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:01:05,992-Speed 3304.52 samples/sec   Loss 0.8053   LearningRate 0.0003   Epoch: 18   Global Step: 233760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:01:09,098-Speed 3297.72 samples/sec   Loss 0.8262   LearningRate 0.0003   Epoch: 18   Global Step: 233770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:01:12,254-Speed 3245.80 samples/sec   Loss 0.8042   LearningRate 0.0003   Epoch: 18   Global Step: 233780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:01:15,379-Speed 3277.71 samples/sec   Loss 0.8605   LearningRate 0.0003   Epoch: 18   Global Step: 233790   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:01:18,539-Speed 3241.11 samples/sec   Loss 0.8488   LearningRate 0.0003   Epoch: 18   Global Step: 233800   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:01:21,617-Speed 3329.29 samples/sec   Loss 0.8265   LearningRate 0.0003   Epoch: 18   Global Step: 233810   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:01:24,740-Speed 3279.86 samples/sec   Loss 0.8490   LearningRate 0.0003   Epoch: 18   Global Step: 233820   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:01:27,935-Speed 3206.02 samples/sec   Loss 0.8429   LearningRate 0.0003   Epoch: 18   Global Step: 233830   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:01:31,106-Speed 3230.78 samples/sec   Loss 0.7942   LearningRate 0.0003   Epoch: 18   Global Step: 233840   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:01:34,160-Speed 3354.25 samples/sec   Loss 0.8566   LearningRate 0.0003   Epoch: 18   Global Step: 233850   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:01:37,262-Speed 3302.37 samples/sec   Loss 0.8536   LearningRate 0.0003   Epoch: 18   Global Step: 233860   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:01:40,356-Speed 3309.85 samples/sec   Loss 0.7697   LearningRate 0.0003   Epoch: 18   Global Step: 233870   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:01:43,437-Speed 3325.66 samples/sec   Loss 0.8594   LearningRate 0.0003   Epoch: 18   Global Step: 233880   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:01:46,535-Speed 3305.79 samples/sec   Loss 0.8155   LearningRate 0.0003   Epoch: 18   Global Step: 233890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:01:49,641-Speed 3297.75 samples/sec   Loss 0.8430   LearningRate 0.0003   Epoch: 18   Global Step: 233900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:01:52,718-Speed 3329.64 samples/sec   Loss 0.8376   LearningRate 0.0003   Epoch: 18   Global Step: 233910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:01:55,831-Speed 3289.89 samples/sec   Loss 0.8399   LearningRate 0.0003   Epoch: 18   Global Step: 233920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:01:58,925-Speed 3310.79 samples/sec   Loss 0.8178   LearningRate 0.0003   Epoch: 18   Global Step: 233930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:02:02,036-Speed 3293.44 samples/sec   Loss 0.8295   LearningRate 0.0003   Epoch: 18   Global Step: 233940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:02:05,146-Speed 3293.25 samples/sec   Loss 0.8037   LearningRate 0.0003   Epoch: 18   Global Step: 233950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:02:08,227-Speed 3324.39 samples/sec   Loss 0.8036   LearningRate 0.0003   Epoch: 18   Global Step: 233960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:02:11,304-Speed 3329.97 samples/sec   Loss 0.8005   LearningRate 0.0003   Epoch: 18   Global Step: 233970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:02:14,435-Speed 3271.09 samples/sec   Loss 0.8086   LearningRate 0.0003   Epoch: 18   Global Step: 233980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:02:17,532-Speed 3307.32 samples/sec   Loss 0.8218   LearningRate 0.0003   Epoch: 18   Global Step: 233990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:02:20,642-Speed 3293.31 samples/sec   Loss 0.8141   LearningRate 0.0003   Epoch: 18   Global Step: 234000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:02:23,764-Speed 3282.20 samples/sec   Loss 0.8249   LearningRate 0.0003   Epoch: 18   Global Step: 234010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:02:26,803-Speed 3370.54 samples/sec   Loss 0.8435   LearningRate 0.0003   Epoch: 18   Global Step: 234020   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:02:29,857-Speed 3353.17 samples/sec   Loss 0.8144   LearningRate 0.0003   Epoch: 18   Global Step: 234030   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:02:32,925-Speed 3339.32 samples/sec   Loss 0.8051   LearningRate 0.0003   Epoch: 18   Global Step: 234040   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:02:36,064-Speed 3263.58 samples/sec   Loss 0.8237   LearningRate 0.0003   Epoch: 18   Global Step: 234050   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:02:39,119-Speed 3352.76 samples/sec   Loss 0.8326   LearningRate 0.0003   Epoch: 18   Global Step: 234060   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:02:42,191-Speed 3334.44 samples/sec   Loss 0.8393   LearningRate 0.0003   Epoch: 18   Global Step: 234070   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:02:45,239-Speed 3361.06 samples/sec   Loss 0.7827   LearningRate 0.0003   Epoch: 18   Global Step: 234080   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:02:48,315-Speed 3329.55 samples/sec   Loss 0.8446   LearningRate 0.0003   Epoch: 18   Global Step: 234090   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:02:51,403-Speed 3317.22 samples/sec   Loss 0.7984   LearningRate 0.0003   Epoch: 18   Global Step: 234100   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:02:54,505-Speed 3301.90 samples/sec   Loss 0.8186   LearningRate 0.0003   Epoch: 18   Global Step: 234110   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:02:57,617-Speed 3291.80 samples/sec   Loss 0.8257   LearningRate 0.0003   Epoch: 18   Global Step: 234120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:03:00,709-Speed 3312.14 samples/sec   Loss 0.8017   LearningRate 0.0003   Epoch: 18   Global Step: 234130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:03:03,836-Speed 3276.39 samples/sec   Loss 0.8275   LearningRate 0.0003   Epoch: 18   Global Step: 234140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:03:06,944-Speed 3296.23 samples/sec   Loss 0.8710   LearningRate 0.0003   Epoch: 18   Global Step: 234150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:03:10,018-Speed 3331.53 samples/sec   Loss 0.8267   LearningRate 0.0003   Epoch: 18   Global Step: 234160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:03:13,142-Speed 3279.83 samples/sec   Loss 0.8490   LearningRate 0.0003   Epoch: 18   Global Step: 234170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:03:16,175-Speed 3377.08 samples/sec   Loss 0.8144   LearningRate 0.0003   Epoch: 18   Global Step: 234180   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:03:19,240-Speed 3342.16 samples/sec   Loss 0.8355   LearningRate 0.0003   Epoch: 18   Global Step: 234190   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:03:22,376-Speed 3265.83 samples/sec   Loss 0.8438   LearningRate 0.0003   Epoch: 18   Global Step: 234200   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:03:25,472-Speed 3308.90 samples/sec   Loss 0.8684   LearningRate 0.0003   Epoch: 18   Global Step: 234210   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:03:28,573-Speed 3302.69 samples/sec   Loss 0.8403   LearningRate 0.0003   Epoch: 18   Global Step: 234220   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:03:31,690-Speed 3286.29 samples/sec   Loss 0.7943   LearningRate 0.0003   Epoch: 18   Global Step: 234230   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:03:34,791-Speed 3303.23 samples/sec   Loss 0.7847   LearningRate 0.0003   Epoch: 18   Global Step: 234240   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:03:37,892-Speed 3303.38 samples/sec   Loss 0.8167   LearningRate 0.0003   Epoch: 18   Global Step: 234250   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:03:41,005-Speed 3290.25 samples/sec   Loss 0.8324   LearningRate 0.0003   Epoch: 18   Global Step: 234260   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:03:44,111-Speed 3298.42 samples/sec   Loss 0.8388   LearningRate 0.0003   Epoch: 18   Global Step: 234270   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:03:47,188-Speed 3328.65 samples/sec   Loss 0.8231   LearningRate 0.0003   Epoch: 18   Global Step: 234280   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:03:50,277-Speed 3315.95 samples/sec   Loss 0.8283   LearningRate 0.0003   Epoch: 18   Global Step: 234290   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:03:53,389-Speed 3292.15 samples/sec   Loss 0.8175   LearningRate 0.0003   Epoch: 18   Global Step: 234300   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:03:56,454-Speed 3342.08 samples/sec   Loss 0.8128   LearningRate 0.0003   Epoch: 18   Global Step: 234310   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:03:59,603-Speed 3252.72 samples/sec   Loss 0.8354   LearningRate 0.0003   Epoch: 18   Global Step: 234320   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:04:02,693-Speed 3315.34 samples/sec   Loss 0.8115   LearningRate 0.0003   Epoch: 18   Global Step: 234330   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:04:05,764-Speed 3334.70 samples/sec   Loss 0.8159   LearningRate 0.0003   Epoch: 18   Global Step: 234340   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:04:08,834-Speed 3337.36 samples/sec   Loss 0.8086   LearningRate 0.0003   Epoch: 18   Global Step: 234350   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:04:11,967-Speed 3269.47 samples/sec   Loss 0.8007   LearningRate 0.0003   Epoch: 18   Global Step: 234360   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:04:15,048-Speed 3323.91 samples/sec   Loss 0.8204   LearningRate 0.0003   Epoch: 18   Global Step: 234370   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:04:18,153-Speed 3299.43 samples/sec   Loss 0.8181   LearningRate 0.0003   Epoch: 18   Global Step: 234380   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:04:21,218-Speed 3341.89 samples/sec   Loss 0.8333   LearningRate 0.0003   Epoch: 18   Global Step: 234390   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:04:24,301-Speed 3322.95 samples/sec   Loss 0.7883   LearningRate 0.0003   Epoch: 18   Global Step: 234400   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:04:27,414-Speed 3290.55 samples/sec   Loss 0.8670   LearningRate 0.0003   Epoch: 18   Global Step: 234410   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:04:30,583-Speed 3231.93 samples/sec   Loss 0.8182   LearningRate 0.0003   Epoch: 18   Global Step: 234420   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:04:33,650-Speed 3340.10 samples/sec   Loss 0.8320   LearningRate 0.0003   Epoch: 18   Global Step: 234430   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:04:36,787-Speed 3266.10 samples/sec   Loss 0.8841   LearningRate 0.0003   Epoch: 18   Global Step: 234440   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:04:39,924-Speed 3264.73 samples/sec   Loss 0.8438   LearningRate 0.0003   Epoch: 18   Global Step: 234450   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:04:43,024-Speed 3303.94 samples/sec   Loss 0.8421   LearningRate 0.0003   Epoch: 18   Global Step: 234460   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:04:46,129-Speed 3299.13 samples/sec   Loss 0.8079   LearningRate 0.0003   Epoch: 18   Global Step: 234470   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:04:49,261-Speed 3270.53 samples/sec   Loss 0.8434   LearningRate 0.0003   Epoch: 18   Global Step: 234480   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:04:52,404-Speed 3258.74 samples/sec   Loss 0.7865   LearningRate 0.0003   Epoch: 18   Global Step: 234490   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:04:55,598-Speed 3207.32 samples/sec   Loss 0.8072   LearningRate 0.0003   Epoch: 18   Global Step: 234500   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:04:58,719-Speed 3282.26 samples/sec   Loss 0.8088   LearningRate 0.0003   Epoch: 18   Global Step: 234510   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:05:01,789-Speed 3336.56 samples/sec   Loss 0.8121   LearningRate 0.0003   Epoch: 18   Global Step: 234520   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:05:04,915-Speed 3276.86 samples/sec   Loss 0.8128   LearningRate 0.0003   Epoch: 18   Global Step: 234530   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:05:07,961-Speed 3363.74 samples/sec   Loss 0.8420   LearningRate 0.0003   Epoch: 18   Global Step: 234540   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:05:11,084-Speed 3279.59 samples/sec   Loss 0.7938   LearningRate 0.0003   Epoch: 18   Global Step: 234550   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:05:14,174-Speed 3315.72 samples/sec   Loss 0.8339   LearningRate 0.0003   Epoch: 18   Global Step: 234560   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:05:17,330-Speed 3244.65 samples/sec   Loss 0.8163   LearningRate 0.0003   Epoch: 18   Global Step: 234570   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:05:20,483-Speed 3249.28 samples/sec   Loss 0.8534   LearningRate 0.0003   Epoch: 18   Global Step: 234580   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:05:23,573-Speed 3314.00 samples/sec   Loss 0.8176   LearningRate 0.0003   Epoch: 18   Global Step: 234590   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:05:26,702-Speed 3274.78 samples/sec   Loss 0.8151   LearningRate 0.0003   Epoch: 18   Global Step: 234600   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:05:29,819-Speed 3285.93 samples/sec   Loss 0.8189   LearningRate 0.0003   Epoch: 18   Global Step: 234610   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:05:32,951-Speed 3270.56 samples/sec   Loss 0.8322   LearningRate 0.0003   Epoch: 18   Global Step: 234620   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:05:36,038-Speed 3318.46 samples/sec   Loss 0.8279   LearningRate 0.0003   Epoch: 18   Global Step: 234630   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:05:39,150-Speed 3291.56 samples/sec   Loss 0.8143   LearningRate 0.0003   Epoch: 18   Global Step: 234640   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:05:42,320-Speed 3230.59 samples/sec   Loss 0.8132   LearningRate 0.0003   Epoch: 18   Global Step: 234650   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:05:45,402-Speed 3324.32 samples/sec   Loss 0.8318   LearningRate 0.0003   Epoch: 18   Global Step: 234660   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:05:48,632-Speed 3170.89 samples/sec   Loss 0.7967   LearningRate 0.0003   Epoch: 18   Global Step: 234670   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:05:51,722-Speed 3314.89 samples/sec   Loss 0.8280   LearningRate 0.0003   Epoch: 18   Global Step: 234680   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:05:54,811-Speed 3316.15 samples/sec   Loss 0.7968   LearningRate 0.0003   Epoch: 18   Global Step: 234690   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:05:57,891-Speed 3326.23 samples/sec   Loss 0.8376   LearningRate 0.0003   Epoch: 18   Global Step: 234700   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:06:01,005-Speed 3288.36 samples/sec   Loss 0.8552   LearningRate 0.0003   Epoch: 18   Global Step: 234710   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:06:04,207-Speed 3199.28 samples/sec   Loss 0.8636   LearningRate 0.0003   Epoch: 18   Global Step: 234720   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:06:07,327-Speed 3283.65 samples/sec   Loss 0.8376   LearningRate 0.0003   Epoch: 18   Global Step: 234730   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:06:10,450-Speed 3279.73 samples/sec   Loss 0.8200   LearningRate 0.0003   Epoch: 18   Global Step: 234740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:06:13,536-Speed 3319.57 samples/sec   Loss 0.8372   LearningRate 0.0003   Epoch: 18   Global Step: 234750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:06:16,654-Speed 3284.84 samples/sec   Loss 0.8323   LearningRate 0.0003   Epoch: 18   Global Step: 234760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:06:19,748-Speed 3310.29 samples/sec   Loss 0.8154   LearningRate 0.0003   Epoch: 18   Global Step: 234770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:06:22,829-Speed 3324.84 samples/sec   Loss 0.8131   LearningRate 0.0003   Epoch: 18   Global Step: 234780   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:06:25,939-Speed 3293.74 samples/sec   Loss 0.8488   LearningRate 0.0003   Epoch: 18   Global Step: 234790   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:06:29,056-Speed 3285.90 samples/sec   Loss 0.7993   LearningRate 0.0003   Epoch: 18   Global Step: 234800   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:06:32,157-Speed 3303.80 samples/sec   Loss 0.8428   LearningRate 0.0003   Epoch: 18   Global Step: 234810   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:06:35,282-Speed 3277.15 samples/sec   Loss 0.8440   LearningRate 0.0003   Epoch: 18   Global Step: 234820   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:06:38,388-Speed 3298.49 samples/sec   Loss 0.8321   LearningRate 0.0003   Epoch: 18   Global Step: 234830   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:06:41,525-Speed 3265.02 samples/sec   Loss 0.8153   LearningRate 0.0003   Epoch: 18   Global Step: 234840   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:06:44,654-Speed 3273.93 samples/sec   Loss 0.8449   LearningRate 0.0003   Epoch: 18   Global Step: 234850   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:06:47,852-Speed 3203.04 samples/sec   Loss 0.8195   LearningRate 0.0003   Epoch: 18   Global Step: 234860   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:06:51,027-Speed 3226.05 samples/sec   Loss 0.8632   LearningRate 0.0003   Epoch: 18   Global Step: 234870   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:06:54,134-Speed 3296.74 samples/sec   Loss 0.7964   LearningRate 0.0003   Epoch: 18   Global Step: 234880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:06:57,193-Speed 3348.68 samples/sec   Loss 0.8233   LearningRate 0.0003   Epoch: 18   Global Step: 234890   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:07:00,396-Speed 3197.55 samples/sec   Loss 0.8284   LearningRate 0.0003   Epoch: 18   Global Step: 234900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:07:03,509-Speed 3290.89 samples/sec   Loss 0.8263   LearningRate 0.0003   Epoch: 18   Global Step: 234910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:07:06,647-Speed 3264.38 samples/sec   Loss 0.7912   LearningRate 0.0003   Epoch: 18   Global Step: 234920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:07:09,719-Speed 3333.82 samples/sec   Loss 0.8276   LearningRate 0.0003   Epoch: 18   Global Step: 234930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:07:12,877-Speed 3243.86 samples/sec   Loss 0.8129   LearningRate 0.0003   Epoch: 18   Global Step: 234940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:07:15,981-Speed 3300.12 samples/sec   Loss 0.8237   LearningRate 0.0003   Epoch: 18   Global Step: 234950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:07:19,069-Speed 3317.10 samples/sec   Loss 0.8037   LearningRate 0.0003   Epoch: 18   Global Step: 234960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:07:22,136-Speed 3339.71 samples/sec   Loss 0.8338   LearningRate 0.0003   Epoch: 18   Global Step: 234970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:07:25,267-Speed 3271.35 samples/sec   Loss 0.8091   LearningRate 0.0003   Epoch: 18   Global Step: 234980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:07:28,386-Speed 3284.77 samples/sec   Loss 0.8199   LearningRate 0.0003   Epoch: 18   Global Step: 234990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:07:31,470-Speed 3320.99 samples/sec   Loss 0.8149   LearningRate 0.0003   Epoch: 18   Global Step: 235000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:07:34,575-Speed 3298.44 samples/sec   Loss 0.8377   LearningRate 0.0003   Epoch: 18   Global Step: 235010   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:07:37,733-Speed 3244.43 samples/sec   Loss 0.7933   LearningRate 0.0003   Epoch: 18   Global Step: 235020   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:07:40,850-Speed 3285.89 samples/sec   Loss 0.8441   LearningRate 0.0003   Epoch: 18   Global Step: 235030   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:07:43,971-Speed 3282.36 samples/sec   Loss 0.8082   LearningRate 0.0003   Epoch: 18   Global Step: 235040   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:07:47,089-Speed 3284.64 samples/sec   Loss 0.8292   LearningRate 0.0003   Epoch: 18   Global Step: 235050   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:07:50,180-Speed 3314.02 samples/sec   Loss 0.8344   LearningRate 0.0003   Epoch: 18   Global Step: 235060   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:07:53,364-Speed 3217.25 samples/sec   Loss 0.7959   LearningRate 0.0003   Epoch: 18   Global Step: 235070   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:07:56,459-Speed 3310.00 samples/sec   Loss 0.7671   LearningRate 0.0003   Epoch: 18   Global Step: 235080   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:07:59,556-Speed 3307.31 samples/sec   Loss 0.8167   LearningRate 0.0003   Epoch: 18   Global Step: 235090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:08:02,646-Speed 3315.38 samples/sec   Loss 0.8281   LearningRate 0.0003   Epoch: 18   Global Step: 235100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:08:05,729-Speed 3322.57 samples/sec   Loss 0.8346   LearningRate 0.0003   Epoch: 18   Global Step: 235110   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:08:08,848-Speed 3283.84 samples/sec   Loss 0.8093   LearningRate 0.0003   Epoch: 18   Global Step: 235120   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:08:11,975-Speed 3275.51 samples/sec   Loss 0.8322   LearningRate 0.0003   Epoch: 18   Global Step: 235130   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:08:15,103-Speed 3274.85 samples/sec   Loss 0.8400   LearningRate 0.0003   Epoch: 18   Global Step: 235140   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:08:18,201-Speed 3306.97 samples/sec   Loss 0.7903   LearningRate 0.0003   Epoch: 18   Global Step: 235150   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:08:21,272-Speed 3334.90 samples/sec   Loss 0.8475   LearningRate 0.0003   Epoch: 18   Global Step: 235160   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:08:24,355-Speed 3322.85 samples/sec   Loss 0.8355   LearningRate 0.0003   Epoch: 18   Global Step: 235170   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:08:27,442-Speed 3317.69 samples/sec   Loss 0.8202   LearningRate 0.0003   Epoch: 18   Global Step: 235180   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:08:30,621-Speed 3222.18 samples/sec   Loss 0.8115   LearningRate 0.0003   Epoch: 18   Global Step: 235190   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:08:33,700-Speed 3326.57 samples/sec   Loss 0.8206   LearningRate 0.0003   Epoch: 18   Global Step: 235200   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:08:36,788-Speed 3317.20 samples/sec   Loss 0.8155   LearningRate 0.0003   Epoch: 18   Global Step: 235210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:08:39,957-Speed 3232.42 samples/sec   Loss 0.8521   LearningRate 0.0003   Epoch: 18   Global Step: 235220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:08:43,022-Speed 3342.15 samples/sec   Loss 0.8380   LearningRate 0.0003   Epoch: 18   Global Step: 235230   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:08:46,075-Speed 3355.39 samples/sec   Loss 0.8080   LearningRate 0.0003   Epoch: 18   Global Step: 235240   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:08:49,160-Speed 3320.16 samples/sec   Loss 0.7848   LearningRate 0.0003   Epoch: 18   Global Step: 235250   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:08:52,263-Speed 3301.70 samples/sec   Loss 0.8069   LearningRate 0.0003   Epoch: 18   Global Step: 235260   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:08:55,339-Speed 3330.09 samples/sec   Loss 0.8413   LearningRate 0.0003   Epoch: 18   Global Step: 235270   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:08:58,434-Speed 3308.99 samples/sec   Loss 0.7963   LearningRate 0.0003   Epoch: 18   Global Step: 235280   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:09:01,605-Speed 3230.71 samples/sec   Loss 0.7650   LearningRate 0.0003   Epoch: 18   Global Step: 235290   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:09:04,799-Speed 3207.06 samples/sec   Loss 0.8274   LearningRate 0.0003   Epoch: 18   Global Step: 235300   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:09:07,891-Speed 3312.49 samples/sec   Loss 0.8711   LearningRate 0.0003   Epoch: 18   Global Step: 235310   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:09:10,992-Speed 3303.58 samples/sec   Loss 0.8300   LearningRate 0.0003   Epoch: 18   Global Step: 235320   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:09:14,093-Speed 3302.99 samples/sec   Loss 0.8344   LearningRate 0.0003   Epoch: 18   Global Step: 235330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:09:17,251-Speed 3244.07 samples/sec   Loss 0.8229   LearningRate 0.0003   Epoch: 18   Global Step: 235340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:09:20,321-Speed 3336.25 samples/sec   Loss 0.8272   LearningRate 0.0003   Epoch: 18   Global Step: 235350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:09:23,459-Speed 3264.66 samples/sec   Loss 0.7923   LearningRate 0.0003   Epoch: 18   Global Step: 235360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:09:26,523-Speed 3342.63 samples/sec   Loss 0.7631   LearningRate 0.0003   Epoch: 18   Global Step: 235370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:09:29,587-Speed 3343.41 samples/sec   Loss 0.8385   LearningRate 0.0003   Epoch: 18   Global Step: 235380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:09:32,642-Speed 3352.58 samples/sec   Loss 0.8188   LearningRate 0.0003   Epoch: 18   Global Step: 235390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:09:35,732-Speed 3315.63 samples/sec   Loss 0.8540   LearningRate 0.0003   Epoch: 18   Global Step: 235400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:09:38,832-Speed 3303.61 samples/sec   Loss 0.8333   LearningRate 0.0003   Epoch: 18   Global Step: 235410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:09:41,916-Speed 3322.02 samples/sec   Loss 0.7878   LearningRate 0.0003   Epoch: 18   Global Step: 235420   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:09:44,963-Speed 3361.62 samples/sec   Loss 0.8349   LearningRate 0.0003   Epoch: 18   Global Step: 235430   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:09:48,028-Speed 3342.54 samples/sec   Loss 0.8432   LearningRate 0.0003   Epoch: 18   Global Step: 235440   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:09:51,120-Speed 3311.90 samples/sec   Loss 0.8368   LearningRate 0.0003   Epoch: 18   Global Step: 235450   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:09:54,187-Speed 3340.53 samples/sec   Loss 0.8203   LearningRate 0.0003   Epoch: 18   Global Step: 235460   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:09:57,265-Speed 3327.96 samples/sec   Loss 0.8063   LearningRate 0.0003   Epoch: 18   Global Step: 235470   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:10:00,332-Speed 3339.60 samples/sec   Loss 0.8116   LearningRate 0.0003   Epoch: 18   Global Step: 235480   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:10:03,444-Speed 3292.00 samples/sec   Loss 0.8055   LearningRate 0.0003   Epoch: 18   Global Step: 235490   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:10:06,502-Speed 3348.77 samples/sec   Loss 0.8481   LearningRate 0.0003   Epoch: 18   Global Step: 235500   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:10:09,593-Speed 3314.81 samples/sec   Loss 0.8743   LearningRate 0.0003   Epoch: 18   Global Step: 235510   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:10:12,746-Speed 3248.55 samples/sec   Loss 0.8117   LearningRate 0.0003   Epoch: 18   Global Step: 235520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:10:15,850-Speed 3299.34 samples/sec   Loss 0.8157   LearningRate 0.0003   Epoch: 18   Global Step: 235530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:10:18,927-Speed 3329.02 samples/sec   Loss 0.8230   LearningRate 0.0003   Epoch: 18   Global Step: 235540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:10:21,988-Speed 3347.09 samples/sec   Loss 0.8247   LearningRate 0.0003   Epoch: 18   Global Step: 235550   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:10:25,073-Speed 3319.29 samples/sec   Loss 0.8174   LearningRate 0.0003   Epoch: 18   Global Step: 235560   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:10:28,172-Speed 3305.55 samples/sec   Loss 0.8459   LearningRate 0.0003   Epoch: 18   Global Step: 235570   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:10:31,314-Speed 3260.78 samples/sec   Loss 0.8500   LearningRate 0.0003   Epoch: 18   Global Step: 235580   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:10:34,403-Speed 3315.96 samples/sec   Loss 0.8089   LearningRate 0.0003   Epoch: 18   Global Step: 235590   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:10:37,542-Speed 3263.26 samples/sec   Loss 0.8291   LearningRate 0.0003   Epoch: 18   Global Step: 235600   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:10:40,595-Speed 3354.40 samples/sec   Loss 0.8036   LearningRate 0.0003   Epoch: 18   Global Step: 235610   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:10:43,667-Speed 3334.48 samples/sec   Loss 0.8129   LearningRate 0.0003   Epoch: 18   Global Step: 235620   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:10:46,791-Speed 3279.41 samples/sec   Loss 0.8373   LearningRate 0.0003   Epoch: 18   Global Step: 235630   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:10:49,904-Speed 3290.19 samples/sec   Loss 0.8036   LearningRate 0.0003   Epoch: 18   Global Step: 235640   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:10:52,962-Speed 3349.63 samples/sec   Loss 0.8114   LearningRate 0.0003   Epoch: 18   Global Step: 235650   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:10:56,077-Speed 3287.78 samples/sec   Loss 0.8084   LearningRate 0.0003   Epoch: 18   Global Step: 235660   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:10:59,167-Speed 3315.43 samples/sec   Loss 0.8208   LearningRate 0.0003   Epoch: 18   Global Step: 235670   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:11:02,253-Speed 3319.52 samples/sec   Loss 0.8105   LearningRate 0.0003   Epoch: 18   Global Step: 235680   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:11:05,321-Speed 3338.73 samples/sec   Loss 0.8505   LearningRate 0.0003   Epoch: 18   Global Step: 235690   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:11:08,409-Speed 3317.18 samples/sec   Loss 0.8085   LearningRate 0.0003   Epoch: 18   Global Step: 235700   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:11:11,482-Speed 3333.23 samples/sec   Loss 0.8043   LearningRate 0.0003   Epoch: 18   Global Step: 235710   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:11:14,542-Speed 3347.35 samples/sec   Loss 0.8161   LearningRate 0.0003   Epoch: 18   Global Step: 235720   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:11:17,627-Speed 3320.58 samples/sec   Loss 0.8123   LearningRate 0.0003   Epoch: 18   Global Step: 235730   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:11:20,701-Speed 3331.21 samples/sec   Loss 0.8122   LearningRate 0.0003   Epoch: 18   Global Step: 235740   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:11:23,768-Speed 3340.92 samples/sec   Loss 0.8148   LearningRate 0.0003   Epoch: 18   Global Step: 235750   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:11:26,906-Speed 3264.12 samples/sec   Loss 0.8073   LearningRate 0.0003   Epoch: 18   Global Step: 235760   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:11:29,975-Speed 3337.41 samples/sec   Loss 0.7878   LearningRate 0.0003   Epoch: 18   Global Step: 235770   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:11:33,040-Speed 3341.62 samples/sec   Loss 0.8012   LearningRate 0.0003   Epoch: 18   Global Step: 235780   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:11:36,145-Speed 3298.81 samples/sec   Loss 0.8044   LearningRate 0.0003   Epoch: 18   Global Step: 235790   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:11:39,270-Speed 3278.22 samples/sec   Loss 0.8375   LearningRate 0.0003   Epoch: 18   Global Step: 235800   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:11:42,397-Speed 3275.37 samples/sec   Loss 0.8475   LearningRate 0.0003   Epoch: 18   Global Step: 235810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:11:45,477-Speed 3326.30 samples/sec   Loss 0.8102   LearningRate 0.0003   Epoch: 18   Global Step: 235820   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:11:48,615-Speed 3263.97 samples/sec   Loss 0.8274   LearningRate 0.0003   Epoch: 18   Global Step: 235830   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:11:51,731-Speed 3287.02 samples/sec   Loss 0.8051   LearningRate 0.0003   Epoch: 18   Global Step: 235840   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:11:54,881-Speed 3252.50 samples/sec   Loss 0.8087   LearningRate 0.0003   Epoch: 18   Global Step: 235850   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:11:57,939-Speed 3349.55 samples/sec   Loss 0.8163   LearningRate 0.0003   Epoch: 18   Global Step: 235860   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:12:01,005-Speed 3340.85 samples/sec   Loss 0.8208   LearningRate 0.0003   Epoch: 18   Global Step: 235870   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:12:04,128-Speed 3280.30 samples/sec   Loss 0.8092   LearningRate 0.0003   Epoch: 18   Global Step: 235880   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:12:07,268-Speed 3261.10 samples/sec   Loss 0.8512   LearningRate 0.0003   Epoch: 18   Global Step: 235890   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:12:10,359-Speed 3314.18 samples/sec   Loss 0.8229   LearningRate 0.0003   Epoch: 18   Global Step: 235900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:12:13,496-Speed 3266.01 samples/sec   Loss 0.8029   LearningRate 0.0003   Epoch: 18   Global Step: 235910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:12:16,642-Speed 3255.20 samples/sec   Loss 0.8361   LearningRate 0.0003   Epoch: 18   Global Step: 235920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:12:19,775-Speed 3269.49 samples/sec   Loss 0.8434   LearningRate 0.0003   Epoch: 18   Global Step: 235930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:12:22,924-Speed 3253.36 samples/sec   Loss 0.7992   LearningRate 0.0003   Epoch: 18   Global Step: 235940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:12:26,111-Speed 3214.20 samples/sec   Loss 0.8392   LearningRate 0.0003   Epoch: 18   Global Step: 235950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:12:29,241-Speed 3272.38 samples/sec   Loss 0.8076   LearningRate 0.0003   Epoch: 18   Global Step: 235960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:12:32,409-Speed 3233.15 samples/sec   Loss 0.8040   LearningRate 0.0003   Epoch: 18   Global Step: 235970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:12:35,479-Speed 3336.75 samples/sec   Loss 0.8558   LearningRate 0.0003   Epoch: 18   Global Step: 235980   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:12:38,800-Speed 3084.41 samples/sec   Loss 0.8287   LearningRate 0.0003   Epoch: 18   Global Step: 235990   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:13:10,211-Speed 326.01 samples/sec   Loss 0.8139   LearningRate 0.0002   Epoch: 19   Global Step: 236000   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:13:13,636-Speed 2991.13 samples/sec   Loss 0.7350   LearningRate 0.0002   Epoch: 19   Global Step: 236010   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:13:16,820-Speed 3217.04 samples/sec   Loss 0.7127   LearningRate 0.0002   Epoch: 19   Global Step: 236020   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:13:19,883-Speed 3344.66 samples/sec   Loss 0.7515   LearningRate 0.0002   Epoch: 19   Global Step: 236030   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:13:23,082-Speed 3201.51 samples/sec   Loss 0.7396   LearningRate 0.0002   Epoch: 19   Global Step: 236040   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:13:26,204-Speed 3281.40 samples/sec   Loss 0.6710   LearningRate 0.0002   Epoch: 19   Global Step: 236050   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:13:29,503-Speed 3105.48 samples/sec   Loss 0.7264   LearningRate 0.0002   Epoch: 19   Global Step: 236060   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:13:32,599-Speed 3307.75 samples/sec   Loss 0.7406   LearningRate 0.0002   Epoch: 19   Global Step: 236070   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:13:35,803-Speed 3197.47 samples/sec   Loss 0.7408   LearningRate 0.0002   Epoch: 19   Global Step: 236080   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:13:38,953-Speed 3251.99 samples/sec   Loss 0.7333   LearningRate 0.0002   Epoch: 19   Global Step: 236090   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:13:42,023-Speed 3337.01 samples/sec   Loss 0.7422   LearningRate 0.0002   Epoch: 19   Global Step: 236100   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:13:45,070-Speed 3361.29 samples/sec   Loss 0.7146   LearningRate 0.0002   Epoch: 19   Global Step: 236110   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:13:48,203-Speed 3270.00 samples/sec   Loss 0.7255   LearningRate 0.0002   Epoch: 19   Global Step: 236120   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:13:51,288-Speed 3320.39 samples/sec   Loss 0.7151   LearningRate 0.0002   Epoch: 19   Global Step: 236130   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:13:54,383-Speed 3309.68 samples/sec   Loss 0.7116   LearningRate 0.0002   Epoch: 19   Global Step: 236140   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:13:57,434-Speed 3357.40 samples/sec   Loss 0.7465   LearningRate 0.0002   Epoch: 19   Global Step: 236150   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:14:00,501-Speed 3339.30 samples/sec   Loss 0.7493   LearningRate 0.0002   Epoch: 19   Global Step: 236160   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:14:03,802-Speed 3102.78 samples/sec   Loss 0.7215   LearningRate 0.0002   Epoch: 19   Global Step: 236170   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:14:06,852-Speed 3359.41 samples/sec   Loss 0.7327   LearningRate 0.0002   Epoch: 19   Global Step: 236180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:14:09,902-Speed 3357.77 samples/sec   Loss 0.7294   LearningRate 0.0002   Epoch: 19   Global Step: 236190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:14:12,985-Speed 3322.64 samples/sec   Loss 0.7350   LearningRate 0.0002   Epoch: 19   Global Step: 236200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:14:16,126-Speed 3261.45 samples/sec   Loss 0.7216   LearningRate 0.0002   Epoch: 19   Global Step: 236210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:14:19,199-Speed 3332.94 samples/sec   Loss 0.7484   LearningRate 0.0002   Epoch: 19   Global Step: 236220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:14:22,247-Speed 3360.78 samples/sec   Loss 0.7693   LearningRate 0.0002   Epoch: 19   Global Step: 236230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:14:25,350-Speed 3300.99 samples/sec   Loss 0.7371   LearningRate 0.0002   Epoch: 19   Global Step: 236240   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:14:28,457-Speed 3296.55 samples/sec   Loss 0.7602   LearningRate 0.0002   Epoch: 19   Global Step: 236250   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:14:31,586-Speed 3274.03 samples/sec   Loss 0.7157   LearningRate 0.0002   Epoch: 19   Global Step: 236260   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:14:34,658-Speed 3333.47 samples/sec   Loss 0.7372   LearningRate 0.0002   Epoch: 19   Global Step: 236270   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:14:37,745-Speed 3318.82 samples/sec   Loss 0.7198   LearningRate 0.0002   Epoch: 19   Global Step: 236280   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:14:40,867-Speed 3281.31 samples/sec   Loss 0.7353   LearningRate 0.0002   Epoch: 19   Global Step: 236290   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:14:43,933-Speed 3340.82 samples/sec   Loss 0.7212   LearningRate 0.0002   Epoch: 19   Global Step: 236300   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:14:47,018-Speed 3320.37 samples/sec   Loss 0.7619   LearningRate 0.0002   Epoch: 19   Global Step: 236310   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:14:50,135-Speed 3286.04 samples/sec   Loss 0.6882   LearningRate 0.0002   Epoch: 19   Global Step: 236320   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:14:53,267-Speed 3270.64 samples/sec   Loss 0.7741   LearningRate 0.0002   Epoch: 19   Global Step: 236330   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:14:56,346-Speed 3326.90 samples/sec   Loss 0.7322   LearningRate 0.0002   Epoch: 19   Global Step: 236340   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:14:59,462-Speed 3287.24 samples/sec   Loss 0.7324   LearningRate 0.0002   Epoch: 19   Global Step: 236350   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:15:02,664-Speed 3198.53 samples/sec   Loss 0.7439   LearningRate 0.0002   Epoch: 19   Global Step: 236360   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:15:05,814-Speed 3252.28 samples/sec   Loss 0.7486   LearningRate 0.0002   Epoch: 19   Global Step: 236370   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:15:08,869-Speed 3352.82 samples/sec   Loss 0.7253   LearningRate 0.0002   Epoch: 19   Global Step: 236380   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:15:12,036-Speed 3234.48 samples/sec   Loss 0.7016   LearningRate 0.0002   Epoch: 19   Global Step: 236390   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:15:15,243-Speed 3193.86 samples/sec   Loss 0.7413   LearningRate 0.0002   Epoch: 19   Global Step: 236400   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:15:18,439-Speed 3205.23 samples/sec   Loss 0.7478   LearningRate 0.0002   Epoch: 19   Global Step: 236410   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:15:21,496-Speed 3349.97 samples/sec   Loss 0.7284   LearningRate 0.0002   Epoch: 19   Global Step: 236420   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:15:24,582-Speed 3319.48 samples/sec   Loss 0.7040   LearningRate 0.0002   Epoch: 19   Global Step: 236430   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:15:27,830-Speed 3153.33 samples/sec   Loss 0.7471   LearningRate 0.0002   Epoch: 19   Global Step: 236440   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:15:30,929-Speed 3306.01 samples/sec   Loss 0.7761   LearningRate 0.0002   Epoch: 19   Global Step: 236450   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:15:34,009-Speed 3325.21 samples/sec   Loss 0.7203   LearningRate 0.0002   Epoch: 19   Global Step: 236460   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:15:37,144-Speed 3267.72 samples/sec   Loss 0.7124   LearningRate 0.0002   Epoch: 19   Global Step: 236470   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:15:40,341-Speed 3203.58 samples/sec   Loss 0.7411   LearningRate 0.0002   Epoch: 19   Global Step: 236480   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:15:43,467-Speed 3277.38 samples/sec   Loss 0.7517   LearningRate 0.0002   Epoch: 19   Global Step: 236490   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:15:46,541-Speed 3331.73 samples/sec   Loss 0.7057   LearningRate 0.0002   Epoch: 19   Global Step: 236500   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:15:49,652-Speed 3292.28 samples/sec   Loss 0.7217   LearningRate 0.0002   Epoch: 19   Global Step: 236510   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:15:52,813-Speed 3241.03 samples/sec   Loss 0.7272   LearningRate 0.0002   Epoch: 19   Global Step: 236520   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:15:55,969-Speed 3245.39 samples/sec   Loss 0.6964   LearningRate 0.0002   Epoch: 19   Global Step: 236530   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:15:59,035-Speed 3340.85 samples/sec   Loss 0.7535   LearningRate 0.0002   Epoch: 19   Global Step: 236540   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:16:02,161-Speed 3276.76 samples/sec   Loss 0.6815   LearningRate 0.0002   Epoch: 19   Global Step: 236550   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:16:05,316-Speed 3246.58 samples/sec   Loss 0.7517   LearningRate 0.0002   Epoch: 19   Global Step: 236560   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:16:08,405-Speed 3316.75 samples/sec   Loss 0.7283   LearningRate 0.0002   Epoch: 19   Global Step: 236570   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:16:11,456-Speed 3356.81 samples/sec   Loss 0.7570   LearningRate 0.0002   Epoch: 19   Global Step: 236580   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:16:14,620-Speed 3237.59 samples/sec   Loss 0.6886   LearningRate 0.0002   Epoch: 19   Global Step: 236590   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:16:17,676-Speed 3352.28 samples/sec   Loss 0.7479   LearningRate 0.0002   Epoch: 19   Global Step: 236600   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:16:20,725-Speed 3358.76 samples/sec   Loss 0.7290   LearningRate 0.0002   Epoch: 19   Global Step: 236610   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:16:23,774-Speed 3360.32 samples/sec   Loss 0.7122   LearningRate 0.0002   Epoch: 19   Global Step: 236620   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:16:26,881-Speed 3297.21 samples/sec   Loss 0.7328   LearningRate 0.0002   Epoch: 19   Global Step: 236630   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:16:30,001-Speed 3282.23 samples/sec   Loss 0.7247   LearningRate 0.0002   Epoch: 19   Global Step: 236640   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:16:33,087-Speed 3320.38 samples/sec   Loss 0.7615   LearningRate 0.0002   Epoch: 19   Global Step: 236650   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:16:36,245-Speed 3243.06 samples/sec   Loss 0.7409   LearningRate 0.0002   Epoch: 19   Global Step: 236660   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:16:39,384-Speed 3262.99 samples/sec   Loss 0.7406   LearningRate 0.0002   Epoch: 19   Global Step: 236670   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:16:42,516-Speed 3270.05 samples/sec   Loss 0.7492   LearningRate 0.0002   Epoch: 19   Global Step: 236680   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:16:45,614-Speed 3306.31 samples/sec   Loss 0.7399   LearningRate 0.0002   Epoch: 19   Global Step: 236690   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:16:48,725-Speed 3292.67 samples/sec   Loss 0.7053   LearningRate 0.0002   Epoch: 19   Global Step: 236700   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:16:51,810-Speed 3320.46 samples/sec   Loss 0.7384   LearningRate 0.0002   Epoch: 19   Global Step: 236710   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:16:54,893-Speed 3322.60 samples/sec   Loss 0.7349   LearningRate 0.0002   Epoch: 19   Global Step: 236720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:16:57,933-Speed 3370.30 samples/sec   Loss 0.7116   LearningRate 0.0002   Epoch: 19   Global Step: 236730   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:17:01,015-Speed 3323.02 samples/sec   Loss 0.7139   LearningRate 0.0002   Epoch: 19   Global Step: 236740   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:17:04,118-Speed 3300.58 samples/sec   Loss 0.7480   LearningRate 0.0002   Epoch: 19   Global Step: 236750   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:17:07,208-Speed 3314.76 samples/sec   Loss 0.7061   LearningRate 0.0002   Epoch: 19   Global Step: 236760   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:17:10,287-Speed 3327.67 samples/sec   Loss 0.7267   LearningRate 0.0002   Epoch: 19   Global Step: 236770   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:17:13,356-Speed 3337.08 samples/sec   Loss 0.7040   LearningRate 0.0002   Epoch: 19   Global Step: 236780   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:17:16,429-Speed 3333.99 samples/sec   Loss 0.7732   LearningRate 0.0002   Epoch: 19   Global Step: 236790   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:17:19,579-Speed 3251.27 samples/sec   Loss 0.7477   LearningRate 0.0002   Epoch: 19   Global Step: 236800   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:17:22,655-Speed 3331.17 samples/sec   Loss 0.7426   LearningRate 0.0002   Epoch: 19   Global Step: 236810   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:17:25,730-Speed 3330.90 samples/sec   Loss 0.7193   LearningRate 0.0002   Epoch: 19   Global Step: 236820   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:17:28,818-Speed 3317.32 samples/sec   Loss 0.7383   LearningRate 0.0002   Epoch: 19   Global Step: 236830   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:17:31,886-Speed 3338.19 samples/sec   Loss 0.7562   LearningRate 0.0002   Epoch: 19   Global Step: 236840   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:17:35,019-Speed 3269.46 samples/sec   Loss 0.7061   LearningRate 0.0002   Epoch: 19   Global Step: 236850   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:17:38,142-Speed 3280.00 samples/sec   Loss 0.7050   LearningRate 0.0002   Epoch: 19   Global Step: 236860   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:17:41,319-Speed 3223.91 samples/sec   Loss 0.7246   LearningRate 0.0002   Epoch: 19   Global Step: 236870   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:17:44,443-Speed 3279.48 samples/sec   Loss 0.7254   LearningRate 0.0002   Epoch: 19   Global Step: 236880   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:17:47,573-Speed 3272.12 samples/sec   Loss 0.7089   LearningRate 0.0002   Epoch: 19   Global Step: 236890   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:17:50,654-Speed 3324.47 samples/sec   Loss 0.7245   LearningRate 0.0002   Epoch: 19   Global Step: 236900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:17:53,730-Speed 3330.85 samples/sec   Loss 0.7836   LearningRate 0.0002   Epoch: 19   Global Step: 236910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:17:56,801-Speed 3335.46 samples/sec   Loss 0.7447   LearningRate 0.0002   Epoch: 19   Global Step: 236920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:17:59,882-Speed 3323.76 samples/sec   Loss 0.7246   LearningRate 0.0002   Epoch: 19   Global Step: 236930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:18:02,973-Speed 3315.31 samples/sec   Loss 0.7429   LearningRate 0.0002   Epoch: 19   Global Step: 236940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:18:06,049-Speed 3329.93 samples/sec   Loss 0.7143   LearningRate 0.0002   Epoch: 19   Global Step: 236950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:18:09,116-Speed 3339.52 samples/sec   Loss 0.7266   LearningRate 0.0002   Epoch: 19   Global Step: 236960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:18:12,176-Speed 3348.57 samples/sec   Loss 0.7353   LearningRate 0.0002   Epoch: 19   Global Step: 236970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:18:15,265-Speed 3315.86 samples/sec   Loss 0.7310   LearningRate 0.0002   Epoch: 19   Global Step: 236980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:18:18,387-Speed 3280.75 samples/sec   Loss 0.7255   LearningRate 0.0002   Epoch: 19   Global Step: 236990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:18:21,479-Speed 3312.96 samples/sec   Loss 0.7537   LearningRate 0.0002   Epoch: 19   Global Step: 237000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:18:24,633-Speed 3247.84 samples/sec   Loss 0.7412   LearningRate 0.0002   Epoch: 19   Global Step: 237010   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:18:27,707-Speed 3331.63 samples/sec   Loss 0.7153   LearningRate 0.0002   Epoch: 19   Global Step: 237020   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:18:30,763-Speed 3351.47 samples/sec   Loss 0.7107   LearningRate 0.0002   Epoch: 19   Global Step: 237030   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:18:33,888-Speed 3278.06 samples/sec   Loss 0.7368   LearningRate 0.0002   Epoch: 19   Global Step: 237040   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:18:36,975-Speed 3318.29 samples/sec   Loss 0.7446   LearningRate 0.0002   Epoch: 19   Global Step: 237050   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:18:40,086-Speed 3292.84 samples/sec   Loss 0.7132   LearningRate 0.0002   Epoch: 19   Global Step: 237060   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:18:43,221-Speed 3267.54 samples/sec   Loss 0.7299   LearningRate 0.0002   Epoch: 19   Global Step: 237070   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:18:46,338-Speed 3286.08 samples/sec   Loss 0.7140   LearningRate 0.0002   Epoch: 19   Global Step: 237080   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:18:49,448-Speed 3294.02 samples/sec   Loss 0.7063   LearningRate 0.0002   Epoch: 19   Global Step: 237090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:18:52,559-Speed 3293.02 samples/sec   Loss 0.7398   LearningRate 0.0002   Epoch: 19   Global Step: 237100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:18:55,695-Speed 3265.85 samples/sec   Loss 0.7209   LearningRate 0.0002   Epoch: 19   Global Step: 237110   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:18:58,787-Speed 3312.18 samples/sec   Loss 0.7047   LearningRate 0.0002   Epoch: 19   Global Step: 237120   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:19:01,960-Speed 3229.12 samples/sec   Loss 0.7574   LearningRate 0.0002   Epoch: 19   Global Step: 237130   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:19:05,028-Speed 3338.50 samples/sec   Loss 0.7544   LearningRate 0.0002   Epoch: 19   Global Step: 237140   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:19:08,123-Speed 3309.96 samples/sec   Loss 0.7399   LearningRate 0.0002   Epoch: 19   Global Step: 237150   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:19:11,185-Speed 3344.89 samples/sec   Loss 0.7223   LearningRate 0.0002   Epoch: 19   Global Step: 237160   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:19:14,274-Speed 3316.01 samples/sec   Loss 0.7385   LearningRate 0.0002   Epoch: 19   Global Step: 237170   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:19:17,370-Speed 3308.66 samples/sec   Loss 0.7559   LearningRate 0.0002   Epoch: 19   Global Step: 237180   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:19:20,443-Speed 3333.43 samples/sec   Loss 0.7443   LearningRate 0.0002   Epoch: 19   Global Step: 237190   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:19:23,558-Speed 3288.41 samples/sec   Loss 0.7478   LearningRate 0.0002   Epoch: 19   Global Step: 237200   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:19:26,740-Speed 3219.65 samples/sec   Loss 0.7466   LearningRate 0.0002   Epoch: 19   Global Step: 237210   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:19:29,865-Speed 3277.35 samples/sec   Loss 0.7256   LearningRate 0.0002   Epoch: 19   Global Step: 237220   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:19:32,949-Speed 3321.80 samples/sec   Loss 0.7552   LearningRate 0.0002   Epoch: 19   Global Step: 237230   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:19:36,073-Speed 3278.66 samples/sec   Loss 0.7292   LearningRate 0.0002   Epoch: 19   Global Step: 237240   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:19:39,135-Speed 3345.11 samples/sec   Loss 0.7626   LearningRate 0.0002   Epoch: 19   Global Step: 237250   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:19:42,204-Speed 3337.68 samples/sec   Loss 0.7157   LearningRate 0.0002   Epoch: 19   Global Step: 237260   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:19:45,281-Speed 3328.96 samples/sec   Loss 0.7349   LearningRate 0.0002   Epoch: 19   Global Step: 237270   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:19:48,407-Speed 3277.25 samples/sec   Loss 0.7277   LearningRate 0.0002   Epoch: 19   Global Step: 237280   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:19:51,538-Speed 3271.26 samples/sec   Loss 0.7115   LearningRate 0.0002   Epoch: 19   Global Step: 237290   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:19:54,677-Speed 3262.43 samples/sec   Loss 0.7353   LearningRate 0.0002   Epoch: 19   Global Step: 237300   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:19:57,730-Speed 3355.63 samples/sec   Loss 0.7054   LearningRate 0.0002   Epoch: 19   Global Step: 237310   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:20:00,824-Speed 3311.28 samples/sec   Loss 0.7290   LearningRate 0.0002   Epoch: 19   Global Step: 237320   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:20:03,966-Speed 3260.27 samples/sec   Loss 0.7204   LearningRate 0.0002   Epoch: 19   Global Step: 237330   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:20:07,021-Speed 3354.14 samples/sec   Loss 0.7163   LearningRate 0.0002   Epoch: 19   Global Step: 237340   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:20:10,105-Speed 3321.29 samples/sec   Loss 0.7460   LearningRate 0.0002   Epoch: 19   Global Step: 237350   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:20:13,229-Speed 3278.75 samples/sec   Loss 0.7115   LearningRate 0.0002   Epoch: 19   Global Step: 237360   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:20:16,356-Speed 3276.08 samples/sec   Loss 0.7608   LearningRate 0.0002   Epoch: 19   Global Step: 237370   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:20:19,413-Speed 3350.65 samples/sec   Loss 0.7204   LearningRate 0.0002   Epoch: 19   Global Step: 237380   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:20:22,489-Speed 3330.55 samples/sec   Loss 0.7273   LearningRate 0.0002   Epoch: 19   Global Step: 237390   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:20:25,568-Speed 3325.92 samples/sec   Loss 0.7086   LearningRate 0.0002   Epoch: 19   Global Step: 237400   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:20:28,645-Speed 3329.82 samples/sec   Loss 0.7520   LearningRate 0.0002   Epoch: 19   Global Step: 237410   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:20:31,794-Speed 3253.03 samples/sec   Loss 0.7340   LearningRate 0.0002   Epoch: 19   Global Step: 237420   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:20:34,919-Speed 3277.76 samples/sec   Loss 0.7222   LearningRate 0.0002   Epoch: 19   Global Step: 237430   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:20:38,020-Speed 3303.35 samples/sec   Loss 0.7463   LearningRate 0.0002   Epoch: 19   Global Step: 237440   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:20:41,125-Speed 3299.06 samples/sec   Loss 0.7346   LearningRate 0.0002   Epoch: 19   Global Step: 237450   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:20:44,187-Speed 3345.03 samples/sec   Loss 0.7063   LearningRate 0.0002   Epoch: 19   Global Step: 237460   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:20:47,243-Speed 3351.11 samples/sec   Loss 0.7310   LearningRate 0.0002   Epoch: 19   Global Step: 237470   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:20:50,328-Speed 3320.96 samples/sec   Loss 0.7145   LearningRate 0.0002   Epoch: 19   Global Step: 237480   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:20:53,479-Speed 3250.30 samples/sec   Loss 0.7378   LearningRate 0.0002   Epoch: 19   Global Step: 237490   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:20:56,557-Speed 3328.47 samples/sec   Loss 0.7326   LearningRate 0.0002   Epoch: 19   Global Step: 237500   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:20:59,652-Speed 3309.64 samples/sec   Loss 0.6945   LearningRate 0.0002   Epoch: 19   Global Step: 237510   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:21:02,755-Speed 3300.42 samples/sec   Loss 0.7041   LearningRate 0.0002   Epoch: 19   Global Step: 237520   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:21:05,897-Speed 3260.34 samples/sec   Loss 0.6988   LearningRate 0.0002   Epoch: 19   Global Step: 237530   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:21:09,032-Speed 3267.47 samples/sec   Loss 0.7431   LearningRate 0.0002   Epoch: 19   Global Step: 237540   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:21:12,123-Speed 3313.95 samples/sec   Loss 0.7486   LearningRate 0.0002   Epoch: 19   Global Step: 237550   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:21:15,268-Speed 3257.00 samples/sec   Loss 0.7188   LearningRate 0.0002   Epoch: 19   Global Step: 237560   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:21:18,421-Speed 3248.53 samples/sec   Loss 0.7346   LearningRate 0.0002   Epoch: 19   Global Step: 237570   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:21:21,493-Speed 3335.32 samples/sec   Loss 0.7408   LearningRate 0.0002   Epoch: 19   Global Step: 237580   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:21:24,588-Speed 3308.64 samples/sec   Loss 0.7090   LearningRate 0.0002   Epoch: 19   Global Step: 237590   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:21:27,666-Speed 3328.43 samples/sec   Loss 0.7085   LearningRate 0.0002   Epoch: 19   Global Step: 237600   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:21:30,740-Speed 3332.46 samples/sec   Loss 0.7220   LearningRate 0.0002   Epoch: 19   Global Step: 237610   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:21:33,807-Speed 3340.08 samples/sec   Loss 0.7523   LearningRate 0.0002   Epoch: 19   Global Step: 237620   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:21:36,884-Speed 3328.61 samples/sec   Loss 0.7211   LearningRate 0.0002   Epoch: 19   Global Step: 237630   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:21:39,959-Speed 3332.34 samples/sec   Loss 0.7591   LearningRate 0.0002   Epoch: 19   Global Step: 237640   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:21:43,082-Speed 3279.70 samples/sec   Loss 0.7527   LearningRate 0.0002   Epoch: 19   Global Step: 237650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:21:46,143-Speed 3346.78 samples/sec   Loss 0.7589   LearningRate 0.0002   Epoch: 19   Global Step: 237660   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:21:49,258-Speed 3287.73 samples/sec   Loss 0.7521   LearningRate 0.0002   Epoch: 19   Global Step: 237670   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:21:52,370-Speed 3291.68 samples/sec   Loss 0.7240   LearningRate 0.0002   Epoch: 19   Global Step: 237680   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:21:55,442-Speed 3334.85 samples/sec   Loss 0.7289   LearningRate 0.0002   Epoch: 19   Global Step: 237690   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:21:58,494-Speed 3356.88 samples/sec   Loss 0.7426   LearningRate 0.0002   Epoch: 19   Global Step: 237700   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:22:01,612-Speed 3284.03 samples/sec   Loss 0.7692   LearningRate 0.0002   Epoch: 19   Global Step: 237710   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:22:04,728-Speed 3287.67 samples/sec   Loss 0.7220   LearningRate 0.0002   Epoch: 19   Global Step: 237720   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:22:07,868-Speed 3262.81 samples/sec   Loss 0.7204   LearningRate 0.0002   Epoch: 19   Global Step: 237730   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:22:10,945-Speed 3328.14 samples/sec   Loss 0.7549   LearningRate 0.0002   Epoch: 19   Global Step: 237740   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:22:14,113-Speed 3233.65 samples/sec   Loss 0.7416   LearningRate 0.0002   Epoch: 19   Global Step: 237750   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:22:17,260-Speed 3255.49 samples/sec   Loss 0.7343   LearningRate 0.0002   Epoch: 19   Global Step: 237760   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:22:20,354-Speed 3310.54 samples/sec   Loss 0.7401   LearningRate 0.0002   Epoch: 19   Global Step: 237770   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:22:23,456-Speed 3301.59 samples/sec   Loss 0.7115   LearningRate 0.0002   Epoch: 19   Global Step: 237780   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:22:26,613-Speed 3244.53 samples/sec   Loss 0.7179   LearningRate 0.0002   Epoch: 19   Global Step: 237790   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:22:29,737-Speed 3279.37 samples/sec   Loss 0.7232   LearningRate 0.0002   Epoch: 19   Global Step: 237800   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:22:32,861-Speed 3278.52 samples/sec   Loss 0.7403   LearningRate 0.0002   Epoch: 19   Global Step: 237810   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:22:35,981-Speed 3283.75 samples/sec   Loss 0.7319   LearningRate 0.0002   Epoch: 19   Global Step: 237820   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:22:39,120-Speed 3262.86 samples/sec   Loss 0.6903   LearningRate 0.0002   Epoch: 19   Global Step: 237830   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:22:42,206-Speed 3319.22 samples/sec   Loss 0.7184   LearningRate 0.0002   Epoch: 19   Global Step: 237840   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:22:45,328-Speed 3280.94 samples/sec   Loss 0.7092   LearningRate 0.0002   Epoch: 19   Global Step: 237850   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:22:48,534-Speed 3194.90 samples/sec   Loss 0.6940   LearningRate 0.0002   Epoch: 19   Global Step: 237860   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:22:51,698-Speed 3238.39 samples/sec   Loss 0.7133   LearningRate 0.0002   Epoch: 19   Global Step: 237870   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:22:54,916-Speed 3182.44 samples/sec   Loss 0.7114   LearningRate 0.0002   Epoch: 19   Global Step: 237880   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:22:58,152-Speed 3165.10 samples/sec   Loss 0.7589   LearningRate 0.0002   Epoch: 19   Global Step: 237890   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:23:01,333-Speed 3220.39 samples/sec   Loss 0.7437   LearningRate 0.0002   Epoch: 19   Global Step: 237900   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:23:04,471-Speed 3264.70 samples/sec   Loss 0.7103   LearningRate 0.0002   Epoch: 19   Global Step: 237910   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:23:07,579-Speed 3295.64 samples/sec   Loss 0.7608   LearningRate 0.0002   Epoch: 19   Global Step: 237920   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:23:10,631-Speed 3356.11 samples/sec   Loss 0.7432   LearningRate 0.0002   Epoch: 19   Global Step: 237930   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:23:13,759-Speed 3275.21 samples/sec   Loss 0.7277   LearningRate 0.0002   Epoch: 19   Global Step: 237940   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:23:16,856-Speed 3307.19 samples/sec   Loss 0.7638   LearningRate 0.0002   Epoch: 19   Global Step: 237950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:23:20,004-Speed 3254.25 samples/sec   Loss 0.7409   LearningRate 0.0002   Epoch: 19   Global Step: 237960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:23:23,104-Speed 3304.76 samples/sec   Loss 0.7403   LearningRate 0.0002   Epoch: 19   Global Step: 237970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:23:26,240-Speed 3265.68 samples/sec   Loss 0.7152   LearningRate 0.0002   Epoch: 19   Global Step: 237980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:23:29,349-Speed 3295.61 samples/sec   Loss 0.7327   LearningRate 0.0002   Epoch: 19   Global Step: 237990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:23:32,419-Speed 3336.43 samples/sec   Loss 0.7199   LearningRate 0.0002   Epoch: 19   Global Step: 238000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:23:35,541-Speed 3280.67 samples/sec   Loss 0.7360   LearningRate 0.0002   Epoch: 19   Global Step: 238010   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:23:38,663-Speed 3281.13 samples/sec   Loss 0.7327   LearningRate 0.0002   Epoch: 19   Global Step: 238020   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:23:41,819-Speed 3245.33 samples/sec   Loss 0.7060   LearningRate 0.0002   Epoch: 19   Global Step: 238030   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:23:44,923-Speed 3300.43 samples/sec   Loss 0.7558   LearningRate 0.0002   Epoch: 19   Global Step: 238040   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:23:48,089-Speed 3236.21 samples/sec   Loss 0.7367   LearningRate 0.0002   Epoch: 19   Global Step: 238050   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:23:51,244-Speed 3245.97 samples/sec   Loss 0.7246   LearningRate 0.0002   Epoch: 19   Global Step: 238060   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:23:54,394-Speed 3252.55 samples/sec   Loss 0.7540   LearningRate 0.0002   Epoch: 19   Global Step: 238070   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:23:57,490-Speed 3308.40 samples/sec   Loss 0.7195   LearningRate 0.0002   Epoch: 19   Global Step: 238080   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:24:00,620-Speed 3272.54 samples/sec   Loss 0.7452   LearningRate 0.0002   Epoch: 19   Global Step: 238090   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:24:03,716-Speed 3309.19 samples/sec   Loss 0.7611   LearningRate 0.0002   Epoch: 19   Global Step: 238100   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:24:06,840-Speed 3278.71 samples/sec   Loss 0.7102   LearningRate 0.0002   Epoch: 19   Global Step: 238110   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:24:09,925-Speed 3320.14 samples/sec   Loss 0.7459   LearningRate 0.0002   Epoch: 19   Global Step: 238120   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:24:13,054-Speed 3273.36 samples/sec   Loss 0.7366   LearningRate 0.0002   Epoch: 19   Global Step: 238130   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:24:16,172-Speed 3285.21 samples/sec   Loss 0.7478   LearningRate 0.0002   Epoch: 19   Global Step: 238140   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:24:19,276-Speed 3299.91 samples/sec   Loss 0.7205   LearningRate 0.0002   Epoch: 19   Global Step: 238150   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:24:22,378-Speed 3302.84 samples/sec   Loss 0.7260   LearningRate 0.0002   Epoch: 19   Global Step: 238160   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:24:25,445-Speed 3339.87 samples/sec   Loss 0.7290   LearningRate 0.0002   Epoch: 19   Global Step: 238170   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:24:28,534-Speed 3315.25 samples/sec   Loss 0.7152   LearningRate 0.0002   Epoch: 19   Global Step: 238180   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:24:31,624-Speed 3315.30 samples/sec   Loss 0.7681   LearningRate 0.0002   Epoch: 19   Global Step: 238190   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:24:34,762-Speed 3264.04 samples/sec   Loss 0.7151   LearningRate 0.0002   Epoch: 19   Global Step: 238200   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:24:37,876-Speed 3289.49 samples/sec   Loss 0.7265   LearningRate 0.0002   Epoch: 19   Global Step: 238210   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:24:40,968-Speed 3313.28 samples/sec   Loss 0.7378   LearningRate 0.0002   Epoch: 19   Global Step: 238220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:24:44,102-Speed 3268.19 samples/sec   Loss 0.7327   LearningRate 0.0002   Epoch: 19   Global Step: 238230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:24:47,236-Speed 3268.85 samples/sec   Loss 0.7446   LearningRate 0.0002   Epoch: 19   Global Step: 238240   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:24:50,386-Speed 3251.49 samples/sec   Loss 0.7395   LearningRate 0.0002   Epoch: 19   Global Step: 238250   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:24:53,472-Speed 3319.32 samples/sec   Loss 0.7295   LearningRate 0.0002   Epoch: 19   Global Step: 238260   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:24:56,614-Speed 3260.41 samples/sec   Loss 0.7065   LearningRate 0.0002   Epoch: 19   Global Step: 238270   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:24:59,715-Speed 3303.70 samples/sec   Loss 0.7202   LearningRate 0.0002   Epoch: 19   Global Step: 238280   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:25:02,795-Speed 3326.05 samples/sec   Loss 0.7315   LearningRate 0.0002   Epoch: 19   Global Step: 238290   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:25:05,953-Speed 3242.76 samples/sec   Loss 0.7520   LearningRate 0.0002   Epoch: 19   Global Step: 238300   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:25:09,064-Speed 3292.87 samples/sec   Loss 0.7320   LearningRate 0.0002   Epoch: 19   Global Step: 238310   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:25:12,159-Speed 3310.51 samples/sec   Loss 0.7501   LearningRate 0.0002   Epoch: 19   Global Step: 238320   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:25:15,312-Speed 3248.35 samples/sec   Loss 0.7282   LearningRate 0.0002   Epoch: 19   Global Step: 238330   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:25:18,473-Speed 3240.97 samples/sec   Loss 0.7135   LearningRate 0.0002   Epoch: 19   Global Step: 238340   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:25:21,560-Speed 3317.19 samples/sec   Loss 0.7152   LearningRate 0.0002   Epoch: 19   Global Step: 238350   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:25:24,758-Speed 3203.11 samples/sec   Loss 0.7176   LearningRate 0.0002   Epoch: 19   Global Step: 238360   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:25:27,943-Speed 3216.84 samples/sec   Loss 0.6763   LearningRate 0.0002   Epoch: 19   Global Step: 238370   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:25:31,054-Speed 3291.92 samples/sec   Loss 0.7246   LearningRate 0.0002   Epoch: 19   Global Step: 238380   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:25:34,130-Speed 3329.72 samples/sec   Loss 0.7126   LearningRate 0.0002   Epoch: 19   Global Step: 238390   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:25:37,240-Speed 3294.31 samples/sec   Loss 0.7334   LearningRate 0.0002   Epoch: 19   Global Step: 238400   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:25:40,392-Speed 3249.54 samples/sec   Loss 0.7609   LearningRate 0.0002   Epoch: 19   Global Step: 238410   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:25:43,550-Speed 3243.25 samples/sec   Loss 0.7411   LearningRate 0.0002   Epoch: 19   Global Step: 238420   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:25:46,691-Speed 3261.06 samples/sec   Loss 0.7422   LearningRate 0.0002   Epoch: 19   Global Step: 238430   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:25:49,891-Speed 3201.37 samples/sec   Loss 0.6898   LearningRate 0.0002   Epoch: 19   Global Step: 238440   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:25:53,018-Speed 3276.11 samples/sec   Loss 0.7426   LearningRate 0.0002   Epoch: 19   Global Step: 238450   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:25:56,140-Speed 3281.42 samples/sec   Loss 0.7192   LearningRate 0.0002   Epoch: 19   Global Step: 238460   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:25:59,308-Speed 3232.94 samples/sec   Loss 0.7005   LearningRate 0.0002   Epoch: 19   Global Step: 238470   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:26:02,404-Speed 3308.43 samples/sec   Loss 0.7360   LearningRate 0.0002   Epoch: 19   Global Step: 238480   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:26:05,525-Speed 3281.63 samples/sec   Loss 0.7817   LearningRate 0.0002   Epoch: 19   Global Step: 238490   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:26:08,640-Speed 3288.60 samples/sec   Loss 0.7266   LearningRate 0.0002   Epoch: 19   Global Step: 238500   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:26:11,821-Speed 3219.69 samples/sec   Loss 0.7257   LearningRate 0.0002   Epoch: 19   Global Step: 238510   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:26:14,927-Speed 3297.79 samples/sec   Loss 0.7160   LearningRate 0.0002   Epoch: 19   Global Step: 238520   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:26:18,065-Speed 3265.00 samples/sec   Loss 0.7527   LearningRate 0.0002   Epoch: 19   Global Step: 238530   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:26:21,141-Speed 3330.73 samples/sec   Loss 0.7381   LearningRate 0.0002   Epoch: 19   Global Step: 238540   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:26:24,260-Speed 3283.55 samples/sec   Loss 0.7366   LearningRate 0.0002   Epoch: 19   Global Step: 238550   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:26:27,380-Speed 3282.93 samples/sec   Loss 0.7157   LearningRate 0.0002   Epoch: 19   Global Step: 238560   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:26:30,472-Speed 3313.64 samples/sec   Loss 0.7176   LearningRate 0.0002   Epoch: 19   Global Step: 238570   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:26:33,538-Speed 3341.01 samples/sec   Loss 0.7060   LearningRate 0.0002   Epoch: 19   Global Step: 238580   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:26:36,669-Speed 3271.32 samples/sec   Loss 0.6786   LearningRate 0.0002   Epoch: 19   Global Step: 238590   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:26:39,792-Speed 3280.36 samples/sec   Loss 0.7301   LearningRate 0.0002   Epoch: 19   Global Step: 238600   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:26:42,914-Speed 3280.07 samples/sec   Loss 0.7000   LearningRate 0.0002   Epoch: 19   Global Step: 238610   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:26:46,022-Speed 3295.63 samples/sec   Loss 0.7440   LearningRate 0.0002   Epoch: 19   Global Step: 238620   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:26:49,110-Speed 3318.02 samples/sec   Loss 0.7434   LearningRate 0.0002   Epoch: 19   Global Step: 238630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:26:52,213-Speed 3301.05 samples/sec   Loss 0.7110   LearningRate 0.0002   Epoch: 19   Global Step: 238640   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:26:55,357-Speed 3258.10 samples/sec   Loss 0.7338   LearningRate 0.0002   Epoch: 19   Global Step: 238650   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:26:58,424-Speed 3340.11 samples/sec   Loss 0.7365   LearningRate 0.0002   Epoch: 19   Global Step: 238660   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:27:01,516-Speed 3312.50 samples/sec   Loss 0.7325   LearningRate 0.0002   Epoch: 19   Global Step: 238670   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:27:04,662-Speed 3256.16 samples/sec   Loss 0.7878   LearningRate 0.0002   Epoch: 19   Global Step: 238680   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:27:07,744-Speed 3323.02 samples/sec   Loss 0.7432   LearningRate 0.0002   Epoch: 19   Global Step: 238690   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:27:10,805-Speed 3348.16 samples/sec   Loss 0.7597   LearningRate 0.0002   Epoch: 19   Global Step: 238700   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:27:13,989-Speed 3217.18 samples/sec   Loss 0.7368   LearningRate 0.0002   Epoch: 19   Global Step: 238710   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:27:17,125-Speed 3266.24 samples/sec   Loss 0.7187   LearningRate 0.0002   Epoch: 19   Global Step: 238720   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:27:20,207-Speed 3322.92 samples/sec   Loss 0.7196   LearningRate 0.0002   Epoch: 19   Global Step: 238730   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:27:23,282-Speed 3331.20 samples/sec   Loss 0.7475   LearningRate 0.0002   Epoch: 19   Global Step: 238740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:27:26,437-Speed 3246.47 samples/sec   Loss 0.7151   LearningRate 0.0002   Epoch: 19   Global Step: 238750   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:27:29,501-Speed 3343.53 samples/sec   Loss 0.7215   LearningRate 0.0002   Epoch: 19   Global Step: 238760   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:27:32,661-Speed 3241.64 samples/sec   Loss 0.7303   LearningRate 0.0002   Epoch: 19   Global Step: 238770   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:27:35,820-Speed 3242.30 samples/sec   Loss 0.7440   LearningRate 0.0002   Epoch: 19   Global Step: 238780   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:27:38,945-Speed 3277.50 samples/sec   Loss 0.7187   LearningRate 0.0002   Epoch: 19   Global Step: 238790   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:27:42,091-Speed 3256.77 samples/sec   Loss 0.7291   LearningRate 0.0001   Epoch: 19   Global Step: 238800   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:27:45,214-Speed 3279.46 samples/sec   Loss 0.7406   LearningRate 0.0001   Epoch: 19   Global Step: 238810   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:27:48,274-Speed 3346.79 samples/sec   Loss 0.7572   LearningRate 0.0001   Epoch: 19   Global Step: 238820   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:27:51,341-Speed 3340.28 samples/sec   Loss 0.7174   LearningRate 0.0001   Epoch: 19   Global Step: 238830   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:27:54,474-Speed 3269.47 samples/sec   Loss 0.7255   LearningRate 0.0001   Epoch: 19   Global Step: 238840   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:27:57,557-Speed 3322.50 samples/sec   Loss 0.7265   LearningRate 0.0001   Epoch: 19   Global Step: 238850   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:28:00,669-Speed 3291.15 samples/sec   Loss 0.7099   LearningRate 0.0001   Epoch: 19   Global Step: 238860   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:28:03,793-Speed 3279.21 samples/sec   Loss 0.7319   LearningRate 0.0001   Epoch: 19   Global Step: 238870   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:28:07,045-Speed 3150.16 samples/sec   Loss 0.6923   LearningRate 0.0001   Epoch: 19   Global Step: 238880   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:28:10,115-Speed 3336.39 samples/sec   Loss 0.7150   LearningRate 0.0001   Epoch: 19   Global Step: 238890   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:28:13,262-Speed 3255.06 samples/sec   Loss 0.7149   LearningRate 0.0001   Epoch: 19   Global Step: 238900   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:28:16,406-Speed 3258.22 samples/sec   Loss 0.7230   LearningRate 0.0001   Epoch: 19   Global Step: 238910   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:28:19,480-Speed 3332.09 samples/sec   Loss 0.7316   LearningRate 0.0001   Epoch: 19   Global Step: 238920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:28:22,527-Speed 3361.61 samples/sec   Loss 0.7216   LearningRate 0.0001   Epoch: 19   Global Step: 238930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:28:25,621-Speed 3310.94 samples/sec   Loss 0.7311   LearningRate 0.0001   Epoch: 19   Global Step: 238940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:28:28,694-Speed 3333.70 samples/sec   Loss 0.6990   LearningRate 0.0001   Epoch: 19   Global Step: 238950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:28:31,801-Speed 3296.50 samples/sec   Loss 0.7229   LearningRate 0.0001   Epoch: 19   Global Step: 238960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:28:34,918-Speed 3286.64 samples/sec   Loss 0.7330   LearningRate 0.0001   Epoch: 19   Global Step: 238970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:28:38,010-Speed 3313.10 samples/sec   Loss 0.6976   LearningRate 0.0001   Epoch: 19   Global Step: 238980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:28:41,197-Speed 3213.47 samples/sec   Loss 0.7090   LearningRate 0.0001   Epoch: 19   Global Step: 238990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:28:44,294-Speed 3307.16 samples/sec   Loss 0.7145   LearningRate 0.0001   Epoch: 19   Global Step: 239000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:28:47,394-Speed 3304.93 samples/sec   Loss 0.7504   LearningRate 0.0001   Epoch: 19   Global Step: 239010   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:28:50,480-Speed 3318.51 samples/sec   Loss 0.7146   LearningRate 0.0001   Epoch: 19   Global Step: 239020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:28:53,607-Speed 3276.17 samples/sec   Loss 0.7204   LearningRate 0.0001   Epoch: 19   Global Step: 239030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:28:56,671-Speed 3343.39 samples/sec   Loss 0.7630   LearningRate 0.0001   Epoch: 19   Global Step: 239040   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:28:59,777-Speed 3297.95 samples/sec   Loss 0.7449   LearningRate 0.0001   Epoch: 19   Global Step: 239050   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:29:02,984-Speed 3194.04 samples/sec   Loss 0.7490   LearningRate 0.0001   Epoch: 19   Global Step: 239060   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:29:06,115-Speed 3271.54 samples/sec   Loss 0.7346   LearningRate 0.0001   Epoch: 19   Global Step: 239070   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:29:09,239-Speed 3279.03 samples/sec   Loss 0.7411   LearningRate 0.0001   Epoch: 19   Global Step: 239080   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:29:12,375-Speed 3266.09 samples/sec   Loss 0.7512   LearningRate 0.0001   Epoch: 19   Global Step: 239090   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:29:15,546-Speed 3229.89 samples/sec   Loss 0.7171   LearningRate 0.0001   Epoch: 19   Global Step: 239100   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:29:18,662-Speed 3287.38 samples/sec   Loss 0.7004   LearningRate 0.0001   Epoch: 19   Global Step: 239110   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:29:21,733-Speed 3335.26 samples/sec   Loss 0.7318   LearningRate 0.0001   Epoch: 19   Global Step: 239120   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:29:24,797-Speed 3343.61 samples/sec   Loss 0.6965   LearningRate 0.0001   Epoch: 19   Global Step: 239130   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:29:27,939-Speed 3260.16 samples/sec   Loss 0.7347   LearningRate 0.0001   Epoch: 19   Global Step: 239140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:29:31,071-Speed 3270.18 samples/sec   Loss 0.7195   LearningRate 0.0001   Epoch: 19   Global Step: 239150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:29:34,147-Speed 3330.14 samples/sec   Loss 0.7254   LearningRate 0.0001   Epoch: 19   Global Step: 239160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:29:37,231-Speed 3322.10 samples/sec   Loss 0.7196   LearningRate 0.0001   Epoch: 19   Global Step: 239170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:29:40,326-Speed 3309.24 samples/sec   Loss 0.7326   LearningRate 0.0001   Epoch: 19   Global Step: 239180   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:29:43,526-Speed 3201.26 samples/sec   Loss 0.7473   LearningRate 0.0001   Epoch: 19   Global Step: 239190   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:29:46,651-Speed 3276.88 samples/sec   Loss 0.7240   LearningRate 0.0001   Epoch: 19   Global Step: 239200   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:29:49,733-Speed 3324.52 samples/sec   Loss 0.7638   LearningRate 0.0001   Epoch: 19   Global Step: 239210   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:29:52,843-Speed 3293.27 samples/sec   Loss 0.7177   LearningRate 0.0001   Epoch: 19   Global Step: 239220   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:29:55,938-Speed 3309.14 samples/sec   Loss 0.7093   LearningRate 0.0001   Epoch: 19   Global Step: 239230   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:29:59,012-Speed 3332.85 samples/sec   Loss 0.7222   LearningRate 0.0001   Epoch: 19   Global Step: 239240   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:30:02,165-Speed 3248.73 samples/sec   Loss 0.7356   LearningRate 0.0001   Epoch: 19   Global Step: 239250   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:30:05,257-Speed 3313.35 samples/sec   Loss 0.7397   LearningRate 0.0001   Epoch: 19   Global Step: 239260   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:30:08,376-Speed 3283.03 samples/sec   Loss 0.6859   LearningRate 0.0001   Epoch: 19   Global Step: 239270   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:30:11,533-Speed 3244.71 samples/sec   Loss 0.7249   LearningRate 0.0001   Epoch: 19   Global Step: 239280   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:30:14,663-Speed 3273.18 samples/sec   Loss 0.7228   LearningRate 0.0001   Epoch: 19   Global Step: 239290   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:30:17,775-Speed 3291.79 samples/sec   Loss 0.7268   LearningRate 0.0001   Epoch: 19   Global Step: 239300   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:30:20,842-Speed 3339.26 samples/sec   Loss 0.7515   LearningRate 0.0001   Epoch: 19   Global Step: 239310   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:30:23,912-Speed 3336.52 samples/sec   Loss 0.7677   LearningRate 0.0001   Epoch: 19   Global Step: 239320   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:30:27,075-Speed 3238.48 samples/sec   Loss 0.7561   LearningRate 0.0001   Epoch: 19   Global Step: 239330   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:30:30,201-Speed 3276.15 samples/sec   Loss 0.7523   LearningRate 0.0001   Epoch: 19   Global Step: 239340   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:30:33,337-Speed 3266.72 samples/sec   Loss 0.7540   LearningRate 0.0001   Epoch: 19   Global Step: 239350   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:30:36,437-Speed 3304.51 samples/sec   Loss 0.7353   LearningRate 0.0001   Epoch: 19   Global Step: 239360   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:30:39,538-Speed 3302.56 samples/sec   Loss 0.7204   LearningRate 0.0001   Epoch: 19   Global Step: 239370   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:30:42,640-Speed 3302.51 samples/sec   Loss 0.7349   LearningRate 0.0001   Epoch: 19   Global Step: 239380   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:30:45,691-Speed 3357.44 samples/sec   Loss 0.7060   LearningRate 0.0001   Epoch: 19   Global Step: 239390   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:30:48,845-Speed 3247.78 samples/sec   Loss 0.7260   LearningRate 0.0001   Epoch: 19   Global Step: 239400   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:30:51,911-Speed 3341.16 samples/sec   Loss 0.7128   LearningRate 0.0001   Epoch: 19   Global Step: 239410   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:30:55,020-Speed 3295.05 samples/sec   Loss 0.6878   LearningRate 0.0001   Epoch: 19   Global Step: 239420   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:30:58,085-Speed 3341.88 samples/sec   Loss 0.7192   LearningRate 0.0001   Epoch: 19   Global Step: 239430   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:31:01,208-Speed 3279.61 samples/sec   Loss 0.7298   LearningRate 0.0001   Epoch: 19   Global Step: 239440   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:31:04,336-Speed 3275.20 samples/sec   Loss 0.7070   LearningRate 0.0001   Epoch: 19   Global Step: 239450   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:31:07,506-Speed 3231.43 samples/sec   Loss 0.7498   LearningRate 0.0001   Epoch: 19   Global Step: 239460   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:31:10,617-Speed 3292.22 samples/sec   Loss 0.7778   LearningRate 0.0001   Epoch: 19   Global Step: 239470   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:31:13,694-Speed 3330.19 samples/sec   Loss 0.7095   LearningRate 0.0001   Epoch: 19   Global Step: 239480   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:31:16,822-Speed 3275.28 samples/sec   Loss 0.7428   LearningRate 0.0001   Epoch: 19   Global Step: 239490   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:31:19,935-Speed 3289.83 samples/sec   Loss 0.7383   LearningRate 0.0001   Epoch: 19   Global Step: 239500   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:31:23,050-Speed 3288.46 samples/sec   Loss 0.7216   LearningRate 0.0001   Epoch: 19   Global Step: 239510   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:31:26,201-Speed 3250.72 samples/sec   Loss 0.7161   LearningRate 0.0001   Epoch: 19   Global Step: 239520   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:31:29,391-Speed 3211.77 samples/sec   Loss 0.7212   LearningRate 0.0001   Epoch: 19   Global Step: 239530   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:31:32,452-Speed 3345.88 samples/sec   Loss 0.6894   LearningRate 0.0001   Epoch: 19   Global Step: 239540   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:31:35,566-Speed 3290.02 samples/sec   Loss 0.7621   LearningRate 0.0001   Epoch: 19   Global Step: 239550   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:31:38,688-Speed 3280.85 samples/sec   Loss 0.7090   LearningRate 0.0001   Epoch: 19   Global Step: 239560   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:31:41,768-Speed 3324.71 samples/sec   Loss 0.7268   LearningRate 0.0001   Epoch: 19   Global Step: 239570   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:31:44,872-Speed 3300.27 samples/sec   Loss 0.7248   LearningRate 0.0001   Epoch: 19   Global Step: 239580   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:31:47,996-Speed 3279.15 samples/sec   Loss 0.6953   LearningRate 0.0001   Epoch: 19   Global Step: 239590   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:31:51,142-Speed 3256.11 samples/sec   Loss 0.7592   LearningRate 0.0001   Epoch: 19   Global Step: 239600   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:31:54,227-Speed 3319.93 samples/sec   Loss 0.7103   LearningRate 0.0001   Epoch: 19   Global Step: 239610   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:31:57,315-Speed 3317.26 samples/sec   Loss 0.7581   LearningRate 0.0001   Epoch: 19   Global Step: 239620   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:32:00,432-Speed 3286.42 samples/sec   Loss 0.7470   LearningRate 0.0001   Epoch: 19   Global Step: 239630   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:32:03,584-Speed 3250.04 samples/sec   Loss 0.7443   LearningRate 0.0001   Epoch: 19   Global Step: 239640   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:32:06,700-Speed 3287.37 samples/sec   Loss 0.7257   LearningRate 0.0001   Epoch: 19   Global Step: 239650   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:32:09,770-Speed 3336.51 samples/sec   Loss 0.7524   LearningRate 0.0001   Epoch: 19   Global Step: 239660   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:32:12,912-Speed 3259.88 samples/sec   Loss 0.7122   LearningRate 0.0001   Epoch: 19   Global Step: 239670   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:32:16,011-Speed 3304.82 samples/sec   Loss 0.7096   LearningRate 0.0001   Epoch: 19   Global Step: 239680   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:32:19,081-Speed 3337.04 samples/sec   Loss 0.7142   LearningRate 0.0001   Epoch: 19   Global Step: 239690   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:32:22,148-Speed 3339.90 samples/sec   Loss 0.7375   LearningRate 0.0001   Epoch: 19   Global Step: 239700   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:32:25,323-Speed 3225.19 samples/sec   Loss 0.7238   LearningRate 0.0001   Epoch: 19   Global Step: 239710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:32:28,474-Speed 3251.05 samples/sec   Loss 0.7331   LearningRate 0.0001   Epoch: 19   Global Step: 239720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:32:31,551-Speed 3328.54 samples/sec   Loss 0.7134   LearningRate 0.0001   Epoch: 19   Global Step: 239730   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:32:34,698-Speed 3256.05 samples/sec   Loss 0.7415   LearningRate 0.0001   Epoch: 19   Global Step: 239740   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:32:37,806-Speed 3295.27 samples/sec   Loss 0.7195   LearningRate 0.0001   Epoch: 19   Global Step: 239750   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:32:40,881-Speed 3331.85 samples/sec   Loss 0.7232   LearningRate 0.0001   Epoch: 19   Global Step: 239760   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:32:43,969-Speed 3318.17 samples/sec   Loss 0.7326   LearningRate 0.0001   Epoch: 19   Global Step: 239770   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:32:47,080-Speed 3293.50 samples/sec   Loss 0.7309   LearningRate 0.0001   Epoch: 19   Global Step: 239780   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:32:50,185-Speed 3298.83 samples/sec   Loss 0.7209   LearningRate 0.0001   Epoch: 19   Global Step: 239790   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:32:53,287-Speed 3301.37 samples/sec   Loss 0.7509   LearningRate 0.0001   Epoch: 19   Global Step: 239800   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:32:56,427-Speed 3262.81 samples/sec   Loss 0.7118   LearningRate 0.0001   Epoch: 19   Global Step: 239810   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:32:59,525-Speed 3306.33 samples/sec   Loss 0.7362   LearningRate 0.0001   Epoch: 19   Global Step: 239820   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:33:02,628-Speed 3300.44 samples/sec   Loss 0.7327   LearningRate 0.0001   Epoch: 19   Global Step: 239830   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:33:05,768-Speed 3262.41 samples/sec   Loss 0.7340   LearningRate 0.0001   Epoch: 19   Global Step: 239840   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:33:08,880-Speed 3291.35 samples/sec   Loss 0.7375   LearningRate 0.0001   Epoch: 19   Global Step: 239850   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:33:12,042-Speed 3239.44 samples/sec   Loss 0.7274   LearningRate 0.0001   Epoch: 19   Global Step: 239860   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:33:15,248-Speed 3194.97 samples/sec   Loss 0.7499   LearningRate 0.0001   Epoch: 19   Global Step: 239870   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:33:18,410-Speed 3240.15 samples/sec   Loss 0.7073   LearningRate 0.0001   Epoch: 19   Global Step: 239880   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:33:21,515-Speed 3299.02 samples/sec   Loss 0.7019   LearningRate 0.0001   Epoch: 19   Global Step: 239890   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:33:24,606-Speed 3313.27 samples/sec   Loss 0.7274   LearningRate 0.0001   Epoch: 19   Global Step: 239900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:33:27,675-Speed 3337.34 samples/sec   Loss 0.7068   LearningRate 0.0001   Epoch: 19   Global Step: 239910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:33:30,852-Speed 3224.67 samples/sec   Loss 0.7292   LearningRate 0.0001   Epoch: 19   Global Step: 239920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:33:33,947-Speed 3309.61 samples/sec   Loss 0.7444   LearningRate 0.0001   Epoch: 19   Global Step: 239930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:33:37,036-Speed 3315.35 samples/sec   Loss 0.7189   LearningRate 0.0001   Epoch: 19   Global Step: 239940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:33:40,148-Speed 3291.75 samples/sec   Loss 0.7088   LearningRate 0.0001   Epoch: 19   Global Step: 239950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:33:43,251-Speed 3300.52 samples/sec   Loss 0.7600   LearningRate 0.0001   Epoch: 19   Global Step: 239960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:33:46,353-Speed 3302.87 samples/sec   Loss 0.6913   LearningRate 0.0001   Epoch: 19   Global Step: 239970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:33:49,435-Speed 3323.01 samples/sec   Loss 0.7659   LearningRate 0.0001   Epoch: 19   Global Step: 239980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:33:52,598-Speed 3238.94 samples/sec   Loss 0.7104   LearningRate 0.0001   Epoch: 19   Global Step: 239990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:33:55,756-Speed 3242.96 samples/sec   Loss 0.7408   LearningRate 0.0001   Epoch: 19   Global Step: 240000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:33:58,933-Speed 3224.55 samples/sec   Loss 0.7509   LearningRate 0.0001   Epoch: 19   Global Step: 240010   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:34:02,094-Speed 3240.24 samples/sec   Loss 0.7277   LearningRate 0.0001   Epoch: 19   Global Step: 240020   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:34:05,302-Speed 3192.96 samples/sec   Loss 0.7283   LearningRate 0.0001   Epoch: 19   Global Step: 240030   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:34:08,442-Speed 3262.63 samples/sec   Loss 0.7201   LearningRate 0.0001   Epoch: 19   Global Step: 240040   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:34:11,509-Speed 3339.75 samples/sec   Loss 0.7337   LearningRate 0.0001   Epoch: 19   Global Step: 240050   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:34:14,652-Speed 3259.65 samples/sec   Loss 0.7445   LearningRate 0.0001   Epoch: 19   Global Step: 240060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:34:17,774-Speed 3280.26 samples/sec   Loss 0.6928   LearningRate 0.0001   Epoch: 19   Global Step: 240070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:34:20,895-Speed 3281.89 samples/sec   Loss 0.7246   LearningRate 0.0001   Epoch: 19   Global Step: 240080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:34:24,035-Speed 3262.76 samples/sec   Loss 0.7146   LearningRate 0.0001   Epoch: 19   Global Step: 240090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:34:27,136-Speed 3302.96 samples/sec   Loss 0.7291   LearningRate 0.0001   Epoch: 19   Global Step: 240100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:34:30,256-Speed 3283.53 samples/sec   Loss 0.7304   LearningRate 0.0001   Epoch: 19   Global Step: 240110   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:34:33,327-Speed 3334.65 samples/sec   Loss 0.7122   LearningRate 0.0001   Epoch: 19   Global Step: 240120   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:34:36,451-Speed 3279.67 samples/sec   Loss 0.7321   LearningRate 0.0001   Epoch: 19   Global Step: 240130   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:34:39,602-Speed 3250.38 samples/sec   Loss 0.7190   LearningRate 0.0001   Epoch: 19   Global Step: 240140   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:34:42,833-Speed 3170.79 samples/sec   Loss 0.7430   LearningRate 0.0001   Epoch: 19   Global Step: 240150   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:34:45,901-Speed 3338.56 samples/sec   Loss 0.7228   LearningRate 0.0001   Epoch: 19   Global Step: 240160   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:34:48,986-Speed 3319.74 samples/sec   Loss 0.6825   LearningRate 0.0001   Epoch: 19   Global Step: 240170   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:34:52,054-Speed 3339.40 samples/sec   Loss 0.6968   LearningRate 0.0001   Epoch: 19   Global Step: 240180   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:34:55,168-Speed 3288.99 samples/sec   Loss 0.7302   LearningRate 0.0001   Epoch: 19   Global Step: 240190   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:34:58,233-Speed 3341.95 samples/sec   Loss 0.6924   LearningRate 0.0001   Epoch: 19   Global Step: 240200   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:35:01,283-Speed 3358.74 samples/sec   Loss 0.7611   LearningRate 0.0001   Epoch: 19   Global Step: 240210   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:35:04,386-Speed 3301.48 samples/sec   Loss 0.7336   LearningRate 0.0001   Epoch: 19   Global Step: 240220   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:35:07,485-Speed 3305.33 samples/sec   Loss 0.7187   LearningRate 0.0001   Epoch: 19   Global Step: 240230   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:35:10,566-Speed 3324.65 samples/sec   Loss 0.7117   LearningRate 0.0001   Epoch: 19   Global Step: 240240   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:35:13,680-Speed 3289.27 samples/sec   Loss 0.7019   LearningRate 0.0001   Epoch: 19   Global Step: 240250   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:35:16,803-Speed 3279.74 samples/sec   Loss 0.7260   LearningRate 0.0001   Epoch: 19   Global Step: 240260   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:35:19,937-Speed 3268.82 samples/sec   Loss 0.7343   LearningRate 0.0001   Epoch: 19   Global Step: 240270   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:35:23,044-Speed 3296.00 samples/sec   Loss 0.7594   LearningRate 0.0001   Epoch: 19   Global Step: 240280   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:35:26,166-Speed 3281.55 samples/sec   Loss 0.7137   LearningRate 0.0001   Epoch: 19   Global Step: 240290   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:35:29,218-Speed 3355.73 samples/sec   Loss 0.7772   LearningRate 0.0001   Epoch: 19   Global Step: 240300   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:35:32,317-Speed 3305.61 samples/sec   Loss 0.7527   LearningRate 0.0001   Epoch: 19   Global Step: 240310   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:35:35,425-Speed 3296.24 samples/sec   Loss 0.7328   LearningRate 0.0001   Epoch: 19   Global Step: 240320   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:35:38,483-Speed 3349.91 samples/sec   Loss 0.7386   LearningRate 0.0001   Epoch: 19   Global Step: 240330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:35:41,650-Speed 3233.63 samples/sec   Loss 0.7089   LearningRate 0.0001   Epoch: 19   Global Step: 240340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:35:44,679-Speed 3381.51 samples/sec   Loss 0.7134   LearningRate 0.0001   Epoch: 19   Global Step: 240350   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:35:47,852-Speed 3228.42 samples/sec   Loss 0.7528   LearningRate 0.0001   Epoch: 19   Global Step: 240360   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:35:50,937-Speed 3321.05 samples/sec   Loss 0.7162   LearningRate 0.0001   Epoch: 19   Global Step: 240370   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:35:54,095-Speed 3243.40 samples/sec   Loss 0.7020   LearningRate 0.0001   Epoch: 19   Global Step: 240380   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:35:57,172-Speed 3329.49 samples/sec   Loss 0.7338   LearningRate 0.0001   Epoch: 19   Global Step: 240390   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:36:00,339-Speed 3234.42 samples/sec   Loss 0.7442   LearningRate 0.0001   Epoch: 19   Global Step: 240400   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:36:03,446-Speed 3297.28 samples/sec   Loss 0.7070   LearningRate 0.0001   Epoch: 19   Global Step: 240410   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:36:06,548-Speed 3302.14 samples/sec   Loss 0.7382   LearningRate 0.0001   Epoch: 19   Global Step: 240420   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:36:09,623-Speed 3330.40 samples/sec   Loss 0.7230   LearningRate 0.0001   Epoch: 19   Global Step: 240430   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:36:12,764-Speed 3262.00 samples/sec   Loss 0.7142   LearningRate 0.0001   Epoch: 19   Global Step: 240440   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:36:16,460-Speed 2771.23 samples/sec   Loss 0.7352   LearningRate 0.0001   Epoch: 19   Global Step: 240450   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:36:19,546-Speed 3319.07 samples/sec   Loss 0.7274   LearningRate 0.0001   Epoch: 19   Global Step: 240460   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:36:22,618-Speed 3334.00 samples/sec   Loss 0.7316   LearningRate 0.0001   Epoch: 19   Global Step: 240470   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:36:25,674-Speed 3351.62 samples/sec   Loss 0.6914   LearningRate 0.0001   Epoch: 19   Global Step: 240480   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:36:28,838-Speed 3237.71 samples/sec   Loss 0.7126   LearningRate 0.0001   Epoch: 19   Global Step: 240490   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:36:31,916-Speed 3328.79 samples/sec   Loss 0.7325   LearningRate 0.0001   Epoch: 19   Global Step: 240500   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:36:35,082-Speed 3234.24 samples/sec   Loss 0.7242   LearningRate 0.0001   Epoch: 19   Global Step: 240510   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:36:38,201-Speed 3285.42 samples/sec   Loss 0.7029   LearningRate 0.0001   Epoch: 19   Global Step: 240520   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:36:41,307-Speed 3297.81 samples/sec   Loss 0.7112   LearningRate 0.0001   Epoch: 19   Global Step: 240530   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:36:44,384-Speed 3328.95 samples/sec   Loss 0.7279   LearningRate 0.0001   Epoch: 19   Global Step: 240540   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:36:47,550-Speed 3235.40 samples/sec   Loss 0.7517   LearningRate 0.0001   Epoch: 19   Global Step: 240550   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:36:50,613-Speed 3343.57 samples/sec   Loss 0.7392   LearningRate 0.0001   Epoch: 19   Global Step: 240560   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:36:53,760-Speed 3254.97 samples/sec   Loss 0.7830   LearningRate 0.0001   Epoch: 19   Global Step: 240570   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:36:56,810-Speed 3358.37 samples/sec   Loss 0.6955   LearningRate 0.0001   Epoch: 19   Global Step: 240580   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:37:00,019-Speed 3191.85 samples/sec   Loss 0.7015   LearningRate 0.0001   Epoch: 19   Global Step: 240590   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:37:03,190-Speed 3230.44 samples/sec   Loss 0.7186   LearningRate 0.0001   Epoch: 19   Global Step: 240600   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:37:06,373-Speed 3217.69 samples/sec   Loss 0.7481   LearningRate 0.0001   Epoch: 19   Global Step: 240610   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:37:09,451-Speed 3327.93 samples/sec   Loss 0.6835   LearningRate 0.0001   Epoch: 19   Global Step: 240620   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:37:12,555-Speed 3300.23 samples/sec   Loss 0.7032   LearningRate 0.0001   Epoch: 19   Global Step: 240630   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:37:15,673-Speed 3284.64 samples/sec   Loss 0.7031   LearningRate 0.0001   Epoch: 19   Global Step: 240640   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:37:18,793-Speed 3284.27 samples/sec   Loss 0.7242   LearningRate 0.0001   Epoch: 19   Global Step: 240650   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:37:21,869-Speed 3330.12 samples/sec   Loss 0.7219   LearningRate 0.0001   Epoch: 19   Global Step: 240660   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:37:24,996-Speed 3275.41 samples/sec   Loss 0.6977   LearningRate 0.0001   Epoch: 19   Global Step: 240670   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:37:28,107-Speed 3292.53 samples/sec   Loss 0.7373   LearningRate 0.0001   Epoch: 19   Global Step: 240680   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:37:31,188-Speed 3324.40 samples/sec   Loss 0.7414   LearningRate 0.0001   Epoch: 19   Global Step: 240690   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:37:34,286-Speed 3305.87 samples/sec   Loss 0.7682   LearningRate 0.0001   Epoch: 19   Global Step: 240700   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:37:37,508-Speed 3179.67 samples/sec   Loss 0.7572   LearningRate 0.0001   Epoch: 19   Global Step: 240710   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:37:40,617-Speed 3294.30 samples/sec   Loss 0.7306   LearningRate 0.0001   Epoch: 19   Global Step: 240720   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:37:43,784-Speed 3234.38 samples/sec   Loss 0.7385   LearningRate 0.0001   Epoch: 19   Global Step: 240730   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:37:47,452-Speed 2792.71 samples/sec   Loss 0.7418   LearningRate 0.0001   Epoch: 19   Global Step: 240740   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:37:50,621-Speed 3232.17 samples/sec   Loss 0.7372   LearningRate 0.0001   Epoch: 19   Global Step: 240750   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:37:53,735-Speed 3289.97 samples/sec   Loss 0.7153   LearningRate 0.0001   Epoch: 19   Global Step: 240760   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:37:58,054-Speed 2371.69 samples/sec   Loss 0.6978   LearningRate 0.0001   Epoch: 19   Global Step: 240770   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:38:01,137-Speed 3321.82 samples/sec   Loss 0.7233   LearningRate 0.0001   Epoch: 19   Global Step: 240780   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:38:04,771-Speed 2818.81 samples/sec   Loss 0.7287   LearningRate 0.0001   Epoch: 19   Global Step: 240790   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:38:07,873-Speed 3302.81 samples/sec   Loss 0.7134   LearningRate 0.0001   Epoch: 19   Global Step: 240800   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:38:10,954-Speed 3323.71 samples/sec   Loss 0.7362   LearningRate 0.0001   Epoch: 19   Global Step: 240810   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:38:14,073-Speed 3284.28 samples/sec   Loss 0.7212   LearningRate 0.0001   Epoch: 19   Global Step: 240820   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:38:17,243-Speed 3231.71 samples/sec   Loss 0.7149   LearningRate 0.0001   Epoch: 19   Global Step: 240830   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:38:20,367-Speed 3279.01 samples/sec   Loss 0.7609   LearningRate 0.0001   Epoch: 19   Global Step: 240840   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:38:23,488-Speed 3281.86 samples/sec   Loss 0.7282   LearningRate 0.0001   Epoch: 19   Global Step: 240850   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:38:26,649-Speed 3240.77 samples/sec   Loss 0.7225   LearningRate 0.0001   Epoch: 19   Global Step: 240860   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:38:29,705-Speed 3352.33 samples/sec   Loss 0.7335   LearningRate 0.0001   Epoch: 19   Global Step: 240870   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:38:32,788-Speed 3321.86 samples/sec   Loss 0.7502   LearningRate 0.0001   Epoch: 19   Global Step: 240880   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:38:35,878-Speed 3315.06 samples/sec   Loss 0.7383   LearningRate 0.0001   Epoch: 19   Global Step: 240890   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:38:38,939-Speed 3347.09 samples/sec   Loss 0.7299   LearningRate 0.0001   Epoch: 19   Global Step: 240900   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:38:42,000-Speed 3346.14 samples/sec   Loss 0.7150   LearningRate 0.0001   Epoch: 19   Global Step: 240910   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:38:45,070-Speed 3336.86 samples/sec   Loss 0.7267   LearningRate 0.0001   Epoch: 19   Global Step: 240920   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:38:48,150-Speed 3326.19 samples/sec   Loss 0.7197   LearningRate 0.0001   Epoch: 19   Global Step: 240930   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:38:51,199-Speed 3359.17 samples/sec   Loss 0.7485   LearningRate 0.0001   Epoch: 19   Global Step: 240940   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:38:54,332-Speed 3269.78 samples/sec   Loss 0.7486   LearningRate 0.0001   Epoch: 19   Global Step: 240950   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:38:57,385-Speed 3354.58 samples/sec   Loss 0.7156   LearningRate 0.0001   Epoch: 19   Global Step: 240960   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:39:00,467-Speed 3323.61 samples/sec   Loss 0.7044   LearningRate 0.0001   Epoch: 19   Global Step: 240970   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:39:03,586-Speed 3284.51 samples/sec   Loss 0.7326   LearningRate 0.0001   Epoch: 19   Global Step: 240980   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:39:06,651-Speed 3342.27 samples/sec   Loss 0.6988   LearningRate 0.0001   Epoch: 19   Global Step: 240990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:39:09,737-Speed 3319.02 samples/sec   Loss 0.7436   LearningRate 0.0001   Epoch: 19   Global Step: 241000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:39:12,816-Speed 3326.88 samples/sec   Loss 0.7501   LearningRate 0.0001   Epoch: 19   Global Step: 241010   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:39:15,949-Speed 3269.55 samples/sec   Loss 0.7161   LearningRate 0.0001   Epoch: 19   Global Step: 241020   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:39:19,022-Speed 3332.53 samples/sec   Loss 0.7211   LearningRate 0.0001   Epoch: 19   Global Step: 241030   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:39:22,135-Speed 3290.83 samples/sec   Loss 0.7393   LearningRate 0.0001   Epoch: 19   Global Step: 241040   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:39:25,313-Speed 3223.46 samples/sec   Loss 0.7103   LearningRate 0.0001   Epoch: 19   Global Step: 241050   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:39:28,366-Speed 3354.91 samples/sec   Loss 0.7590   LearningRate 0.0001   Epoch: 19   Global Step: 241060   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:39:31,451-Speed 3320.41 samples/sec   Loss 0.7141   LearningRate 0.0001   Epoch: 19   Global Step: 241070   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:39:34,549-Speed 3305.86 samples/sec   Loss 0.7462   LearningRate 0.0001   Epoch: 19   Global Step: 241080   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:39:37,637-Speed 3317.60 samples/sec   Loss 0.7577   LearningRate 0.0001   Epoch: 19   Global Step: 241090   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:39:40,772-Speed 3267.17 samples/sec   Loss 0.7715   LearningRate 0.0001   Epoch: 19   Global Step: 241100   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:39:43,865-Speed 3311.61 samples/sec   Loss 0.7073   LearningRate 0.0001   Epoch: 19   Global Step: 241110   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:39:46,938-Speed 3333.32 samples/sec   Loss 0.7577   LearningRate 0.0001   Epoch: 19   Global Step: 241120   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:39:50,087-Speed 3253.14 samples/sec   Loss 0.7352   LearningRate 0.0001   Epoch: 19   Global Step: 241130   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:39:53,154-Speed 3339.05 samples/sec   Loss 0.7080   LearningRate 0.0001   Epoch: 19   Global Step: 241140   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:39:56,291-Speed 3266.08 samples/sec   Loss 0.7364   LearningRate 0.0001   Epoch: 19   Global Step: 241150   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:39:59,366-Speed 3331.01 samples/sec   Loss 0.7600   LearningRate 0.0001   Epoch: 19   Global Step: 241160   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:40:02,486-Speed 3283.18 samples/sec   Loss 0.6863   LearningRate 0.0001   Epoch: 19   Global Step: 241170   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:40:05,609-Speed 3279.79 samples/sec   Loss 0.7557   LearningRate 0.0001   Epoch: 19   Global Step: 241180   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:40:08,658-Speed 3359.09 samples/sec   Loss 0.7458   LearningRate 0.0001   Epoch: 19   Global Step: 241190   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:40:11,733-Speed 3331.81 samples/sec   Loss 0.7124   LearningRate 0.0001   Epoch: 19   Global Step: 241200   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:40:14,811-Speed 3327.84 samples/sec   Loss 0.7133   LearningRate 0.0001   Epoch: 19   Global Step: 241210   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:40:17,891-Speed 3326.09 samples/sec   Loss 0.7639   LearningRate 0.0001   Epoch: 19   Global Step: 241220   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:40:20,960-Speed 3337.38 samples/sec   Loss 0.7715   LearningRate 0.0001   Epoch: 19   Global Step: 241230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:40:24,023-Speed 3344.58 samples/sec   Loss 0.7057   LearningRate 0.0001   Epoch: 19   Global Step: 241240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:40:27,108-Speed 3319.92 samples/sec   Loss 0.7212   LearningRate 0.0001   Epoch: 19   Global Step: 241250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:40:30,225-Speed 3287.01 samples/sec   Loss 0.7204   LearningRate 0.0001   Epoch: 19   Global Step: 241260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:40:33,299-Speed 3331.50 samples/sec   Loss 0.7099   LearningRate 0.0001   Epoch: 19   Global Step: 241270   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:40:36,379-Speed 3326.08 samples/sec   Loss 0.6799   LearningRate 0.0001   Epoch: 19   Global Step: 241280   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:40:39,545-Speed 3235.10 samples/sec   Loss 0.7015   LearningRate 0.0001   Epoch: 19   Global Step: 241290   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:40:42,667-Speed 3281.20 samples/sec   Loss 0.7257   LearningRate 0.0001   Epoch: 19   Global Step: 241300   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:40:45,794-Speed 3276.12 samples/sec   Loss 0.7469   LearningRate 0.0001   Epoch: 19   Global Step: 241310   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:40:48,898-Speed 3299.75 samples/sec   Loss 0.6800   LearningRate 0.0001   Epoch: 19   Global Step: 241320   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:40:52,063-Speed 3236.55 samples/sec   Loss 0.6962   LearningRate 0.0001   Epoch: 19   Global Step: 241330   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:40:55,225-Speed 3239.14 samples/sec   Loss 0.7801   LearningRate 0.0001   Epoch: 19   Global Step: 241340   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:40:58,306-Speed 3324.91 samples/sec   Loss 0.7155   LearningRate 0.0001   Epoch: 19   Global Step: 241350   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:41:01,394-Speed 3317.32 samples/sec   Loss 0.7064   LearningRate 0.0001   Epoch: 19   Global Step: 241360   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:41:04,509-Speed 3288.60 samples/sec   Loss 0.6807   LearningRate 0.0001   Epoch: 19   Global Step: 241370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:41:07,584-Speed 3331.45 samples/sec   Loss 0.7461   LearningRate 0.0001   Epoch: 19   Global Step: 241380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-04-27 22:41:10,685-Speed 3302.35 samples/sec   Loss 0.7229   LearningRate 0.0001   Epoch: 19   Global Step: 241390   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:41:13,828-Speed 3259.14 samples/sec   Loss 0.7474   LearningRate 0.0001   Epoch: 19   Global Step: 241400   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:41:17,005-Speed 3224.79 samples/sec   Loss 0.7183   LearningRate 0.0001   Epoch: 19   Global Step: 241410   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:41:20,167-Speed 3238.90 samples/sec   Loss 0.7180   LearningRate 0.0001   Epoch: 19   Global Step: 241420   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:41:23,253-Speed 3319.07 samples/sec   Loss 0.7203   LearningRate 0.0001   Epoch: 19   Global Step: 241430   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:41:26,374-Speed 3282.31 samples/sec   Loss 0.7073   LearningRate 0.0001   Epoch: 19   Global Step: 241440   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:41:29,535-Speed 3241.04 samples/sec   Loss 0.7343   LearningRate 0.0001   Epoch: 19   Global Step: 241450   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:41:32,619-Speed 3320.57 samples/sec   Loss 0.7449   LearningRate 0.0001   Epoch: 19   Global Step: 241460   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:41:35,710-Speed 3314.23 samples/sec   Loss 0.7344   LearningRate 0.0001   Epoch: 19   Global Step: 241470   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:41:38,856-Speed 3256.43 samples/sec   Loss 0.7010   LearningRate 0.0001   Epoch: 19   Global Step: 241480   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:41:41,971-Speed 3287.98 samples/sec   Loss 0.7478   LearningRate 0.0001   Epoch: 19   Global Step: 241490   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:41:45,114-Speed 3259.11 samples/sec   Loss 0.7025   LearningRate 0.0001   Epoch: 19   Global Step: 241500   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:41:48,203-Speed 3315.48 samples/sec   Loss 0.7192   LearningRate 0.0001   Epoch: 19   Global Step: 241510   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:41:51,345-Speed 3259.76 samples/sec   Loss 0.7116   LearningRate 0.0001   Epoch: 19   Global Step: 241520   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:41:54,467-Speed 3281.07 samples/sec   Loss 0.7765   LearningRate 0.0001   Epoch: 19   Global Step: 241530   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:41:57,534-Speed 3340.00 samples/sec   Loss 0.7195   LearningRate 0.0001   Epoch: 19   Global Step: 241540   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:42:00,657-Speed 3280.18 samples/sec   Loss 0.7308   LearningRate 0.0001   Epoch: 19   Global Step: 241550   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:42:03,746-Speed 3315.80 samples/sec   Loss 0.7271   LearningRate 0.0001   Epoch: 19   Global Step: 241560   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:42:06,924-Speed 3223.68 samples/sec   Loss 0.7062   LearningRate 0.0001   Epoch: 19   Global Step: 241570   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:42:10,021-Speed 3307.35 samples/sec   Loss 0.7289   LearningRate 0.0001   Epoch: 19   Global Step: 241580   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:42:13,142-Speed 3282.08 samples/sec   Loss 0.6989   LearningRate 0.0001   Epoch: 19   Global Step: 241590   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:42:16,267-Speed 3278.13 samples/sec   Loss 0.6727   LearningRate 0.0001   Epoch: 19   Global Step: 241600   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:42:19,432-Speed 3236.25 samples/sec   Loss 0.7349   LearningRate 0.0001   Epoch: 19   Global Step: 241610   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:42:22,548-Speed 3286.69 samples/sec   Loss 0.7309   LearningRate 0.0001   Epoch: 19   Global Step: 241620   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:42:25,739-Speed 3210.55 samples/sec   Loss 0.7164   LearningRate 0.0001   Epoch: 19   Global Step: 241630   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:42:28,896-Speed 3244.84 samples/sec   Loss 0.7179   LearningRate 0.0001   Epoch: 19   Global Step: 241640   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:42:32,053-Speed 3244.00 samples/sec   Loss 0.7385   LearningRate 0.0001   Epoch: 19   Global Step: 241650   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:42:35,119-Speed 3341.13 samples/sec   Loss 0.7407   LearningRate 0.0001   Epoch: 19   Global Step: 241660   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:42:38,274-Speed 3246.76 samples/sec   Loss 0.7451   LearningRate 0.0001   Epoch: 19   Global Step: 241670   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:42:41,403-Speed 3273.47 samples/sec   Loss 0.7426   LearningRate 0.0001   Epoch: 19   Global Step: 241680   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:42:44,492-Speed 3315.96 samples/sec   Loss 0.7348   LearningRate 0.0001   Epoch: 19   Global Step: 241690   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:42:47,668-Speed 3225.02 samples/sec   Loss 0.6959   LearningRate 0.0001   Epoch: 19   Global Step: 241700   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:42:50,813-Speed 3257.31 samples/sec   Loss 0.7292   LearningRate 0.0001   Epoch: 19   Global Step: 241710   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:42:53,878-Speed 3342.17 samples/sec   Loss 0.7075   LearningRate 0.0001   Epoch: 19   Global Step: 241720   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:42:56,947-Speed 3337.10 samples/sec   Loss 0.7356   LearningRate 0.0001   Epoch: 19   Global Step: 241730   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:43:00,022-Speed 3331.57 samples/sec   Loss 0.7303   LearningRate 0.0001   Epoch: 19   Global Step: 241740   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:43:03,110-Speed 3316.72 samples/sec   Loss 0.6988   LearningRate 0.0001   Epoch: 19   Global Step: 241750   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:43:06,237-Speed 3276.30 samples/sec   Loss 0.7436   LearningRate 0.0001   Epoch: 19   Global Step: 241760   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:43:09,330-Speed 3311.12 samples/sec   Loss 0.7143   LearningRate 0.0001   Epoch: 19   Global Step: 241770   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:43:12,439-Speed 3294.83 samples/sec   Loss 0.7015   LearningRate 0.0001   Epoch: 19   Global Step: 241780   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:43:15,571-Speed 3270.99 samples/sec   Loss 0.7431   LearningRate 0.0001   Epoch: 19   Global Step: 241790   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:43:18,704-Speed 3269.35 samples/sec   Loss 0.7321   LearningRate 0.0001   Epoch: 19   Global Step: 241800   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:43:21,750-Speed 3361.81 samples/sec   Loss 0.7720   LearningRate 0.0001   Epoch: 19   Global Step: 241810   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:43:25,461-Speed 2760.47 samples/sec   Loss 0.7270   LearningRate 0.0001   Epoch: 19   Global Step: 241820   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:43:28,576-Speed 3287.98 samples/sec   Loss 0.7235   LearningRate 0.0001   Epoch: 19   Global Step: 241830   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:43:31,736-Speed 3241.86 samples/sec   Loss 0.7306   LearningRate 0.0001   Epoch: 19   Global Step: 241840   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:43:34,799-Speed 3344.35 samples/sec   Loss 0.6995   LearningRate 0.0001   Epoch: 19   Global Step: 241850   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:43:37,897-Speed 3305.80 samples/sec   Loss 0.7326   LearningRate 0.0001   Epoch: 19   Global Step: 241860   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:43:41,066-Speed 3232.86 samples/sec   Loss 0.7552   LearningRate 0.0001   Epoch: 19   Global Step: 241870   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:43:44,130-Speed 3342.87 samples/sec   Loss 0.7084   LearningRate 0.0001   Epoch: 19   Global Step: 241880   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:43:47,291-Speed 3240.98 samples/sec   Loss 0.6943   LearningRate 0.0001   Epoch: 19   Global Step: 241890   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:43:50,397-Speed 3297.83 samples/sec   Loss 0.7200   LearningRate 0.0001   Epoch: 19   Global Step: 241900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:43:53,500-Speed 3301.45 samples/sec   Loss 0.7293   LearningRate 0.0001   Epoch: 19   Global Step: 241910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:43:56,630-Speed 3272.08 samples/sec   Loss 0.7582   LearningRate 0.0001   Epoch: 19   Global Step: 241920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:43:59,834-Speed 3196.20 samples/sec   Loss 0.7641   LearningRate 0.0001   Epoch: 19   Global Step: 241930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:44:02,934-Speed 3304.22 samples/sec   Loss 0.7331   LearningRate 0.0001   Epoch: 19   Global Step: 241940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:44:06,009-Speed 3331.30 samples/sec   Loss 0.7311   LearningRate 0.0001   Epoch: 19   Global Step: 241950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:44:09,080-Speed 3336.09 samples/sec   Loss 0.7203   LearningRate 0.0001   Epoch: 19   Global Step: 241960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:44:12,166-Speed 3318.73 samples/sec   Loss 0.7300   LearningRate 0.0001   Epoch: 19   Global Step: 241970   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:44:15,286-Speed 3283.54 samples/sec   Loss 0.7450   LearningRate 0.0001   Epoch: 19   Global Step: 241980   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:44:18,362-Speed 3330.32 samples/sec   Loss 0.7309   LearningRate 0.0001   Epoch: 19   Global Step: 241990   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:44:21,412-Speed 3357.42 samples/sec   Loss 0.7325   LearningRate 0.0001   Epoch: 19   Global Step: 242000   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:44:24,555-Speed 3259.10 samples/sec   Loss 0.7809   LearningRate 0.0001   Epoch: 19   Global Step: 242010   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:44:27,696-Speed 3261.02 samples/sec   Loss 0.7016   LearningRate 0.0001   Epoch: 19   Global Step: 242020   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:44:30,791-Speed 3309.92 samples/sec   Loss 0.7229   LearningRate 0.0001   Epoch: 19   Global Step: 242030   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:44:33,868-Speed 3329.39 samples/sec   Loss 0.7245   LearningRate 0.0001   Epoch: 19   Global Step: 242040   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:44:36,938-Speed 3336.95 samples/sec   Loss 0.7523   LearningRate 0.0001   Epoch: 19   Global Step: 242050   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:44:40,047-Speed 3294.80 samples/sec   Loss 0.7313   LearningRate 0.0001   Epoch: 19   Global Step: 242060   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:44:43,229-Speed 3218.80 samples/sec   Loss 0.7352   LearningRate 0.0001   Epoch: 19   Global Step: 242070   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:44:46,304-Speed 3331.58 samples/sec   Loss 0.7336   LearningRate 0.0001   Epoch: 19   Global Step: 242080   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:44:49,441-Speed 3265.29 samples/sec   Loss 0.7825   LearningRate 0.0001   Epoch: 19   Global Step: 242090   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:44:52,590-Speed 3253.18 samples/sec   Loss 0.7480   LearningRate 0.0001   Epoch: 19   Global Step: 242100   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:44:55,663-Speed 3333.86 samples/sec   Loss 0.7159   LearningRate 0.0001   Epoch: 19   Global Step: 242110   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:44:58,761-Speed 3306.10 samples/sec   Loss 0.7236   LearningRate 0.0001   Epoch: 19   Global Step: 242120   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:45:01,896-Speed 3267.06 samples/sec   Loss 0.7353   LearningRate 0.0001   Epoch: 19   Global Step: 242130   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:45:05,007-Speed 3292.98 samples/sec   Loss 0.7110   LearningRate 0.0001   Epoch: 19   Global Step: 242140   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:45:08,089-Speed 3323.67 samples/sec   Loss 0.7276   LearningRate 0.0001   Epoch: 19   Global Step: 242150   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:45:11,176-Speed 3317.82 samples/sec   Loss 0.7267   LearningRate 0.0001   Epoch: 19   Global Step: 242160   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:45:14,274-Speed 3306.89 samples/sec   Loss 0.7293   LearningRate 0.0001   Epoch: 19   Global Step: 242170   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:45:17,365-Speed 3313.33 samples/sec   Loss 0.7483   LearningRate 0.0001   Epoch: 19   Global Step: 242180   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:45:20,463-Speed 3306.33 samples/sec   Loss 0.7476   LearningRate 0.0001   Epoch: 19   Global Step: 242190   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:45:23,650-Speed 3214.49 samples/sec   Loss 0.7188   LearningRate 0.0001   Epoch: 19   Global Step: 242200   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:45:26,775-Speed 3278.61 samples/sec   Loss 0.7122   LearningRate 0.0001   Epoch: 19   Global Step: 242210   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:45:29,912-Speed 3264.16 samples/sec   Loss 0.7240   LearningRate 0.0001   Epoch: 19   Global Step: 242220   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:45:33,008-Speed 3309.38 samples/sec   Loss 0.7035   LearningRate 0.0001   Epoch: 19   Global Step: 242230   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:45:36,098-Speed 3314.36 samples/sec   Loss 0.7049   LearningRate 0.0001   Epoch: 19   Global Step: 242240   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:45:39,233-Speed 3267.10 samples/sec   Loss 0.7058   LearningRate 0.0001   Epoch: 19   Global Step: 242250   Fp16 Grad Scale: 4096   Required: 1 hours
Training: 2022-04-27 22:45:42,353-Speed 3283.62 samples/sec   Loss 0.7028   LearningRate 0.0001   Epoch: 19   Global Step: 242260   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:45:45,449-Speed 3309.11 samples/sec   Loss 0.6885   LearningRate 0.0001   Epoch: 19   Global Step: 242270   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:45:48,616-Speed 3234.12 samples/sec   Loss 0.7544   LearningRate 0.0001   Epoch: 19   Global Step: 242280   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:45:51,732-Speed 3286.54 samples/sec   Loss 0.7168   LearningRate 0.0001   Epoch: 19   Global Step: 242290   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:45:54,823-Speed 3314.76 samples/sec   Loss 0.7159   LearningRate 0.0001   Epoch: 19   Global Step: 242300   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:45:57,915-Speed 3312.59 samples/sec   Loss 0.7308   LearningRate 0.0001   Epoch: 19   Global Step: 242310   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:46:01,043-Speed 3274.48 samples/sec   Loss 0.7046   LearningRate 0.0001   Epoch: 19   Global Step: 242320   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:46:04,131-Speed 3316.87 samples/sec   Loss 0.7142   LearningRate 0.0001   Epoch: 19   Global Step: 242330   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:46:07,249-Speed 3285.92 samples/sec   Loss 0.7626   LearningRate 0.0001   Epoch: 19   Global Step: 242340   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:46:10,315-Speed 3340.17 samples/sec   Loss 0.7854   LearningRate 0.0001   Epoch: 19   Global Step: 242350   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:46:13,471-Speed 3246.17 samples/sec   Loss 0.7218   LearningRate 0.0001   Epoch: 19   Global Step: 242360   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:46:16,563-Speed 3312.07 samples/sec   Loss 0.7187   LearningRate 0.0001   Epoch: 19   Global Step: 242370   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:46:19,672-Speed 3294.90 samples/sec   Loss 0.6855   LearningRate 0.0001   Epoch: 19   Global Step: 242380   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:46:22,743-Speed 3336.03 samples/sec   Loss 0.7317   LearningRate 0.0001   Epoch: 19   Global Step: 242390   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:46:25,826-Speed 3322.22 samples/sec   Loss 0.7056   LearningRate 0.0001   Epoch: 19   Global Step: 242400   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:46:28,928-Speed 3301.90 samples/sec   Loss 0.7372   LearningRate 0.0001   Epoch: 19   Global Step: 242410   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:46:32,047-Speed 3284.53 samples/sec   Loss 0.6978   LearningRate 0.0001   Epoch: 19   Global Step: 242420   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:46:35,150-Speed 3301.03 samples/sec   Loss 0.7633   LearningRate 0.0001   Epoch: 19   Global Step: 242430   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:46:38,310-Speed 3241.90 samples/sec   Loss 0.6804   LearningRate 0.0001   Epoch: 19   Global Step: 242440   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:46:41,408-Speed 3308.02 samples/sec   Loss 0.7274   LearningRate 0.0001   Epoch: 19   Global Step: 242450   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:46:44,493-Speed 3320.76 samples/sec   Loss 0.7016   LearningRate 0.0001   Epoch: 19   Global Step: 242460   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:46:47,669-Speed 3225.20 samples/sec   Loss 0.7318   LearningRate 0.0001   Epoch: 19   Global Step: 242470   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:46:50,839-Speed 3231.22 samples/sec   Loss 0.7540   LearningRate 0.0001   Epoch: 19   Global Step: 242480   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:46:53,925-Speed 3318.78 samples/sec   Loss 0.7183   LearningRate 0.0001   Epoch: 19   Global Step: 242490   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:46:57,007-Speed 3323.98 samples/sec   Loss 0.6977   LearningRate 0.0001   Epoch: 19   Global Step: 242500   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:47:00,128-Speed 3281.55 samples/sec   Loss 0.7215   LearningRate 0.0001   Epoch: 19   Global Step: 242510   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:47:03,281-Speed 3248.38 samples/sec   Loss 0.7175   LearningRate 0.0001   Epoch: 19   Global Step: 242520   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:47:06,387-Speed 3297.95 samples/sec   Loss 0.7010   LearningRate 0.0001   Epoch: 19   Global Step: 242530   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:47:09,470-Speed 3323.29 samples/sec   Loss 0.7322   LearningRate 0.0001   Epoch: 19   Global Step: 242540   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-04-27 22:47:12,643-Speed 3227.74 samples/sec   Loss 0.7321   LearningRate 0.0001   Epoch: 19   Global Step: 242550   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:47:15,770-Speed 3276.42 samples/sec   Loss 0.6899   LearningRate 0.0001   Epoch: 19   Global Step: 242560   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:47:18,906-Speed 3266.24 samples/sec   Loss 0.7029   LearningRate 0.0001   Epoch: 19   Global Step: 242570   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:47:21,988-Speed 3322.52 samples/sec   Loss 0.6983   LearningRate 0.0001   Epoch: 19   Global Step: 242580   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:47:25,155-Speed 3234.98 samples/sec   Loss 0.7231   LearningRate 0.0001   Epoch: 19   Global Step: 242590   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:47:28,232-Speed 3328.73 samples/sec   Loss 0.7054   LearningRate 0.0001   Epoch: 19   Global Step: 242600   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:47:31,426-Speed 3207.71 samples/sec   Loss 0.7212   LearningRate 0.0001   Epoch: 19   Global Step: 242610   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:47:34,495-Speed 3336.62 samples/sec   Loss 0.7119   LearningRate 0.0001   Epoch: 19   Global Step: 242620   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:47:37,577-Speed 3323.45 samples/sec   Loss 0.6950   LearningRate 0.0001   Epoch: 19   Global Step: 242630   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:47:40,651-Speed 3332.92 samples/sec   Loss 0.7396   LearningRate 0.0001   Epoch: 19   Global Step: 242640   Fp16 Grad Scale: 8192   Required: 1 hours
Training: 2022-04-27 22:47:43,777-Speed 3276.23 samples/sec   Loss 0.7146   LearningRate 0.0001   Epoch: 19   Global Step: 242650   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:47:46,870-Speed 3311.94 samples/sec   Loss 0.7487   LearningRate 0.0001   Epoch: 19   Global Step: 242660   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:47:50,006-Speed 3267.08 samples/sec   Loss 0.7250   LearningRate 0.0001   Epoch: 19   Global Step: 242670   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:47:53,173-Speed 3233.76 samples/sec   Loss 0.7285   LearningRate 0.0001   Epoch: 19   Global Step: 242680   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:47:56,260-Speed 3318.19 samples/sec   Loss 0.7248   LearningRate 0.0001   Epoch: 19   Global Step: 242690   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:47:59,353-Speed 3311.94 samples/sec   Loss 0.7019   LearningRate 0.0001   Epoch: 19   Global Step: 242700   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:48:02,548-Speed 3206.52 samples/sec   Loss 0.7063   LearningRate 0.0001   Epoch: 19   Global Step: 242710   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:48:05,657-Speed 3294.04 samples/sec   Loss 0.7390   LearningRate 0.0001   Epoch: 19   Global Step: 242720   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:48:08,755-Speed 3306.96 samples/sec   Loss 0.7392   LearningRate 0.0001   Epoch: 19   Global Step: 242730   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:48:11,859-Speed 3299.67 samples/sec   Loss 0.7157   LearningRate 0.0001   Epoch: 19   Global Step: 242740   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:48:15,006-Speed 3254.59 samples/sec   Loss 0.7131   LearningRate 0.0001   Epoch: 19   Global Step: 242750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:48:18,184-Speed 3223.79 samples/sec   Loss 0.7131   LearningRate 0.0001   Epoch: 19   Global Step: 242760   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:48:21,267-Speed 3322.02 samples/sec   Loss 0.6836   LearningRate 0.0001   Epoch: 19   Global Step: 242770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:48:24,468-Speed 3200.08 samples/sec   Loss 0.7375   LearningRate 0.0001   Epoch: 19   Global Step: 242780   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:48:27,599-Speed 3272.10 samples/sec   Loss 0.7213   LearningRate 0.0001   Epoch: 19   Global Step: 242790   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:48:30,786-Speed 3214.25 samples/sec   Loss 0.7335   LearningRate 0.0001   Epoch: 19   Global Step: 242800   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:48:33,860-Speed 3331.86 samples/sec   Loss 0.7387   LearningRate 0.0001   Epoch: 19   Global Step: 242810   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:48:36,996-Speed 3266.89 samples/sec   Loss 0.7081   LearningRate 0.0001   Epoch: 19   Global Step: 242820   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:48:40,097-Speed 3303.21 samples/sec   Loss 0.7422   LearningRate 0.0001   Epoch: 19   Global Step: 242830   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:48:43,165-Speed 3338.20 samples/sec   Loss 0.7092   LearningRate 0.0001   Epoch: 19   Global Step: 242840   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:48:46,230-Speed 3342.55 samples/sec   Loss 0.7183   LearningRate 0.0001   Epoch: 19   Global Step: 242850   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:48:49,455-Speed 3176.19 samples/sec   Loss 0.6728   LearningRate 0.0001   Epoch: 19   Global Step: 242860   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:48:52,515-Speed 3347.17 samples/sec   Loss 0.7159   LearningRate 0.0000   Epoch: 19   Global Step: 242870   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:48:55,604-Speed 3316.49 samples/sec   Loss 0.7282   LearningRate 0.0000   Epoch: 19   Global Step: 242880   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:48:58,676-Speed 3334.06 samples/sec   Loss 0.6858   LearningRate 0.0000   Epoch: 19   Global Step: 242890   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:49:01,796-Speed 3283.10 samples/sec   Loss 0.7367   LearningRate 0.0000   Epoch: 19   Global Step: 242900   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:49:04,880-Speed 3321.96 samples/sec   Loss 0.7169   LearningRate 0.0000   Epoch: 19   Global Step: 242910   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:49:08,054-Speed 3226.99 samples/sec   Loss 0.7170   LearningRate 0.0000   Epoch: 19   Global Step: 242920   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:49:11,152-Speed 3306.27 samples/sec   Loss 0.7278   LearningRate 0.0000   Epoch: 19   Global Step: 242930   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:49:14,331-Speed 3222.18 samples/sec   Loss 0.7465   LearningRate 0.0000   Epoch: 19   Global Step: 242940   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:49:17,566-Speed 3166.74 samples/sec   Loss 0.6885   LearningRate 0.0000   Epoch: 19   Global Step: 242950   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:49:20,710-Speed 3258.10 samples/sec   Loss 0.6814   LearningRate 0.0000   Epoch: 19   Global Step: 242960   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:49:23,792-Speed 3322.66 samples/sec   Loss 0.7548   LearningRate 0.0000   Epoch: 19   Global Step: 242970   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:49:26,888-Speed 3309.25 samples/sec   Loss 0.7531   LearningRate 0.0000   Epoch: 19   Global Step: 242980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:49:29,986-Speed 3305.96 samples/sec   Loss 0.7177   LearningRate 0.0000   Epoch: 19   Global Step: 242990   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:49:33,040-Speed 3354.57 samples/sec   Loss 0.7683   LearningRate 0.0000   Epoch: 19   Global Step: 243000   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:49:36,097-Speed 3350.56 samples/sec   Loss 0.6877   LearningRate 0.0000   Epoch: 19   Global Step: 243010   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:49:39,187-Speed 3314.68 samples/sec   Loss 0.7210   LearningRate 0.0000   Epoch: 19   Global Step: 243020   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:49:42,271-Speed 3321.62 samples/sec   Loss 0.7137   LearningRate 0.0000   Epoch: 19   Global Step: 243030   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:49:45,377-Speed 3297.93 samples/sec   Loss 0.7148   LearningRate 0.0000   Epoch: 19   Global Step: 243040   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:49:48,479-Speed 3301.79 samples/sec   Loss 0.7031   LearningRate 0.0000   Epoch: 19   Global Step: 243050   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:49:51,557-Speed 3328.23 samples/sec   Loss 0.7267   LearningRate 0.0000   Epoch: 19   Global Step: 243060   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:49:54,734-Speed 3224.23 samples/sec   Loss 0.7279   LearningRate 0.0000   Epoch: 19   Global Step: 243070   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:49:57,808-Speed 3331.37 samples/sec   Loss 0.7468   LearningRate 0.0000   Epoch: 19   Global Step: 243080   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:50:00,926-Speed 3285.49 samples/sec   Loss 0.7264   LearningRate 0.0000   Epoch: 19   Global Step: 243090   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:50:04,077-Speed 3251.35 samples/sec   Loss 0.6963   LearningRate 0.0000   Epoch: 19   Global Step: 243100   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:50:07,151-Speed 3332.02 samples/sec   Loss 0.7224   LearningRate 0.0000   Epoch: 19   Global Step: 243110   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:50:10,217-Speed 3341.52 samples/sec   Loss 0.7279   LearningRate 0.0000   Epoch: 19   Global Step: 243120   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:50:13,311-Speed 3310.24 samples/sec   Loss 0.7329   LearningRate 0.0000   Epoch: 19   Global Step: 243130   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:50:16,363-Speed 3356.02 samples/sec   Loss 0.7375   LearningRate 0.0000   Epoch: 19   Global Step: 243140   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:50:19,467-Speed 3299.67 samples/sec   Loss 0.7266   LearningRate 0.0000   Epoch: 19   Global Step: 243150   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:50:22,530-Speed 3344.46 samples/sec   Loss 0.7266   LearningRate 0.0000   Epoch: 19   Global Step: 243160   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:50:25,599-Speed 3338.25 samples/sec   Loss 0.7363   LearningRate 0.0000   Epoch: 19   Global Step: 243170   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:50:28,693-Speed 3310.55 samples/sec   Loss 0.6879   LearningRate 0.0000   Epoch: 19   Global Step: 243180   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:50:31,751-Speed 3349.74 samples/sec   Loss 0.7243   LearningRate 0.0000   Epoch: 19   Global Step: 243190   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:50:34,840-Speed 3316.09 samples/sec   Loss 0.7215   LearningRate 0.0000   Epoch: 19   Global Step: 243200   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:50:37,896-Speed 3351.49 samples/sec   Loss 0.7379   LearningRate 0.0000   Epoch: 19   Global Step: 243210   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:50:41,041-Speed 3257.17 samples/sec   Loss 0.7387   LearningRate 0.0000   Epoch: 19   Global Step: 243220   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:50:44,130-Speed 3315.12 samples/sec   Loss 0.7355   LearningRate 0.0000   Epoch: 19   Global Step: 243230   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:50:47,220-Speed 3316.28 samples/sec   Loss 0.6997   LearningRate 0.0000   Epoch: 19   Global Step: 243240   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:50:50,315-Speed 3308.75 samples/sec   Loss 0.7427   LearningRate 0.0000   Epoch: 19   Global Step: 243250   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:50:53,482-Speed 3234.18 samples/sec   Loss 0.7369   LearningRate 0.0000   Epoch: 19   Global Step: 243260   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:50:56,571-Speed 3316.42 samples/sec   Loss 0.7034   LearningRate 0.0000   Epoch: 19   Global Step: 243270   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:50:59,685-Speed 3289.36 samples/sec   Loss 0.7593   LearningRate 0.0000   Epoch: 19   Global Step: 243280   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:51:02,854-Speed 3231.86 samples/sec   Loss 0.7120   LearningRate 0.0000   Epoch: 19   Global Step: 243290   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:51:05,923-Speed 3338.13 samples/sec   Loss 0.7186   LearningRate 0.0000   Epoch: 19   Global Step: 243300   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:51:09,029-Speed 3297.93 samples/sec   Loss 0.7381   LearningRate 0.0000   Epoch: 19   Global Step: 243310   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:51:12,111-Speed 3322.91 samples/sec   Loss 0.7675   LearningRate 0.0000   Epoch: 19   Global Step: 243320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:51:15,247-Speed 3267.03 samples/sec   Loss 0.7361   LearningRate 0.0000   Epoch: 19   Global Step: 243330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:51:18,428-Speed 3219.62 samples/sec   Loss 0.7173   LearningRate 0.0000   Epoch: 19   Global Step: 243340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:51:21,499-Speed 3335.95 samples/sec   Loss 0.7264   LearningRate 0.0000   Epoch: 19   Global Step: 243350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:51:24,579-Speed 3325.84 samples/sec   Loss 0.7083   LearningRate 0.0000   Epoch: 19   Global Step: 243360   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:51:27,777-Speed 3202.62 samples/sec   Loss 0.7388   LearningRate 0.0000   Epoch: 19   Global Step: 243370   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:51:30,890-Speed 3290.67 samples/sec   Loss 0.7244   LearningRate 0.0000   Epoch: 19   Global Step: 243380   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:51:33,945-Speed 3353.18 samples/sec   Loss 0.7170   LearningRate 0.0000   Epoch: 19   Global Step: 243390   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:51:37,044-Speed 3305.00 samples/sec   Loss 0.7262   LearningRate 0.0000   Epoch: 19   Global Step: 243400   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:51:40,101-Speed 3350.41 samples/sec   Loss 0.7184   LearningRate 0.0000   Epoch: 19   Global Step: 243410   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:51:43,259-Speed 3244.12 samples/sec   Loss 0.7263   LearningRate 0.0000   Epoch: 19   Global Step: 243420   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:51:46,366-Speed 3296.26 samples/sec   Loss 0.7243   LearningRate 0.0000   Epoch: 19   Global Step: 243430   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:51:49,467-Speed 3303.97 samples/sec   Loss 0.7497   LearningRate 0.0000   Epoch: 19   Global Step: 243440   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:51:52,623-Speed 3245.64 samples/sec   Loss 0.7168   LearningRate 0.0000   Epoch: 19   Global Step: 243450   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:51:55,736-Speed 3290.40 samples/sec   Loss 0.7272   LearningRate 0.0000   Epoch: 19   Global Step: 243460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:51:58,773-Speed 3372.83 samples/sec   Loss 0.7377   LearningRate 0.0000   Epoch: 19   Global Step: 243470   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:52:01,950-Speed 3223.99 samples/sec   Loss 0.7516   LearningRate 0.0000   Epoch: 19   Global Step: 243480   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:52:05,094-Speed 3258.65 samples/sec   Loss 0.7024   LearningRate 0.0000   Epoch: 19   Global Step: 243490   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:52:08,213-Speed 3283.43 samples/sec   Loss 0.7121   LearningRate 0.0000   Epoch: 19   Global Step: 243500   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:52:11,310-Speed 3307.62 samples/sec   Loss 0.6973   LearningRate 0.0000   Epoch: 19   Global Step: 243510   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:52:14,390-Speed 3326.01 samples/sec   Loss 0.7584   LearningRate 0.0000   Epoch: 19   Global Step: 243520   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:52:17,494-Speed 3300.15 samples/sec   Loss 0.7277   LearningRate 0.0000   Epoch: 19   Global Step: 243530   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:52:20,584-Speed 3314.79 samples/sec   Loss 0.7249   LearningRate 0.0000   Epoch: 19   Global Step: 243540   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:52:23,698-Speed 3289.37 samples/sec   Loss 0.6783   LearningRate 0.0000   Epoch: 19   Global Step: 243550   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:52:26,854-Speed 3246.18 samples/sec   Loss 0.7123   LearningRate 0.0000   Epoch: 19   Global Step: 243560   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:52:29,996-Speed 3259.52 samples/sec   Loss 0.7252   LearningRate 0.0000   Epoch: 19   Global Step: 243570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:52:33,111-Speed 3288.79 samples/sec   Loss 0.7259   LearningRate 0.0000   Epoch: 19   Global Step: 243580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:52:36,208-Speed 3307.05 samples/sec   Loss 0.7016   LearningRate 0.0000   Epoch: 19   Global Step: 243590   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:52:39,382-Speed 3227.34 samples/sec   Loss 0.7380   LearningRate 0.0000   Epoch: 19   Global Step: 243600   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:52:42,546-Speed 3236.98 samples/sec   Loss 0.7110   LearningRate 0.0000   Epoch: 19   Global Step: 243610   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:52:45,652-Speed 3297.80 samples/sec   Loss 0.7004   LearningRate 0.0000   Epoch: 19   Global Step: 243620   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:52:48,751-Speed 3305.77 samples/sec   Loss 0.7275   LearningRate 0.0000   Epoch: 19   Global Step: 243630   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:52:51,867-Speed 3287.20 samples/sec   Loss 0.7268   LearningRate 0.0000   Epoch: 19   Global Step: 243640   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:52:54,954-Speed 3318.54 samples/sec   Loss 0.7176   LearningRate 0.0000   Epoch: 19   Global Step: 243650   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:52:58,025-Speed 3335.82 samples/sec   Loss 0.7154   LearningRate 0.0000   Epoch: 19   Global Step: 243660   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:53:01,122-Speed 3307.42 samples/sec   Loss 0.7523   LearningRate 0.0000   Epoch: 19   Global Step: 243670   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:53:04,264-Speed 3259.00 samples/sec   Loss 0.7278   LearningRate 0.0000   Epoch: 19   Global Step: 243680   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:53:07,327-Speed 3344.93 samples/sec   Loss 0.7113   LearningRate 0.0000   Epoch: 19   Global Step: 243690   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:53:10,436-Speed 3294.22 samples/sec   Loss 0.7580   LearningRate 0.0000   Epoch: 19   Global Step: 243700   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:53:13,642-Speed 3195.32 samples/sec   Loss 0.7388   LearningRate 0.0000   Epoch: 19   Global Step: 243710   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:53:16,805-Speed 3238.35 samples/sec   Loss 0.7373   LearningRate 0.0000   Epoch: 19   Global Step: 243720   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:53:19,877-Speed 3334.41 samples/sec   Loss 0.7274   LearningRate 0.0000   Epoch: 19   Global Step: 243730   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:53:22,983-Speed 3298.11 samples/sec   Loss 0.7494   LearningRate 0.0000   Epoch: 19   Global Step: 243740   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:53:26,187-Speed 3196.41 samples/sec   Loss 0.7301   LearningRate 0.0000   Epoch: 19   Global Step: 243750   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:53:29,306-Speed 3284.35 samples/sec   Loss 0.7434   LearningRate 0.0000   Epoch: 19   Global Step: 243760   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:53:32,414-Speed 3295.64 samples/sec   Loss 0.7304   LearningRate 0.0000   Epoch: 19   Global Step: 243770   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:53:35,565-Speed 3251.36 samples/sec   Loss 0.7134   LearningRate 0.0000   Epoch: 19   Global Step: 243780   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:53:38,685-Speed 3282.66 samples/sec   Loss 0.7390   LearningRate 0.0000   Epoch: 19   Global Step: 243790   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:53:41,897-Speed 3189.03 samples/sec   Loss 0.7205   LearningRate 0.0000   Epoch: 19   Global Step: 243800   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:53:44,959-Speed 3344.58 samples/sec   Loss 0.6982   LearningRate 0.0000   Epoch: 19   Global Step: 243810   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:53:48,083-Speed 3279.46 samples/sec   Loss 0.7046   LearningRate 0.0000   Epoch: 19   Global Step: 243820   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:53:51,302-Speed 3182.00 samples/sec   Loss 0.6955   LearningRate 0.0000   Epoch: 19   Global Step: 243830   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:53:54,399-Speed 3307.79 samples/sec   Loss 0.6879   LearningRate 0.0000   Epoch: 19   Global Step: 243840   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:53:57,496-Speed 3307.51 samples/sec   Loss 0.7145   LearningRate 0.0000   Epoch: 19   Global Step: 243850   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:54:00,550-Speed 3353.73 samples/sec   Loss 0.7381   LearningRate 0.0000   Epoch: 19   Global Step: 243860   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:54:03,607-Speed 3350.96 samples/sec   Loss 0.7070   LearningRate 0.0000   Epoch: 19   Global Step: 243870   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:54:06,729-Speed 3281.43 samples/sec   Loss 0.7284   LearningRate 0.0000   Epoch: 19   Global Step: 243880   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:54:09,797-Speed 3338.22 samples/sec   Loss 0.7237   LearningRate 0.0000   Epoch: 19   Global Step: 243890   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:54:12,956-Speed 3242.21 samples/sec   Loss 0.7195   LearningRate 0.0000   Epoch: 19   Global Step: 243900   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:54:16,104-Speed 3254.33 samples/sec   Loss 0.7205   LearningRate 0.0000   Epoch: 19   Global Step: 243910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:54:19,152-Speed 3359.78 samples/sec   Loss 0.7261   LearningRate 0.0000   Epoch: 19   Global Step: 243920   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:54:22,274-Speed 3281.79 samples/sec   Loss 0.6814   LearningRate 0.0000   Epoch: 19   Global Step: 243930   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:54:25,422-Speed 3253.45 samples/sec   Loss 0.7098   LearningRate 0.0000   Epoch: 19   Global Step: 243940   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:54:28,571-Speed 3253.24 samples/sec   Loss 0.7594   LearningRate 0.0000   Epoch: 19   Global Step: 243950   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:54:31,626-Speed 3352.06 samples/sec   Loss 0.7074   LearningRate 0.0000   Epoch: 19   Global Step: 243960   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:54:34,719-Speed 3311.73 samples/sec   Loss 0.7180   LearningRate 0.0000   Epoch: 19   Global Step: 243970   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:54:37,771-Speed 3356.91 samples/sec   Loss 0.7332   LearningRate 0.0000   Epoch: 19   Global Step: 243980   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:54:40,827-Speed 3351.59 samples/sec   Loss 0.7168   LearningRate 0.0000   Epoch: 19   Global Step: 243990   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:54:43,966-Speed 3263.21 samples/sec   Loss 0.7154   LearningRate 0.0000   Epoch: 19   Global Step: 244000   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:54:47,062-Speed 3309.01 samples/sec   Loss 0.7160   LearningRate 0.0000   Epoch: 19   Global Step: 244010   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:54:50,116-Speed 3353.68 samples/sec   Loss 0.7156   LearningRate 0.0000   Epoch: 19   Global Step: 244020   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:54:53,231-Speed 3288.63 samples/sec   Loss 0.7421   LearningRate 0.0000   Epoch: 19   Global Step: 244030   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:54:56,333-Speed 3302.56 samples/sec   Loss 0.7305   LearningRate 0.0000   Epoch: 19   Global Step: 244040   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:54:59,472-Speed 3262.76 samples/sec   Loss 0.7366   LearningRate 0.0000   Epoch: 19   Global Step: 244050   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:55:02,587-Speed 3288.30 samples/sec   Loss 0.7054   LearningRate 0.0000   Epoch: 19   Global Step: 244060   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:55:05,720-Speed 3269.26 samples/sec   Loss 0.7353   LearningRate 0.0000   Epoch: 19   Global Step: 244070   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:55:08,780-Speed 3347.71 samples/sec   Loss 0.7023   LearningRate 0.0000   Epoch: 19   Global Step: 244080   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:55:11,929-Speed 3253.20 samples/sec   Loss 0.7337   LearningRate 0.0000   Epoch: 19   Global Step: 244090   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:55:15,041-Speed 3290.84 samples/sec   Loss 0.7239   LearningRate 0.0000   Epoch: 19   Global Step: 244100   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:55:18,217-Speed 3225.97 samples/sec   Loss 0.7307   LearningRate 0.0000   Epoch: 19   Global Step: 244110   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:55:21,318-Speed 3302.85 samples/sec   Loss 0.6906   LearningRate 0.0000   Epoch: 19   Global Step: 244120   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:55:24,447-Speed 3272.95 samples/sec   Loss 0.7175   LearningRate 0.0000   Epoch: 19   Global Step: 244130   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:55:27,589-Speed 3260.91 samples/sec   Loss 0.7194   LearningRate 0.0000   Epoch: 19   Global Step: 244140   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:55:30,730-Speed 3261.27 samples/sec   Loss 0.6808   LearningRate 0.0000   Epoch: 19   Global Step: 244150   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:55:33,824-Speed 3310.40 samples/sec   Loss 0.7254   LearningRate 0.0000   Epoch: 19   Global Step: 244160   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:55:36,936-Speed 3291.51 samples/sec   Loss 0.7185   LearningRate 0.0000   Epoch: 19   Global Step: 244170   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:55:40,050-Speed 3288.89 samples/sec   Loss 0.7049   LearningRate 0.0000   Epoch: 19   Global Step: 244180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:55:43,166-Speed 3287.71 samples/sec   Loss 0.7411   LearningRate 0.0000   Epoch: 19   Global Step: 244190   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:55:46,219-Speed 3354.95 samples/sec   Loss 0.7057   LearningRate 0.0000   Epoch: 19   Global Step: 244200   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:55:49,338-Speed 3283.57 samples/sec   Loss 0.7236   LearningRate 0.0000   Epoch: 19   Global Step: 244210   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:55:52,474-Speed 3267.38 samples/sec   Loss 0.7315   LearningRate 0.0000   Epoch: 19   Global Step: 244220   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:55:55,583-Speed 3294.09 samples/sec   Loss 0.7426   LearningRate 0.0000   Epoch: 19   Global Step: 244230   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:55:58,645-Speed 3345.09 samples/sec   Loss 0.7403   LearningRate 0.0000   Epoch: 19   Global Step: 244240   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:56:01,832-Speed 3214.33 samples/sec   Loss 0.7111   LearningRate 0.0000   Epoch: 19   Global Step: 244250   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:56:04,962-Speed 3272.45 samples/sec   Loss 0.7254   LearningRate 0.0000   Epoch: 19   Global Step: 244260   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:56:08,094-Speed 3270.16 samples/sec   Loss 0.7390   LearningRate 0.0000   Epoch: 19   Global Step: 244270   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:56:11,215-Speed 3282.14 samples/sec   Loss 0.7091   LearningRate 0.0000   Epoch: 19   Global Step: 244280   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:56:14,339-Speed 3279.60 samples/sec   Loss 0.7156   LearningRate 0.0000   Epoch: 19   Global Step: 244290   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:56:17,532-Speed 3207.80 samples/sec   Loss 0.7312   LearningRate 0.0000   Epoch: 19   Global Step: 244300   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:56:20,616-Speed 3321.39 samples/sec   Loss 0.6985   LearningRate 0.0000   Epoch: 19   Global Step: 244310   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:56:23,750-Speed 3268.73 samples/sec   Loss 0.6964   LearningRate 0.0000   Epoch: 19   Global Step: 244320   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:56:26,905-Speed 3246.11 samples/sec   Loss 0.7394   LearningRate 0.0000   Epoch: 19   Global Step: 244330   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:56:30,046-Speed 3261.34 samples/sec   Loss 0.6837   LearningRate 0.0000   Epoch: 19   Global Step: 244340   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:56:33,174-Speed 3274.65 samples/sec   Loss 0.7124   LearningRate 0.0000   Epoch: 19   Global Step: 244350   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:56:36,331-Speed 3245.32 samples/sec   Loss 0.7539   LearningRate 0.0000   Epoch: 19   Global Step: 244360   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:56:39,424-Speed 3311.35 samples/sec   Loss 0.7119   LearningRate 0.0000   Epoch: 19   Global Step: 244370   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:56:42,512-Speed 3316.54 samples/sec   Loss 0.7483   LearningRate 0.0000   Epoch: 19   Global Step: 244380   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:56:45,621-Speed 3297.24 samples/sec   Loss 0.7475   LearningRate 0.0000   Epoch: 19   Global Step: 244390   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:56:48,742-Speed 3281.54 samples/sec   Loss 0.7334   LearningRate 0.0000   Epoch: 19   Global Step: 244400   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:56:51,871-Speed 3273.63 samples/sec   Loss 0.7210   LearningRate 0.0000   Epoch: 19   Global Step: 244410   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:56:55,001-Speed 3272.59 samples/sec   Loss 0.7196   LearningRate 0.0000   Epoch: 19   Global Step: 244420   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:56:58,075-Speed 3331.80 samples/sec   Loss 0.7092   LearningRate 0.0000   Epoch: 19   Global Step: 244430   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:57:01,205-Speed 3273.05 samples/sec   Loss 0.7087   LearningRate 0.0000   Epoch: 19   Global Step: 244440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:57:04,333-Speed 3274.84 samples/sec   Loss 0.6984   LearningRate 0.0000   Epoch: 19   Global Step: 244450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:57:07,417-Speed 3321.33 samples/sec   Loss 0.7422   LearningRate 0.0000   Epoch: 19   Global Step: 244460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:57:10,528-Speed 3292.16 samples/sec   Loss 0.7204   LearningRate 0.0000   Epoch: 19   Global Step: 244470   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:57:13,648-Speed 3283.54 samples/sec   Loss 0.7412   LearningRate 0.0000   Epoch: 19   Global Step: 244480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:57:16,744-Speed 3308.82 samples/sec   Loss 0.7070   LearningRate 0.0000   Epoch: 19   Global Step: 244490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:57:19,824-Speed 3325.55 samples/sec   Loss 0.7202   LearningRate 0.0000   Epoch: 19   Global Step: 244500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:57:22,909-Speed 3320.29 samples/sec   Loss 0.7129   LearningRate 0.0000   Epoch: 19   Global Step: 244510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:57:26,036-Speed 3275.89 samples/sec   Loss 0.7064   LearningRate 0.0000   Epoch: 19   Global Step: 244520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:57:29,144-Speed 3295.38 samples/sec   Loss 0.7191   LearningRate 0.0000   Epoch: 19   Global Step: 244530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:57:32,244-Speed 3304.96 samples/sec   Loss 0.7145   LearningRate 0.0000   Epoch: 19   Global Step: 244540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:57:35,420-Speed 3224.85 samples/sec   Loss 0.7591   LearningRate 0.0000   Epoch: 19   Global Step: 244550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:57:38,546-Speed 3277.03 samples/sec   Loss 0.7256   LearningRate 0.0000   Epoch: 19   Global Step: 244560   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:57:41,715-Speed 3232.40 samples/sec   Loss 0.7410   LearningRate 0.0000   Epoch: 19   Global Step: 244570   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:57:44,804-Speed 3315.66 samples/sec   Loss 0.7628   LearningRate 0.0000   Epoch: 19   Global Step: 244580   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:57:47,945-Speed 3261.11 samples/sec   Loss 0.6937   LearningRate 0.0000   Epoch: 19   Global Step: 244590   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:57:51,132-Speed 3214.19 samples/sec   Loss 0.7020   LearningRate 0.0000   Epoch: 19   Global Step: 244600   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:57:54,322-Speed 3210.56 samples/sec   Loss 0.7270   LearningRate 0.0000   Epoch: 19   Global Step: 244610   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:57:57,430-Speed 3295.95 samples/sec   Loss 0.7323   LearningRate 0.0000   Epoch: 19   Global Step: 244620   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:58:00,503-Speed 3333.66 samples/sec   Loss 0.7599   LearningRate 0.0000   Epoch: 19   Global Step: 244630   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:58:03,595-Speed 3312.55 samples/sec   Loss 0.7154   LearningRate 0.0000   Epoch: 19   Global Step: 244640   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:58:06,791-Speed 3205.88 samples/sec   Loss 0.7067   LearningRate 0.0000   Epoch: 19   Global Step: 244650   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:58:09,878-Speed 3317.82 samples/sec   Loss 0.7505   LearningRate 0.0000   Epoch: 19   Global Step: 244660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:58:12,972-Speed 3310.94 samples/sec   Loss 0.7436   LearningRate 0.0000   Epoch: 19   Global Step: 244670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:58:16,145-Speed 3228.11 samples/sec   Loss 0.7311   LearningRate 0.0000   Epoch: 19   Global Step: 244680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:58:19,316-Speed 3230.12 samples/sec   Loss 0.6853   LearningRate 0.0000   Epoch: 19   Global Step: 244690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:58:22,401-Speed 3320.71 samples/sec   Loss 0.7287   LearningRate 0.0000   Epoch: 19   Global Step: 244700   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:58:25,585-Speed 3216.83 samples/sec   Loss 0.7115   LearningRate 0.0000   Epoch: 19   Global Step: 244710   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:58:28,741-Speed 3245.29 samples/sec   Loss 0.7310   LearningRate 0.0000   Epoch: 19   Global Step: 244720   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:58:31,848-Speed 3296.96 samples/sec   Loss 0.7525   LearningRate 0.0000   Epoch: 19   Global Step: 244730   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:58:34,930-Speed 3323.51 samples/sec   Loss 0.7177   LearningRate 0.0000   Epoch: 19   Global Step: 244740   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:58:38,057-Speed 3276.58 samples/sec   Loss 0.7199   LearningRate 0.0000   Epoch: 19   Global Step: 244750   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:58:41,166-Speed 3294.80 samples/sec   Loss 0.6967   LearningRate 0.0000   Epoch: 19   Global Step: 244760   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:58:44,302-Speed 3265.28 samples/sec   Loss 0.6879   LearningRate 0.0000   Epoch: 19   Global Step: 244770   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:58:47,426-Speed 3279.75 samples/sec   Loss 0.7161   LearningRate 0.0000   Epoch: 19   Global Step: 244780   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:58:50,510-Speed 3320.49 samples/sec   Loss 0.7354   LearningRate 0.0000   Epoch: 19   Global Step: 244790   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:58:53,646-Speed 3266.96 samples/sec   Loss 0.7127   LearningRate 0.0000   Epoch: 19   Global Step: 244800   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:58:56,691-Speed 3363.78 samples/sec   Loss 0.6852   LearningRate 0.0000   Epoch: 19   Global Step: 244810   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:58:59,749-Speed 3349.66 samples/sec   Loss 0.7162   LearningRate 0.0000   Epoch: 19   Global Step: 244820   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:59:02,865-Speed 3287.45 samples/sec   Loss 0.7292   LearningRate 0.0000   Epoch: 19   Global Step: 244830   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:59:05,960-Speed 3310.15 samples/sec   Loss 0.7141   LearningRate 0.0000   Epoch: 19   Global Step: 244840   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:59:09,074-Speed 3289.07 samples/sec   Loss 0.7498   LearningRate 0.0000   Epoch: 19   Global Step: 244850   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:59:12,170-Speed 3308.62 samples/sec   Loss 0.6817   LearningRate 0.0000   Epoch: 19   Global Step: 244860   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:59:15,309-Speed 3263.57 samples/sec   Loss 0.7632   LearningRate 0.0000   Epoch: 19   Global Step: 244870   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:59:18,459-Speed 3251.59 samples/sec   Loss 0.7367   LearningRate 0.0000   Epoch: 19   Global Step: 244880   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:59:21,535-Speed 3329.93 samples/sec   Loss 0.7206   LearningRate 0.0000   Epoch: 19   Global Step: 244890   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:59:24,641-Speed 3298.50 samples/sec   Loss 0.6799   LearningRate 0.0000   Epoch: 19   Global Step: 244900   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:59:27,761-Speed 3283.06 samples/sec   Loss 0.7134   LearningRate 0.0000   Epoch: 19   Global Step: 244910   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:59:30,889-Speed 3274.90 samples/sec   Loss 0.7243   LearningRate 0.0000   Epoch: 19   Global Step: 244920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 22:59:33,949-Speed 3346.45 samples/sec   Loss 0.7277   LearningRate 0.0000   Epoch: 19   Global Step: 244930   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:59:37,041-Speed 3314.13 samples/sec   Loss 0.7230   LearningRate 0.0000   Epoch: 19   Global Step: 244940   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 22:59:40,071-Speed 3380.21 samples/sec   Loss 0.7115   LearningRate 0.0000   Epoch: 19   Global Step: 244950   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:59:43,190-Speed 3283.61 samples/sec   Loss 0.7361   LearningRate 0.0000   Epoch: 19   Global Step: 244960   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:59:46,272-Speed 3323.44 samples/sec   Loss 0.7228   LearningRate 0.0000   Epoch: 19   Global Step: 244970   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:59:49,453-Speed 3220.89 samples/sec   Loss 0.7412   LearningRate 0.0000   Epoch: 19   Global Step: 244980   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:59:52,547-Speed 3310.34 samples/sec   Loss 0.7456   LearningRate 0.0000   Epoch: 19   Global Step: 244990   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:59:55,695-Speed 3253.98 samples/sec   Loss 0.7160   LearningRate 0.0000   Epoch: 19   Global Step: 245000   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 22:59:58,776-Speed 3324.98 samples/sec   Loss 0.7089   LearningRate 0.0000   Epoch: 19   Global Step: 245010   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:00:01,942-Speed 3234.74 samples/sec   Loss 0.7098   LearningRate 0.0000   Epoch: 19   Global Step: 245020   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:00:05,036-Speed 3311.47 samples/sec   Loss 0.7280   LearningRate 0.0000   Epoch: 19   Global Step: 245030   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:00:08,178-Speed 3259.58 samples/sec   Loss 0.7229   LearningRate 0.0000   Epoch: 19   Global Step: 245040   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:00:11,296-Speed 3285.40 samples/sec   Loss 0.6885   LearningRate 0.0000   Epoch: 19   Global Step: 245050   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:00:14,390-Speed 3310.58 samples/sec   Loss 0.6976   LearningRate 0.0000   Epoch: 19   Global Step: 245060   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:00:17,469-Speed 3327.42 samples/sec   Loss 0.7413   LearningRate 0.0000   Epoch: 19   Global Step: 245070   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:00:20,565-Speed 3309.40 samples/sec   Loss 0.7407   LearningRate 0.0000   Epoch: 19   Global Step: 245080   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:00:23,708-Speed 3258.98 samples/sec   Loss 0.7589   LearningRate 0.0000   Epoch: 19   Global Step: 245090   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:00:26,799-Speed 3313.54 samples/sec   Loss 0.7389   LearningRate 0.0000   Epoch: 19   Global Step: 245100   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:00:29,918-Speed 3284.31 samples/sec   Loss 0.7147   LearningRate 0.0000   Epoch: 19   Global Step: 245110   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:00:32,981-Speed 3344.18 samples/sec   Loss 0.7088   LearningRate 0.0000   Epoch: 19   Global Step: 245120   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:00:36,114-Speed 3269.42 samples/sec   Loss 0.7302   LearningRate 0.0000   Epoch: 19   Global Step: 245130   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:00:39,223-Speed 3295.12 samples/sec   Loss 0.7013   LearningRate 0.0000   Epoch: 19   Global Step: 245140   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:00:42,370-Speed 3253.89 samples/sec   Loss 0.7338   LearningRate 0.0000   Epoch: 19   Global Step: 245150   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:00:45,446-Speed 3330.83 samples/sec   Loss 0.7135   LearningRate 0.0000   Epoch: 19   Global Step: 245160   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:00:48,523-Speed 3329.07 samples/sec   Loss 0.7142   LearningRate 0.0000   Epoch: 19   Global Step: 245170   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:00:51,671-Speed 3253.16 samples/sec   Loss 0.7284   LearningRate 0.0000   Epoch: 19   Global Step: 245180   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:00:54,753-Speed 3324.08 samples/sec   Loss 0.7585   LearningRate 0.0000   Epoch: 19   Global Step: 245190   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:00:57,831-Speed 3327.70 samples/sec   Loss 0.7315   LearningRate 0.0000   Epoch: 19   Global Step: 245200   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:01:00,897-Speed 3341.11 samples/sec   Loss 0.7325   LearningRate 0.0000   Epoch: 19   Global Step: 245210   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:01:04,007-Speed 3293.78 samples/sec   Loss 0.6981   LearningRate 0.0000   Epoch: 19   Global Step: 245220   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:01:07,112-Speed 3298.38 samples/sec   Loss 0.7224   LearningRate 0.0000   Epoch: 19   Global Step: 245230   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:01:10,158-Speed 3363.02 samples/sec   Loss 0.7016   LearningRate 0.0000   Epoch: 19   Global Step: 245240   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:01:13,310-Speed 3249.47 samples/sec   Loss 0.7045   LearningRate 0.0000   Epoch: 19   Global Step: 245250   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:01:16,420-Speed 3294.11 samples/sec   Loss 0.7073   LearningRate 0.0000   Epoch: 19   Global Step: 245260   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:01:19,499-Speed 3326.11 samples/sec   Loss 0.7005   LearningRate 0.0000   Epoch: 19   Global Step: 245270   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:01:22,598-Speed 3305.91 samples/sec   Loss 0.7211   LearningRate 0.0000   Epoch: 19   Global Step: 245280   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:01:25,666-Speed 3338.80 samples/sec   Loss 0.7274   LearningRate 0.0000   Epoch: 19   Global Step: 245290   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:01:28,762-Speed 3308.22 samples/sec   Loss 0.7032   LearningRate 0.0000   Epoch: 19   Global Step: 245300   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:01:31,815-Speed 3354.86 samples/sec   Loss 0.6988   LearningRate 0.0000   Epoch: 19   Global Step: 245310   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:01:34,879-Speed 3343.56 samples/sec   Loss 0.7291   LearningRate 0.0000   Epoch: 19   Global Step: 245320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:01:37,969-Speed 3314.40 samples/sec   Loss 0.6804   LearningRate 0.0000   Epoch: 19   Global Step: 245330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:01:41,102-Speed 3270.41 samples/sec   Loss 0.6901   LearningRate 0.0000   Epoch: 19   Global Step: 245340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:01:44,179-Speed 3328.56 samples/sec   Loss 0.7233   LearningRate 0.0000   Epoch: 19   Global Step: 245350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:01:47,270-Speed 3314.50 samples/sec   Loss 0.7131   LearningRate 0.0000   Epoch: 19   Global Step: 245360   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:01:50,372-Speed 3300.95 samples/sec   Loss 0.7254   LearningRate 0.0000   Epoch: 19   Global Step: 245370   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:01:53,430-Speed 3349.92 samples/sec   Loss 0.7149   LearningRate 0.0000   Epoch: 19   Global Step: 245380   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:01:56,513-Speed 3322.78 samples/sec   Loss 0.7410   LearningRate 0.0000   Epoch: 19   Global Step: 245390   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:01:59,603-Speed 3314.82 samples/sec   Loss 0.7581   LearningRate 0.0000   Epoch: 19   Global Step: 245400   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:02:02,724-Speed 3281.63 samples/sec   Loss 0.7127   LearningRate 0.0000   Epoch: 19   Global Step: 245410   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:02:05,831-Speed 3297.51 samples/sec   Loss 0.7527   LearningRate 0.0000   Epoch: 19   Global Step: 245420   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:02:08,874-Speed 3365.64 samples/sec   Loss 0.6919   LearningRate 0.0000   Epoch: 19   Global Step: 245430   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:02:11,948-Speed 3332.66 samples/sec   Loss 0.6936   LearningRate 0.0000   Epoch: 19   Global Step: 245440   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:02:15,115-Speed 3234.76 samples/sec   Loss 0.7125   LearningRate 0.0000   Epoch: 19   Global Step: 245450   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:02:18,240-Speed 3277.39 samples/sec   Loss 0.7192   LearningRate 0.0000   Epoch: 19   Global Step: 245460   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:02:21,326-Speed 3319.06 samples/sec   Loss 0.7165   LearningRate 0.0000   Epoch: 19   Global Step: 245470   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:02:24,429-Speed 3301.13 samples/sec   Loss 0.7381   LearningRate 0.0000   Epoch: 19   Global Step: 245480   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:02:27,598-Speed 3231.87 samples/sec   Loss 0.6804   LearningRate 0.0000   Epoch: 19   Global Step: 245490   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:02:30,705-Speed 3297.14 samples/sec   Loss 0.7054   LearningRate 0.0000   Epoch: 19   Global Step: 245500   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:02:33,767-Speed 3345.21 samples/sec   Loss 0.7437   LearningRate 0.0000   Epoch: 19   Global Step: 245510   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:02:36,894-Speed 3275.10 samples/sec   Loss 0.6771   LearningRate 0.0000   Epoch: 19   Global Step: 245520   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:02:40,118-Speed 3177.42 samples/sec   Loss 0.7025   LearningRate 0.0000   Epoch: 19   Global Step: 245530   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:02:43,262-Speed 3258.07 samples/sec   Loss 0.7242   LearningRate 0.0000   Epoch: 19   Global Step: 245540   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:02:46,348-Speed 3319.65 samples/sec   Loss 0.7520   LearningRate 0.0000   Epoch: 19   Global Step: 245550   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:02:49,514-Speed 3235.76 samples/sec   Loss 0.7284   LearningRate 0.0000   Epoch: 19   Global Step: 245560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:02:52,619-Speed 3299.15 samples/sec   Loss 0.7266   LearningRate 0.0000   Epoch: 19   Global Step: 245570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:02:55,725-Speed 3297.64 samples/sec   Loss 0.7335   LearningRate 0.0000   Epoch: 19   Global Step: 245580   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:02:58,859-Speed 3268.35 samples/sec   Loss 0.7420   LearningRate 0.0000   Epoch: 19   Global Step: 245590   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:03:01,969-Speed 3293.43 samples/sec   Loss 0.6968   LearningRate 0.0000   Epoch: 19   Global Step: 245600   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:03:05,083-Speed 3289.77 samples/sec   Loss 0.7297   LearningRate 0.0000   Epoch: 19   Global Step: 245610   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:03:08,207-Speed 3278.87 samples/sec   Loss 0.7401   LearningRate 0.0000   Epoch: 19   Global Step: 245620   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:03:11,295-Speed 3316.42 samples/sec   Loss 0.7229   LearningRate 0.0000   Epoch: 19   Global Step: 245630   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:03:14,453-Speed 3244.06 samples/sec   Loss 0.7439   LearningRate 0.0000   Epoch: 19   Global Step: 245640   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:03:17,541-Speed 3316.71 samples/sec   Loss 0.7519   LearningRate 0.0000   Epoch: 19   Global Step: 245650   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:03:20,657-Speed 3288.09 samples/sec   Loss 0.7274   LearningRate 0.0000   Epoch: 19   Global Step: 245660   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:03:23,804-Speed 3254.20 samples/sec   Loss 0.7091   LearningRate 0.0000   Epoch: 19   Global Step: 245670   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:03:26,927-Speed 3280.17 samples/sec   Loss 0.7384   LearningRate 0.0000   Epoch: 19   Global Step: 245680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:03:30,067-Speed 3262.64 samples/sec   Loss 0.7248   LearningRate 0.0000   Epoch: 19   Global Step: 245690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:03:33,174-Speed 3296.63 samples/sec   Loss 0.7251   LearningRate 0.0000   Epoch: 19   Global Step: 245700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:03:36,317-Speed 3259.56 samples/sec   Loss 0.7296   LearningRate 0.0000   Epoch: 19   Global Step: 245710   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:03:39,462-Speed 3256.77 samples/sec   Loss 0.6897   LearningRate 0.0000   Epoch: 19   Global Step: 245720   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:03:42,567-Speed 3297.86 samples/sec   Loss 0.7309   LearningRate 0.0000   Epoch: 19   Global Step: 245730   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:03:45,696-Speed 3274.34 samples/sec   Loss 0.6925   LearningRate 0.0000   Epoch: 19   Global Step: 245740   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:03:48,818-Speed 3281.45 samples/sec   Loss 0.7153   LearningRate 0.0000   Epoch: 19   Global Step: 245750   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:03:51,925-Speed 3296.63 samples/sec   Loss 0.7226   LearningRate 0.0000   Epoch: 19   Global Step: 245760   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:03:55,032-Speed 3296.43 samples/sec   Loss 0.7627   LearningRate 0.0000   Epoch: 19   Global Step: 245770   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:03:58,102-Speed 3336.70 samples/sec   Loss 0.7235   LearningRate 0.0000   Epoch: 19   Global Step: 245780   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:04:01,249-Speed 3254.26 samples/sec   Loss 0.7174   LearningRate 0.0000   Epoch: 19   Global Step: 245790   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:04:04,372-Speed 3280.33 samples/sec   Loss 0.7069   LearningRate 0.0000   Epoch: 19   Global Step: 245800   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:04:07,507-Speed 3267.76 samples/sec   Loss 0.7399   LearningRate 0.0000   Epoch: 19   Global Step: 245810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:04:10,605-Speed 3306.45 samples/sec   Loss 0.7303   LearningRate 0.0000   Epoch: 19   Global Step: 245820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:04:13,716-Speed 3291.75 samples/sec   Loss 0.7251   LearningRate 0.0000   Epoch: 19   Global Step: 245830   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:04:16,820-Speed 3300.36 samples/sec   Loss 0.7404   LearningRate 0.0000   Epoch: 19   Global Step: 245840   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:04:19,924-Speed 3300.06 samples/sec   Loss 0.7226   LearningRate 0.0000   Epoch: 19   Global Step: 245850   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:04:23,023-Speed 3305.55 samples/sec   Loss 0.7353   LearningRate 0.0000   Epoch: 19   Global Step: 245860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:04:26,124-Speed 3303.03 samples/sec   Loss 0.7205   LearningRate 0.0000   Epoch: 19   Global Step: 245870   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:04:29,265-Speed 3261.37 samples/sec   Loss 0.7207   LearningRate 0.0000   Epoch: 19   Global Step: 245880   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:04:32,375-Speed 3293.00 samples/sec   Loss 0.7266   LearningRate 0.0000   Epoch: 19   Global Step: 245890   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:04:35,499-Speed 3278.84 samples/sec   Loss 0.7283   LearningRate 0.0000   Epoch: 19   Global Step: 245900   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:04:38,596-Speed 3307.49 samples/sec   Loss 0.6837   LearningRate 0.0000   Epoch: 19   Global Step: 245910   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:04:41,836-Speed 3161.22 samples/sec   Loss 0.7228   LearningRate 0.0000   Epoch: 19   Global Step: 245920   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:04:44,964-Speed 3274.99 samples/sec   Loss 0.7217   LearningRate 0.0000   Epoch: 19   Global Step: 245930   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:04:48,105-Speed 3261.90 samples/sec   Loss 0.7182   LearningRate 0.0000   Epoch: 19   Global Step: 245940   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:04:51,185-Speed 3325.33 samples/sec   Loss 0.7116   LearningRate 0.0000   Epoch: 19   Global Step: 245950   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:04:54,285-Speed 3303.58 samples/sec   Loss 0.7224   LearningRate 0.0000   Epoch: 19   Global Step: 245960   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:04:57,338-Speed 3356.01 samples/sec   Loss 0.7052   LearningRate 0.0000   Epoch: 19   Global Step: 245970   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:05:00,450-Speed 3290.86 samples/sec   Loss 0.7088   LearningRate 0.0000   Epoch: 19   Global Step: 245980   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:05:03,543-Speed 3311.97 samples/sec   Loss 0.7320   LearningRate 0.0000   Epoch: 19   Global Step: 245990   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:05:06,643-Speed 3303.78 samples/sec   Loss 0.7187   LearningRate 0.0000   Epoch: 19   Global Step: 246000   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:05:09,690-Speed 3361.84 samples/sec   Loss 0.7231   LearningRate 0.0000   Epoch: 19   Global Step: 246010   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:05:12,829-Speed 3263.43 samples/sec   Loss 0.7441   LearningRate 0.0000   Epoch: 19   Global Step: 246020   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:05:15,906-Speed 3328.75 samples/sec   Loss 0.7155   LearningRate 0.0000   Epoch: 19   Global Step: 246030   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:05:19,045-Speed 3263.66 samples/sec   Loss 0.7065   LearningRate 0.0000   Epoch: 19   Global Step: 246040   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:05:22,099-Speed 3354.54 samples/sec   Loss 0.7292   LearningRate 0.0000   Epoch: 19   Global Step: 246050   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:05:25,181-Speed 3323.37 samples/sec   Loss 0.7295   LearningRate 0.0000   Epoch: 19   Global Step: 246060   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:05:28,303-Speed 3280.90 samples/sec   Loss 0.7472   LearningRate 0.0000   Epoch: 19   Global Step: 246070   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:05:31,421-Speed 3285.22 samples/sec   Loss 0.7547   LearningRate 0.0000   Epoch: 19   Global Step: 246080   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:05:34,502-Speed 3324.37 samples/sec   Loss 0.7210   LearningRate 0.0000   Epoch: 19   Global Step: 246090   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:05:37,653-Speed 3250.44 samples/sec   Loss 0.7151   LearningRate 0.0000   Epoch: 19   Global Step: 246100   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:05:40,752-Speed 3305.91 samples/sec   Loss 0.7290   LearningRate 0.0000   Epoch: 19   Global Step: 246110   Fp16 Grad Scale: 4096   Required: 0 hours
Training: 2022-04-27 23:05:43,864-Speed 3291.26 samples/sec   Loss 0.7473   LearningRate 0.0000   Epoch: 19   Global Step: 246120   Fp16 Grad Scale: 4096   Required: 0 hours
Training: 2022-04-27 23:05:47,049-Speed 3215.72 samples/sec   Loss 0.7198   LearningRate 0.0000   Epoch: 19   Global Step: 246130   Fp16 Grad Scale: 4096   Required: 0 hours
Training: 2022-04-27 23:05:50,135-Speed 3319.88 samples/sec   Loss 0.7088   LearningRate 0.0000   Epoch: 19   Global Step: 246140   Fp16 Grad Scale: 4096   Required: 0 hours
Training: 2022-04-27 23:05:53,225-Speed 3315.31 samples/sec   Loss 0.7190   LearningRate 0.0000   Epoch: 19   Global Step: 246150   Fp16 Grad Scale: 4096   Required: 0 hours
Training: 2022-04-27 23:05:56,312-Speed 3317.57 samples/sec   Loss 0.6849   LearningRate 0.0000   Epoch: 19   Global Step: 246160   Fp16 Grad Scale: 4096   Required: 0 hours
Training: 2022-04-27 23:05:59,371-Speed 3348.42 samples/sec   Loss 0.7358   LearningRate 0.0000   Epoch: 19   Global Step: 246170   Fp16 Grad Scale: 4096   Required: 0 hours
Training: 2022-04-27 23:06:02,521-Speed 3251.67 samples/sec   Loss 0.7050   LearningRate 0.0000   Epoch: 19   Global Step: 246180   Fp16 Grad Scale: 4096   Required: 0 hours
Training: 2022-04-27 23:06:05,615-Speed 3310.97 samples/sec   Loss 0.7157   LearningRate 0.0000   Epoch: 19   Global Step: 246190   Fp16 Grad Scale: 4096   Required: 0 hours
Training: 2022-04-27 23:06:08,718-Speed 3301.41 samples/sec   Loss 0.7508   LearningRate 0.0000   Epoch: 19   Global Step: 246200   Fp16 Grad Scale: 4096   Required: 0 hours
Training: 2022-04-27 23:06:11,897-Speed 3221.30 samples/sec   Loss 0.6828   LearningRate 0.0000   Epoch: 19   Global Step: 246210   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:06:14,967-Speed 3336.37 samples/sec   Loss 0.7379   LearningRate 0.0000   Epoch: 19   Global Step: 246220   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:06:18,029-Speed 3346.11 samples/sec   Loss 0.7348   LearningRate 0.0000   Epoch: 19   Global Step: 246230   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:06:21,086-Speed 3351.10 samples/sec   Loss 0.7411   LearningRate 0.0000   Epoch: 19   Global Step: 246240   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:06:24,157-Speed 3335.75 samples/sec   Loss 0.7112   LearningRate 0.0000   Epoch: 19   Global Step: 246250   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:06:27,212-Speed 3353.32 samples/sec   Loss 0.7250   LearningRate 0.0000   Epoch: 19   Global Step: 246260   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:06:30,271-Speed 3348.35 samples/sec   Loss 0.7279   LearningRate 0.0000   Epoch: 19   Global Step: 246270   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:06:33,341-Speed 3336.65 samples/sec   Loss 0.7013   LearningRate 0.0000   Epoch: 19   Global Step: 246280   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:06:36,476-Speed 3267.67 samples/sec   Loss 0.6702   LearningRate 0.0000   Epoch: 19   Global Step: 246290   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:06:39,614-Speed 3263.90 samples/sec   Loss 0.7259   LearningRate 0.0000   Epoch: 19   Global Step: 246300   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:06:42,795-Speed 3220.58 samples/sec   Loss 0.7422   LearningRate 0.0000   Epoch: 19   Global Step: 246310   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:06:45,880-Speed 3319.64 samples/sec   Loss 0.7300   LearningRate 0.0000   Epoch: 19   Global Step: 246320   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:06:48,940-Speed 3347.60 samples/sec   Loss 0.6996   LearningRate 0.0000   Epoch: 19   Global Step: 246330   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:06:52,016-Speed 3330.71 samples/sec   Loss 0.6883   LearningRate 0.0000   Epoch: 19   Global Step: 246340   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:06:55,177-Speed 3239.41 samples/sec   Loss 0.7368   LearningRate 0.0000   Epoch: 19   Global Step: 246350   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:06:58,269-Speed 3313.23 samples/sec   Loss 0.7398   LearningRate 0.0000   Epoch: 19   Global Step: 246360   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:07:01,414-Speed 3257.67 samples/sec   Loss 0.7568   LearningRate 0.0000   Epoch: 19   Global Step: 246370   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:07:04,487-Speed 3333.09 samples/sec   Loss 0.6689   LearningRate 0.0000   Epoch: 19   Global Step: 246380   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:07:07,590-Speed 3301.45 samples/sec   Loss 0.7242   LearningRate 0.0000   Epoch: 19   Global Step: 246390   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:07:10,704-Speed 3289.45 samples/sec   Loss 0.6967   LearningRate 0.0000   Epoch: 19   Global Step: 246400   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:07:13,817-Speed 3290.63 samples/sec   Loss 0.7283   LearningRate 0.0000   Epoch: 19   Global Step: 246410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:07:16,898-Speed 3324.08 samples/sec   Loss 0.7327   LearningRate 0.0000   Epoch: 19   Global Step: 246420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:07:19,978-Speed 3325.63 samples/sec   Loss 0.7553   LearningRate 0.0000   Epoch: 19   Global Step: 246430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:07:23,053-Speed 3331.60 samples/sec   Loss 0.7195   LearningRate 0.0000   Epoch: 19   Global Step: 246440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:07:26,105-Speed 3356.31 samples/sec   Loss 0.7420   LearningRate 0.0000   Epoch: 19   Global Step: 246450   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:07:29,183-Speed 3327.48 samples/sec   Loss 0.7263   LearningRate 0.0000   Epoch: 19   Global Step: 246460   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:07:32,256-Speed 3333.60 samples/sec   Loss 0.7311   LearningRate 0.0000   Epoch: 19   Global Step: 246470   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:07:35,350-Speed 3310.33 samples/sec   Loss 0.7048   LearningRate 0.0000   Epoch: 19   Global Step: 246480   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:07:38,407-Speed 3350.76 samples/sec   Loss 0.7199   LearningRate 0.0000   Epoch: 19   Global Step: 246490   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:07:41,517-Speed 3293.47 samples/sec   Loss 0.7243   LearningRate 0.0000   Epoch: 19   Global Step: 246500   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:07:44,577-Speed 3347.24 samples/sec   Loss 0.7294   LearningRate 0.0000   Epoch: 19   Global Step: 246510   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:07:47,712-Speed 3268.25 samples/sec   Loss 0.7338   LearningRate 0.0000   Epoch: 19   Global Step: 246520   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:07:50,961-Speed 3152.09 samples/sec   Loss 0.7114   LearningRate 0.0000   Epoch: 19   Global Step: 246530   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:07:54,015-Speed 3353.55 samples/sec   Loss 0.7159   LearningRate 0.0000   Epoch: 19   Global Step: 246540   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:07:57,092-Speed 3329.56 samples/sec   Loss 0.7556   LearningRate 0.0000   Epoch: 19   Global Step: 246550   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:08:00,185-Speed 3311.94 samples/sec   Loss 0.7334   LearningRate 0.0000   Epoch: 19   Global Step: 246560   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:08:03,340-Speed 3246.01 samples/sec   Loss 0.6904   LearningRate 0.0000   Epoch: 19   Global Step: 246570   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:08:06,519-Speed 3222.64 samples/sec   Loss 0.7434   LearningRate 0.0000   Epoch: 19   Global Step: 246580   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:08:09,615-Speed 3308.31 samples/sec   Loss 0.7113   LearningRate 0.0000   Epoch: 19   Global Step: 246590   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:08:12,669-Speed 3353.78 samples/sec   Loss 0.7177   LearningRate 0.0000   Epoch: 19   Global Step: 246600   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:08:15,771-Speed 3302.68 samples/sec   Loss 0.7331   LearningRate 0.0000   Epoch: 19   Global Step: 246610   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:08:18,908-Speed 3265.39 samples/sec   Loss 0.7397   LearningRate 0.0000   Epoch: 19   Global Step: 246620   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:08:21,961-Speed 3355.16 samples/sec   Loss 0.7225   LearningRate 0.0000   Epoch: 19   Global Step: 246630   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:08:25,035-Speed 3331.45 samples/sec   Loss 0.7206   LearningRate 0.0000   Epoch: 19   Global Step: 246640   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:08:28,223-Speed 3213.45 samples/sec   Loss 0.7287   LearningRate 0.0000   Epoch: 19   Global Step: 246650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:08:31,331-Speed 3295.52 samples/sec   Loss 0.7381   LearningRate 0.0000   Epoch: 19   Global Step: 246660   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:08:34,390-Speed 3349.04 samples/sec   Loss 0.7248   LearningRate 0.0000   Epoch: 19   Global Step: 246670   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:08:37,473-Speed 3322.66 samples/sec   Loss 0.7049   LearningRate 0.0000   Epoch: 19   Global Step: 246680   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:08:40,575-Speed 3301.58 samples/sec   Loss 0.7301   LearningRate 0.0000   Epoch: 19   Global Step: 246690   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:08:43,669-Speed 3311.11 samples/sec   Loss 0.7612   LearningRate 0.0000   Epoch: 19   Global Step: 246700   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:08:46,742-Speed 3333.22 samples/sec   Loss 0.7212   LearningRate 0.0000   Epoch: 19   Global Step: 246710   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:08:49,862-Speed 3282.93 samples/sec   Loss 0.7002   LearningRate 0.0000   Epoch: 19   Global Step: 246720   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:08:52,964-Speed 3302.73 samples/sec   Loss 0.7300   LearningRate 0.0000   Epoch: 19   Global Step: 246730   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:08:56,014-Speed 3358.10 samples/sec   Loss 0.7614   LearningRate 0.0000   Epoch: 19   Global Step: 246740   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:08:59,083-Speed 3337.87 samples/sec   Loss 0.7480   LearningRate 0.0000   Epoch: 19   Global Step: 246750   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:09:02,227-Speed 3257.45 samples/sec   Loss 0.6973   LearningRate 0.0000   Epoch: 19   Global Step: 246760   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:09:05,322-Speed 3310.14 samples/sec   Loss 0.7293   LearningRate 0.0000   Epoch: 19   Global Step: 246770   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:09:08,372-Speed 3358.83 samples/sec   Loss 0.7279   LearningRate 0.0000   Epoch: 19   Global Step: 246780   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:09:11,476-Speed 3300.17 samples/sec   Loss 0.7167   LearningRate 0.0000   Epoch: 19   Global Step: 246790   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:09:14,668-Speed 3209.02 samples/sec   Loss 0.7151   LearningRate 0.0000   Epoch: 19   Global Step: 246800   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:09:17,754-Speed 3319.20 samples/sec   Loss 0.7570   LearningRate 0.0000   Epoch: 19   Global Step: 246810   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:09:20,825-Speed 3334.90 samples/sec   Loss 0.7104   LearningRate 0.0000   Epoch: 19   Global Step: 246820   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:09:23,909-Speed 3321.43 samples/sec   Loss 0.7591   LearningRate 0.0000   Epoch: 19   Global Step: 246830   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:09:27,002-Speed 3312.42 samples/sec   Loss 0.7231   LearningRate 0.0000   Epoch: 19   Global Step: 246840   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:09:30,144-Speed 3260.01 samples/sec   Loss 0.7068   LearningRate 0.0000   Epoch: 19   Global Step: 246850   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:09:33,226-Speed 3323.22 samples/sec   Loss 0.7144   LearningRate 0.0000   Epoch: 19   Global Step: 246860   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:09:36,296-Speed 3336.61 samples/sec   Loss 0.7183   LearningRate 0.0000   Epoch: 19   Global Step: 246870   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:09:39,407-Speed 3292.60 samples/sec   Loss 0.7069   LearningRate 0.0000   Epoch: 19   Global Step: 246880   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:09:42,565-Speed 3243.91 samples/sec   Loss 0.7221   LearningRate 0.0000   Epoch: 19   Global Step: 246890   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:09:45,685-Speed 3283.40 samples/sec   Loss 0.7095   LearningRate 0.0000   Epoch: 19   Global Step: 246900   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:09:48,812-Speed 3274.68 samples/sec   Loss 0.7192   LearningRate 0.0000   Epoch: 19   Global Step: 246910   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:09:51,885-Speed 3333.40 samples/sec   Loss 0.7145   LearningRate 0.0000   Epoch: 19   Global Step: 246920   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:09:54,943-Speed 3350.33 samples/sec   Loss 0.7074   LearningRate 0.0000   Epoch: 19   Global Step: 246930   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:09:57,997-Speed 3354.21 samples/sec   Loss 0.7147   LearningRate 0.0000   Epoch: 19   Global Step: 246940   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:10:01,055-Speed 3348.98 samples/sec   Loss 0.7223   LearningRate 0.0000   Epoch: 19   Global Step: 246950   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:10:04,131-Speed 3329.78 samples/sec   Loss 0.7051   LearningRate 0.0000   Epoch: 19   Global Step: 246960   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:10:07,203-Speed 3334.45 samples/sec   Loss 0.7366   LearningRate 0.0000   Epoch: 19   Global Step: 246970   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:10:10,254-Speed 3357.23 samples/sec   Loss 0.7215   LearningRate 0.0000   Epoch: 19   Global Step: 246980   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:10:13,342-Speed 3317.32 samples/sec   Loss 0.7050   LearningRate 0.0000   Epoch: 19   Global Step: 246990   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:10:16,551-Speed 3192.62 samples/sec   Loss 0.7240   LearningRate 0.0000   Epoch: 19   Global Step: 247000   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:10:19,682-Speed 3271.14 samples/sec   Loss 0.6971   LearningRate 0.0000   Epoch: 19   Global Step: 247010   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:10:22,772-Speed 3314.33 samples/sec   Loss 0.7340   LearningRate 0.0000   Epoch: 19   Global Step: 247020   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:10:25,852-Speed 3325.49 samples/sec   Loss 0.6943   LearningRate 0.0000   Epoch: 19   Global Step: 247030   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:10:28,970-Speed 3285.58 samples/sec   Loss 0.7484   LearningRate 0.0000   Epoch: 19   Global Step: 247040   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:10:32,086-Speed 3287.36 samples/sec   Loss 0.7317   LearningRate 0.0000   Epoch: 19   Global Step: 247050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:10:35,141-Speed 3352.91 samples/sec   Loss 0.7499   LearningRate 0.0000   Epoch: 19   Global Step: 247060   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:10:38,220-Speed 3327.84 samples/sec   Loss 0.7368   LearningRate 0.0000   Epoch: 19   Global Step: 247070   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:10:41,393-Speed 3227.88 samples/sec   Loss 0.7182   LearningRate 0.0000   Epoch: 19   Global Step: 247080   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:10:44,504-Speed 3292.75 samples/sec   Loss 0.7100   LearningRate 0.0000   Epoch: 19   Global Step: 247090   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:10:47,623-Speed 3284.61 samples/sec   Loss 0.7067   LearningRate 0.0000   Epoch: 19   Global Step: 247100   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:10:50,733-Speed 3293.00 samples/sec   Loss 0.7058   LearningRate 0.0000   Epoch: 19   Global Step: 247110   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:10:53,814-Speed 3325.05 samples/sec   Loss 0.7244   LearningRate 0.0000   Epoch: 19   Global Step: 247120   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:10:56,962-Speed 3254.13 samples/sec   Loss 0.7162   LearningRate 0.0000   Epoch: 19   Global Step: 247130   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:11:00,029-Speed 3339.26 samples/sec   Loss 0.7645   LearningRate 0.0000   Epoch: 19   Global Step: 247140   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:11:03,085-Speed 3352.77 samples/sec   Loss 0.6921   LearningRate 0.0000   Epoch: 19   Global Step: 247150   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:11:06,206-Speed 3281.73 samples/sec   Loss 0.7018   LearningRate 0.0000   Epoch: 19   Global Step: 247160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:11:09,270-Speed 3342.88 samples/sec   Loss 0.7206   LearningRate 0.0000   Epoch: 19   Global Step: 247170   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:11:12,409-Speed 3263.10 samples/sec   Loss 0.7265   LearningRate 0.0000   Epoch: 19   Global Step: 247180   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:11:15,554-Speed 3257.23 samples/sec   Loss 0.7405   LearningRate 0.0000   Epoch: 19   Global Step: 247190   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:11:18,693-Speed 3262.93 samples/sec   Loss 0.6743   LearningRate 0.0000   Epoch: 19   Global Step: 247200   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:11:21,772-Speed 3327.33 samples/sec   Loss 0.7095   LearningRate 0.0000   Epoch: 19   Global Step: 247210   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:11:24,844-Speed 3333.67 samples/sec   Loss 0.7184   LearningRate 0.0000   Epoch: 19   Global Step: 247220   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:11:27,981-Speed 3265.41 samples/sec   Loss 0.6944   LearningRate 0.0000   Epoch: 19   Global Step: 247230   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:11:31,091-Speed 3293.44 samples/sec   Loss 0.7183   LearningRate 0.0000   Epoch: 19   Global Step: 247240   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:11:34,153-Speed 3345.56 samples/sec   Loss 0.7336   LearningRate 0.0000   Epoch: 19   Global Step: 247250   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:11:37,232-Speed 3327.52 samples/sec   Loss 0.7162   LearningRate 0.0000   Epoch: 19   Global Step: 247260   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:11:40,311-Speed 3326.01 samples/sec   Loss 0.7165   LearningRate 0.0000   Epoch: 19   Global Step: 247270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:11:43,378-Speed 3340.09 samples/sec   Loss 0.7185   LearningRate 0.0000   Epoch: 19   Global Step: 247280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:11:46,428-Speed 3359.39 samples/sec   Loss 0.7102   LearningRate 0.0000   Epoch: 19   Global Step: 247290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:11:49,496-Speed 3338.66 samples/sec   Loss 0.7354   LearningRate 0.0000   Epoch: 19   Global Step: 247300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:11:52,581-Speed 3320.08 samples/sec   Loss 0.6923   LearningRate 0.0000   Epoch: 19   Global Step: 247310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:11:55,663-Speed 3323.39 samples/sec   Loss 0.7314   LearningRate 0.0000   Epoch: 19   Global Step: 247320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:11:58,785-Speed 3281.37 samples/sec   Loss 0.7132   LearningRate 0.0000   Epoch: 19   Global Step: 247330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:12:01,943-Speed 3243.36 samples/sec   Loss 0.7010   LearningRate 0.0000   Epoch: 19   Global Step: 247340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:12:05,087-Speed 3258.42 samples/sec   Loss 0.7221   LearningRate 0.0000   Epoch: 19   Global Step: 247350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:12:08,178-Speed 3314.43 samples/sec   Loss 0.7227   LearningRate 0.0000   Epoch: 19   Global Step: 247360   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:12:11,288-Speed 3292.67 samples/sec   Loss 0.7430   LearningRate 0.0000   Epoch: 19   Global Step: 247370   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:12:14,491-Speed 3197.66 samples/sec   Loss 0.7098   LearningRate 0.0000   Epoch: 19   Global Step: 247380   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:12:17,626-Speed 3268.01 samples/sec   Loss 0.6851   LearningRate 0.0000   Epoch: 19   Global Step: 247390   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:12:20,725-Speed 3305.56 samples/sec   Loss 0.7157   LearningRate 0.0000   Epoch: 19   Global Step: 247400   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:12:23,838-Speed 3289.66 samples/sec   Loss 0.7410   LearningRate 0.0000   Epoch: 19   Global Step: 247410   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:12:26,971-Speed 3270.50 samples/sec   Loss 0.7689   LearningRate 0.0000   Epoch: 19   Global Step: 247420   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:12:30,105-Speed 3267.83 samples/sec   Loss 0.7168   LearningRate 0.0000   Epoch: 19   Global Step: 247430   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:12:33,221-Speed 3288.08 samples/sec   Loss 0.7157   LearningRate 0.0000   Epoch: 19   Global Step: 247440   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:12:36,291-Speed 3336.47 samples/sec   Loss 0.6811   LearningRate 0.0000   Epoch: 19   Global Step: 247450   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:12:39,357-Speed 3341.41 samples/sec   Loss 0.7375   LearningRate 0.0000   Epoch: 19   Global Step: 247460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:12:42,475-Speed 3284.79 samples/sec   Loss 0.7298   LearningRate 0.0000   Epoch: 19   Global Step: 247470   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:12:45,547-Speed 3334.05 samples/sec   Loss 0.6908   LearningRate 0.0000   Epoch: 19   Global Step: 247480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:12:48,673-Speed 3277.41 samples/sec   Loss 0.7138   LearningRate 0.0000   Epoch: 19   Global Step: 247490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:12:51,806-Speed 3269.53 samples/sec   Loss 0.7858   LearningRate 0.0000   Epoch: 19   Global Step: 247500   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:12:54,986-Speed 3220.95 samples/sec   Loss 0.7094   LearningRate 0.0000   Epoch: 19   Global Step: 247510   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:12:58,063-Speed 3329.25 samples/sec   Loss 0.7028   LearningRate 0.0000   Epoch: 19   Global Step: 247520   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:13:01,189-Speed 3276.24 samples/sec   Loss 0.6918   LearningRate 0.0000   Epoch: 19   Global Step: 247530   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:13:04,291-Speed 3302.32 samples/sec   Loss 0.7129   LearningRate 0.0000   Epoch: 19   Global Step: 247540   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:13:07,456-Speed 3237.14 samples/sec   Loss 0.6919   LearningRate 0.0000   Epoch: 19   Global Step: 247550   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:13:10,549-Speed 3311.70 samples/sec   Loss 0.7357   LearningRate 0.0000   Epoch: 19   Global Step: 247560   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:13:13,639-Speed 3314.23 samples/sec   Loss 0.6875   LearningRate 0.0000   Epoch: 19   Global Step: 247570   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:13:16,743-Speed 3301.01 samples/sec   Loss 0.6906   LearningRate 0.0000   Epoch: 19   Global Step: 247580   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:13:19,858-Speed 3287.97 samples/sec   Loss 0.7195   LearningRate 0.0000   Epoch: 19   Global Step: 247590   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:13:22,955-Speed 3307.24 samples/sec   Loss 0.7122   LearningRate 0.0000   Epoch: 19   Global Step: 247600   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:13:26,072-Speed 3286.28 samples/sec   Loss 0.7269   LearningRate 0.0000   Epoch: 19   Global Step: 247610   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:13:29,300-Speed 3172.92 samples/sec   Loss 0.7085   LearningRate 0.0000   Epoch: 19   Global Step: 247620   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:13:32,442-Speed 3260.44 samples/sec   Loss 0.7040   LearningRate 0.0000   Epoch: 19   Global Step: 247630   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:13:35,525-Speed 3322.10 samples/sec   Loss 0.7452   LearningRate 0.0000   Epoch: 19   Global Step: 247640   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:13:38,624-Speed 3304.99 samples/sec   Loss 0.7201   LearningRate 0.0000   Epoch: 19   Global Step: 247650   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:13:41,726-Speed 3302.51 samples/sec   Loss 0.7084   LearningRate 0.0000   Epoch: 19   Global Step: 247660   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:13:44,832-Speed 3297.91 samples/sec   Loss 0.6995   LearningRate 0.0000   Epoch: 19   Global Step: 247670   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:13:47,956-Speed 3279.02 samples/sec   Loss 0.7136   LearningRate 0.0000   Epoch: 19   Global Step: 247680   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:13:51,071-Speed 3287.69 samples/sec   Loss 0.6486   LearningRate 0.0000   Epoch: 19   Global Step: 247690   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:13:54,192-Speed 3281.95 samples/sec   Loss 0.7327   LearningRate 0.0000   Epoch: 19   Global Step: 247700   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:13:57,283-Speed 3315.11 samples/sec   Loss 0.7312   LearningRate 0.0000   Epoch: 19   Global Step: 247710   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:14:00,343-Speed 3346.95 samples/sec   Loss 0.7363   LearningRate 0.0000   Epoch: 19   Global Step: 247720   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:14:03,503-Speed 3241.27 samples/sec   Loss 0.7277   LearningRate 0.0000   Epoch: 19   Global Step: 247730   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:14:06,644-Speed 3261.13 samples/sec   Loss 0.7325   LearningRate 0.0000   Epoch: 19   Global Step: 247740   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:14:09,718-Speed 3332.68 samples/sec   Loss 0.7283   LearningRate 0.0000   Epoch: 19   Global Step: 247750   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:14:12,863-Speed 3256.46 samples/sec   Loss 0.6776   LearningRate 0.0000   Epoch: 19   Global Step: 247760   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:14:15,968-Speed 3299.59 samples/sec   Loss 0.7321   LearningRate 0.0000   Epoch: 19   Global Step: 247770   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:14:19,056-Speed 3317.14 samples/sec   Loss 0.7363   LearningRate 0.0000   Epoch: 19   Global Step: 247780   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:14:22,179-Speed 3279.83 samples/sec   Loss 0.7386   LearningRate 0.0000   Epoch: 19   Global Step: 247790   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:14:25,314-Speed 3267.13 samples/sec   Loss 0.7262   LearningRate 0.0000   Epoch: 19   Global Step: 247800   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:14:28,453-Speed 3263.06 samples/sec   Loss 0.7281   LearningRate 0.0000   Epoch: 19   Global Step: 247810   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:14:31,585-Speed 3270.71 samples/sec   Loss 0.7191   LearningRate 0.0000   Epoch: 19   Global Step: 247820   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:14:34,736-Speed 3251.04 samples/sec   Loss 0.7412   LearningRate 0.0000   Epoch: 19   Global Step: 247830   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:14:37,920-Speed 3217.26 samples/sec   Loss 0.7367   LearningRate 0.0000   Epoch: 19   Global Step: 247840   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:14:41,035-Speed 3287.90 samples/sec   Loss 0.7176   LearningRate 0.0000   Epoch: 19   Global Step: 247850   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:14:44,176-Speed 3261.43 samples/sec   Loss 0.7331   LearningRate 0.0000   Epoch: 19   Global Step: 247860   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:14:47,262-Speed 3319.55 samples/sec   Loss 0.7029   LearningRate 0.0000   Epoch: 19   Global Step: 247870   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:14:50,448-Speed 3214.25 samples/sec   Loss 0.7583   LearningRate 0.0000   Epoch: 19   Global Step: 247880   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:14:53,634-Speed 3215.16 samples/sec   Loss 0.7418   LearningRate 0.0000   Epoch: 19   Global Step: 247890   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:14:56,725-Speed 3314.48 samples/sec   Loss 0.7541   LearningRate 0.0000   Epoch: 19   Global Step: 247900   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:14:59,804-Speed 3327.02 samples/sec   Loss 0.6751   LearningRate 0.0000   Epoch: 19   Global Step: 247910   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:15:02,909-Speed 3298.60 samples/sec   Loss 0.7410   LearningRate 0.0000   Epoch: 19   Global Step: 247920   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:15:06,071-Speed 3239.76 samples/sec   Loss 0.6958   LearningRate 0.0000   Epoch: 19   Global Step: 247930   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:15:09,128-Speed 3350.68 samples/sec   Loss 0.6941   LearningRate 0.0000   Epoch: 19   Global Step: 247940   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:15:12,206-Speed 3327.94 samples/sec   Loss 0.7276   LearningRate 0.0000   Epoch: 19   Global Step: 247950   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:15:15,293-Speed 3318.17 samples/sec   Loss 0.7226   LearningRate 0.0000   Epoch: 19   Global Step: 247960   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:15:18,362-Speed 3337.30 samples/sec   Loss 0.6977   LearningRate 0.0000   Epoch: 19   Global Step: 247970   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:15:21,462-Speed 3304.64 samples/sec   Loss 0.7039   LearningRate 0.0000   Epoch: 19   Global Step: 247980   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:15:24,553-Speed 3313.08 samples/sec   Loss 0.7429   LearningRate 0.0000   Epoch: 19   Global Step: 247990   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:15:27,664-Speed 3293.74 samples/sec   Loss 0.7194   LearningRate 0.0000   Epoch: 19   Global Step: 248000   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:15:30,762-Speed 3305.77 samples/sec   Loss 0.7178   LearningRate 0.0000   Epoch: 19   Global Step: 248010   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:15:33,900-Speed 3264.80 samples/sec   Loss 0.7462   LearningRate 0.0000   Epoch: 19   Global Step: 248020   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:15:37,023-Speed 3279.64 samples/sec   Loss 0.7216   LearningRate 0.0000   Epoch: 19   Global Step: 248030   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:15:40,180-Speed 3245.42 samples/sec   Loss 0.7015   LearningRate 0.0000   Epoch: 19   Global Step: 248040   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:15:43,315-Speed 3267.04 samples/sec   Loss 0.7544   LearningRate 0.0000   Epoch: 19   Global Step: 248050   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:15:46,433-Speed 3284.79 samples/sec   Loss 0.6915   LearningRate 0.0000   Epoch: 19   Global Step: 248060   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:15:49,536-Speed 3300.94 samples/sec   Loss 0.7029   LearningRate 0.0000   Epoch: 19   Global Step: 248070   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:15:52,719-Speed 3218.34 samples/sec   Loss 0.7110   LearningRate 0.0000   Epoch: 19   Global Step: 248080   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:15:55,841-Speed 3282.01 samples/sec   Loss 0.7163   LearningRate 0.0000   Epoch: 19   Global Step: 248090   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:15:58,926-Speed 3320.17 samples/sec   Loss 0.7276   LearningRate 0.0000   Epoch: 19   Global Step: 248100   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:16:02,110-Speed 3216.91 samples/sec   Loss 0.7691   LearningRate 0.0000   Epoch: 19   Global Step: 248110   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-04-27 23:16:05,290-Speed 3221.30 samples/sec   Loss 0.6804   LearningRate 0.0000   Epoch: 19   Global Step: 248120   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:16:08,353-Speed 3344.28 samples/sec   Loss 0.7440   LearningRate 0.0000   Epoch: 19   Global Step: 248130   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:16:11,408-Speed 3352.27 samples/sec   Loss 0.7041   LearningRate 0.0000   Epoch: 19   Global Step: 248140   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:16:14,583-Speed 3225.71 samples/sec   Loss 0.7067   LearningRate 0.0000   Epoch: 19   Global Step: 248150   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:16:17,759-Speed 3225.45 samples/sec   Loss 0.7346   LearningRate 0.0000   Epoch: 19   Global Step: 248160   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:16:20,864-Speed 3299.03 samples/sec   Loss 0.7203   LearningRate 0.0000   Epoch: 19   Global Step: 248170   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:16:23,925-Speed 3346.59 samples/sec   Loss 0.7066   LearningRate 0.0000   Epoch: 19   Global Step: 248180   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:16:27,024-Speed 3305.85 samples/sec   Loss 0.7494   LearningRate 0.0000   Epoch: 19   Global Step: 248190   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:16:30,241-Speed 3183.57 samples/sec   Loss 0.6842   LearningRate 0.0000   Epoch: 19   Global Step: 248200   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:16:33,312-Speed 3335.02 samples/sec   Loss 0.7442   LearningRate 0.0000   Epoch: 19   Global Step: 248210   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:16:36,399-Speed 3318.14 samples/sec   Loss 0.7261   LearningRate 0.0000   Epoch: 19   Global Step: 248220   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:16:39,513-Speed 3289.91 samples/sec   Loss 0.7124   LearningRate 0.0000   Epoch: 19   Global Step: 248230   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:16:42,610-Speed 3307.88 samples/sec   Loss 0.7437   LearningRate 0.0000   Epoch: 19   Global Step: 248240   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:16:45,710-Speed 3303.98 samples/sec   Loss 0.6977   LearningRate 0.0000   Epoch: 19   Global Step: 248250   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:16:48,786-Speed 3329.61 samples/sec   Loss 0.6934   LearningRate 0.0000   Epoch: 19   Global Step: 248260   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:16:51,878-Speed 3313.35 samples/sec   Loss 0.7257   LearningRate 0.0000   Epoch: 19   Global Step: 248270   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:16:55,009-Speed 3271.15 samples/sec   Loss 0.7196   LearningRate 0.0000   Epoch: 19   Global Step: 248280   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:16:58,069-Speed 3347.88 samples/sec   Loss 0.7016   LearningRate 0.0000   Epoch: 19   Global Step: 248290   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:17:01,183-Speed 3289.09 samples/sec   Loss 0.7220   LearningRate 0.0000   Epoch: 19   Global Step: 248300   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:17:04,335-Speed 3250.45 samples/sec   Loss 0.7139   LearningRate 0.0000   Epoch: 19   Global Step: 248310   Fp16 Grad Scale: 8192   Required: 0 hours
Training: 2022-04-27 23:17:07,428-Speed 3310.90 samples/sec   Loss 0.7703   LearningRate 0.0000   Epoch: 19   Global Step: 248320   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:17:10,479-Speed 3356.84 samples/sec   Loss 0.7572   LearningRate 0.0000   Epoch: 19   Global Step: 248330   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:17:13,570-Speed 3315.09 samples/sec   Loss 0.6957   LearningRate 0.0000   Epoch: 19   Global Step: 248340   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:17:16,692-Speed 3280.15 samples/sec   Loss 0.7138   LearningRate 0.0000   Epoch: 19   Global Step: 248350   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:17:19,808-Speed 3287.17 samples/sec   Loss 0.7205   LearningRate 0.0000   Epoch: 19   Global Step: 248360   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:17:22,865-Speed 3351.36 samples/sec   Loss 0.7146   LearningRate 0.0000   Epoch: 19   Global Step: 248370   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:17:25,969-Speed 3299.69 samples/sec   Loss 0.7411   LearningRate 0.0000   Epoch: 19   Global Step: 248380   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:17:29,071-Speed 3302.71 samples/sec   Loss 0.7359   LearningRate 0.0000   Epoch: 19   Global Step: 248390   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:17:32,145-Speed 3331.28 samples/sec   Loss 0.7493   LearningRate 0.0000   Epoch: 19   Global Step: 248400   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:17:35,426-Speed 3122.22 samples/sec   Loss 0.6978   LearningRate 0.0000   Epoch: 19   Global Step: 248410   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-04-27 23:17:38,502-Speed 3330.48 samples/sec   Loss 0.7207   LearningRate 0.0000   Epoch: 19   Global Step: 248420   Fp16 Grad Scale: 32768   Required: -0 hours